Trending repositories for topic pdf
Document (PDF) extraction and parsing API using state-of-the-art OCR engines and Ollama-supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown
#1 Locally hosted web application that allows you to perform various operations on PDF files
A community-supported supercharged version of paperless: scan, index and archive all your physical documents
A privacy-first, self-hosted, fully open source personal knowledge management software, written in typescript and golang.
A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and e-book extraction.
🖼️ Image Toolbox is a powerful app for advanced image manipulation. It offers dozens of features, from basic tools like crop and draw to filters, OCR, and a wide range of image processing options
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
A modern ebook manager and reader with sync and backup capabilities for Windows, macOS, Linux and Web
A developer-friendly API for converting numerous document formats into PDF files, and more!
📚 Library of essential programming books. (Check out my new project `SendScriptWhatsapp`)
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
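🔥🔥 More than 1,000 classic computer science books, personal notes, and resources referenced in the author's articles published on various platforms. The book resources cover C/C++, Java, Python, Go, data structures and algorithms, operating systems, back-end architecture, computer systems, databases, computer networks, design patterns, front end, assembly, and interview write-ups from campus and experienced-hire recruiting.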
QuestPDF is a modern open-source .NET library for PDF document generation. It offers a comprehensive layout engine powered by a concise and discoverable C# Fluent API. Easily generate PDF reports, invoices...
An ebook reader application supporting PDF, DjVu, EPUB, FB2 and many more formats, running on Cervantes, Kindle, Kobo, PocketBook and Android devices
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
Completely local RAG. Chat with your PDF documents (with an open LLM) through a UI that uses LangChain, Streamlit, Ollama (Llama 3.1), Qdrant and advanced methods like reranking and semantic chunking.
XSL transformations for web and PDF rendering of German CIUS XRechnung or EN16931-1:2017 [MIRROR OF GitLab]
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
A viewer based on PDFjs, which can be embedded in any web page (not using iframes)
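PDF parsing (text, sections, tables, images, references), plus PDF question answering, summarization, and information extraction based on large language models (ChatGLM2-6B, RWKV) + langchain + streamlit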
HTMLToQPDF is an extension for QuestPDF that lets you generate PDFs from HTML
Download PDF books from bSmart, Pearson, Oxford, and many more!
DearFlip - 3D FlipBook JS/jQuery Plugin. Create 3D Flipbook or PDF Flipbook using JavaScript / jQuery
📚 A great collection of learning books for programmers.
Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic...
A Python wrapper for the Doc2X API that also comes with local text processing (to improve PDF recall in RAG).
Learn how to create HTML/ZIP/PNG polyglot files in JavaScript
Papermark is the open-source DocSend alternative with built-in analytics and custom domains.
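Free downloads of English-language magazines such as The Economist (with audio), The New Yorker, The Guardian, Wired, and The Atlantic, in epub, mobi, and pdf formats, updated weekly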
A maroto way to create PDFs. Maroto is inspired by Bootstrap and uses gofpdf. Fast and simple.
A collection of Vue components for previewing office files such as Word (.docx), Excel (.xlsx, .xls), PDF, and pptx, providing a one-stop, web-based office file preview solution. Supports Vue 2 and 3 as well as non-Vue frameworks such as React.
PDF scientific paper translation and bilingual comparison: full-text bilingual translation of PDF documents that fully preserves the layout, with support for Google and Ollama translation
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
A Symfony Bundle for interacting with Gotenberg. Integrates natively with Twig, the router, PhpStorm and more!
Advanced book classes and packages for the SILE typesetting system: A path for making books from front cover to back cover.
Creating PDF output from HTML with .NET on Windows using the WebView2 control
Material3 eBook reader - Book's Story. Built with Jetpack Compose. Free & Open Source & Ad Free. 7 supported file formats (.txt, .pdf, .epub, .fb2, .zip, .html, .htm). Lots of customization.
A tool to download purchased e-books from different publishers
Python package to make documents look like they were scanned
Build and generate PDF using React 📄 UI kit for PDFs and print documents. Simple, reusable components and templates to create great invoices, docs, brochures. Use your favorite front-end framework Re...
The most Obsidian-native PDF annotation, viewing & editing tool ever. Comes with optional Vim keybindings.
GUI analyzer for deep-diving into PDF files. Detect malicious payloads, understand object relationships, and extract key information for threat analysis.
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
hotpdf is a fast PDF parsing library, built on top of pdfminer.six, for extracting text and finding text within PDF documents
Discover and converse with advanced AI models like Mistral, LLAMA2, and GPT-3.5 from leading sources like OLLAMA, Hugging Face, and OpenAI. Easily extract insights from PDFs, web pages, and YouTube vi...
Open source DocuSign alternative. Create, fill, and sign digital documents ✍️
Specify a GitHub or local repo, GitHub pull request, arXiv or Sci-Hub paper, YouTube transcript or documentation URL on the web and scrape it into a text file and the clipboard for easier LLM ingestion
View and Interact with PDFs in React, SolidJS, Svelte and JavaScript apps
SemanticPDF: Drag, Drop, Semantic Search - SemanticPDF is a simple, privacy-focused application that makes it easy to upload a PDF file and perform a semantic search on its contents.
pdf-frame is a web framework designed specifically for handling PDF and Canvas graphics requirements. It provides component support for popular frameworks like Vue, Nuxt and React. With its declarativ...