15 results found
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Created
2023-12-12
2,727 commits to main branch, last one a day ago
Get your documents ready for gen AI
Created
2024-07-09
431 commits to main branch, last one a day ago
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Created
2022-09-26
1,711 commits to main branch, last one 2 days ago
Knowledge Agents and Management in the Cloud
Created
2024-01-31
263 commits to main branch, last one 16 hours ago
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
Created
2024-02-01
391 commits to main branch, last one a day ago
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Created
2020-05-23
86 commits to master branch, last one 4 years ago
PDF text data extraction web app with OCR for scanned documents
Created
2022-05-13
46 commits to main branch, last one 10 months ago
A Python package for converting PDFs to Markdown while extracting images and tables, and generating text descriptions of the extracted tables/images using several LLM clients. And many more functio...
Created
2024-12-24
57 commits to main branch, last one 7 days ago
Sample code for the Datalogics C++, Java, and .NET interfaces of the Adobe PDF Library
Created
2017-03-28
247 commits to master branch, last one about a year ago
OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.
Created
2022-08-04
27 commits to main branch, last one 2 years ago
CLI for extracting text from PDF files (and possibly tables)
Created
2020-09-28
89 commits to master branch, last one 13 days ago
C# and VB.NET samples for Docotic.Pdf library
Created
2017-12-13
559 commits to master branch, last one 29 days ago
A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GROBID, LangChain, listen as podcast. Customize your own pipelines...
Created
2023-03-31
109 commits to main branch, last one 17 days ago
The code base of the front-end of nocodefunctions.com
Created
2021-11-22
15 commits to main branch, last one 11 days ago
A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
Created
2025-02-20
60 commits to main branch, last one a day ago