7 results found

53
1.1k
apache-2.0
5
E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2M o...
Created 2024-08-04
190 commits to main branch, last one 7 months ago
Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc.) into Markdown. With support for both CPU and GPU processing, it is...
Created 2024-11-05
45 commits to main branch, last one about a month ago
Parse PDFs into Markdown using Vision LLMs
Created 2024-12-16
112 commits to main branch, last one 2 months ago
3
86
apache-2.0
1
A Python package for converting PDFs to Markdown while extracting images and tables, and for generating text descriptions of the extracted tables and images using several LLM clients. And many more functio...
Created 2024-12-24
57 commits to main branch, last one 15 days ago
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced info...
Created 2024-09-10
26 commits to main branch, last one 4 months ago
smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From R...
Created 2025-02-13
42 commits to main branch, last one about a month ago