Search Results - RepositoryStats

2.4k

29.9k

agpl-3.0

146

A high-quality, one-stop open-source data extraction tool for converting PDF to Markdown and JSON.

ocr pdf parser python ai4science pdf-parser extract-data pdf-converter layout-analysis document-analysis pdf-extractor-llm pdf-extractor-rag pdf-extractor-pretrain

Created 2024-02-29

2,586 commits to master branch, last one 3 days ago

pypdf py-pdf

1.4k

8.9k

other

146

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

pdf pypdf2 python pdf-parser help-wanted pdf-parsing pdf-documents pdf-manipulation

Created 2012-01-06

1,962 commits to main branch, last one 4 days ago

yft-design dromara

239

1.2k

mit

11

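An open-source version of Gaoding Design (稿定设计) built on fabric.js. A beautiful and powerful online design tool with poster design and image editing features, suited to scenarios such as poster generation, e-commerce product images, long-form article images, and video and official-account (公众号) cover editing.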

clipper fabricjs psd-parse text2path image-crop pdf-editor pdf-parser psd-editor vue3-fabric element-plus canvas-editor fabric-editor online-design online-editor poster-design

Created 2023-05-25

917 commits to main branch, last one 22 days ago

extractous yobix-ai

43

1.0k

apache-2.0

14

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

etl llm nlp ocr pdf rag docx rust tika extraction pdf-parser unstructured etl-pipelines data-pipelines machine-learning unstructured-data natural-language-processing

Created 2024-06-04

305 commits to main branch, last one 3 months ago

marker-api adithya-s-k

93

832

gpl-3.0

6

Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.

api marker fastapi rest-api pdf-files pdf-parser pdf-parsing pdf-converter

Created 2024-05-10

169 commits to master branch, last one 5 months ago

docling-api drmingler

52

492

mit

3

Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc) into Markdown. With support for both CPU and GPU processing, it is...

api fastapi pdf-parser pdf-chatbot pdf-parsing pdf-converter pdf-conversion markdown-parser pdf-to-markdown

Created 2024-11-05

45 commits to main branch, last one about a month ago

scipdf_parser titipata

64

401

mit

7

Python PDF parser for scientific publications: content and figures

pdf grobid parser pdf-parser python-parser scipdf-parser

Created 2019-07-03

59 commits to master branch, last one about a year ago

vision-parse iamarunbrahma

46

335

mit

4

Parse PDFs into markdown using Vision LLMs

pdf-parser document-parser pdf-to-markdown text-extraction

Created 2024-12-16

112 commits to main branch, last one about a month ago

llmdocparser lazyFrogLOL

8

268

mit

3

A package for parsing PDFs and analyzing their content using LLMs.

llm nlp ocr rag chunking pdfparser pdf-parser text-chunking document-analysis

Created 2024-07-26

28 commits to main branch, last one 8 months ago

pdfalyzer michelcrypt4d4mus

19

259

gpl-3.0

6

Analyze PDFs. With colors. And Yara.

pdf pdf-format pdf-parser pdf-documents malware-analysis malicious-pdf-files

Created 2022-09-14

226 commits to master branch, last one 3 months ago

dedoc ispras

26

227

apache-2.0

12

Dedoc is a library (service) for automating document parsing and bringing documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic...

doc ocr odt pdf txt docx html excel documents pdf-parser docx-parser html-parser document-analysis scanned-documents table-of-contents table-recognition document-content-extraction logical-structure-extraction

Created 2020-12-07

297 commits to master branch, last one 3 months ago

sypht-python-client sypht-team

5

162

mit

4

A python client for the Sypht API

Created 2018-08-20

212 commits to master branch, last one about a year ago

casparser codereverser

67

142

mit

9

Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfintech

cas 112a cams karvy parser python3 kfintech pdf-parser capital-gain mutual-funds capital-gains mutual-fund-portfolio capital-gains-calculator consolidated-account-statements

Created 2020-10-10

265 commits to main branch, last one about a month ago

sypht-java-client sypht-team

1

87

apache-2.0

4

A Java client for the Sypht API

Created 2019-04-05

85 commits to master branch, last one 4 years ago

adobe-pdf-library-samples datalogics

62

82

unknown

26

Sample code for the Datalogics C++, Java, and .NET interfaces of the Adobe PDF Library

ocr pdf pdfa ocr-pdf pdf-lib pdf-split pdf-tools pdf-merger pdf-parser pdf-render pdf-to-text pdf-document pdf-to-image pdf-converter pdf-to-office pdf-conversion pdf-generation pdf-compression pdf-manipulation

Created 2017-03-28

247 commits to master branch, last one about a year ago

Docotic.Pdf.Samples BitMiracle

39

77

unknown

10

C# and VB.NET samples for Docotic.Pdf library

Created 2017-12-13

559 commits to master branch, last one about a month ago

smart-llm-loader drmingler

1

63

mit

1

smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From R...

rag claude gemini openai chatbot chunking markdown langchain pdf-parser llama-index pdf-converter pdf-to-markdown

Created 2025-02-13

42 commits to main branch, last one about a month ago

nextjs-pdf-parser tuffstuff9

6

59

unknown

1

Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

nextjs filepond pdf2json pdf-parse react-pdf nextjs-pdf pdf-parser pdf-upload pdf-parsing nextjs-pdf-parse react-pdf-parser nextjs-pdf-parser content-extraction nextjs-pdf-parsing

Created 2023-08-03

10 commits to main branch, last one about a year ago