Search Results - RepositoryStats

2.4k

30.3k

agpl-3.0

149

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.

ocr pdf parser python ai4science pdf-parser extract-data pdf-converter layout-analysis document-analysis pdf-extractor-llm pdf-extractor-rag pdf-extractor-pretrain

Created 2024-02-29

2,629 commits to master branch, last one 2 days ago

layout-parser Layout-Parser

493

5.2k

apache-2.0

74

A Unified Toolkit for Deep Learning Based Document Image Analysis

ocr detectron2 deep-learning layout-parser computer-vision layout-analysis layout-detection object-detection document-layout-analysis document-image-processing

Created 2020-06-10

182 commits to main branch, last one 2 years ago

Pix2Text breezedeus

216

2.3k

mit

20

An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowerin...

ocr latex python mathpix pytorch math-ocr latex-pdf table-ocr math-formula layout-analysis image-to-markdown math-formula-recognition

Created 2022-09-07

481 commits to main branch, last one 3 months ago

PdfPig UglyToad

254

2.0k

apache-2.0

48

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdf hocr csharp pdfbox alto-xml page-xml pdf-files netstandard pdf-document pdf-extractor pdf-generation layout-analysis document-analysis pdf-document-processor

Created 2017-11-09

1,644 commits to master branch, last one 4 days ago

kraken mittagessen

140

807

apache-2.0

27

OCR engine for all the languages

htr ocr hocr alto-xml page-xml layout-analysis neural-networks handwritten-text-recognition optical-character-recognition

Created 2015-05-19

2,193 commits to main branch, last one 10 days ago

DocumentLayoutAnalysis BobLd

67

607

unknown

34

Document Layout Analysis resources and repos for development with PdfPig.

pdf tei alto hocr xycut csharp pdfpig xy-cut alto-xml docstrum page-xml hocr-documents layout-analysis recursive-xy-cut table-extraction page-segmentation document-layout-analysis

Created 2019-09-02

181 commits to master branch, last one about a year ago

yomitoku kotaro-kinoshita

17

576

unknown

5

Yomitoku is an AI-powered document image analysis package designed specifically for the Japanese language.

ocr python pytorch deep-learning layout-analysis

Created 2024-10-30

282 commits to main branch, last one 6 days ago

mindocr mindspore-lab

57

261

apache-2.0

13

A toolbox of OCR models and algorithms based on MindSpore

ocr crnn dbnet vary-toy layoutxlm mindspore tablemaster deep-learning text-detection layout-analysis ocr-large-model text-recognition table-recognition key-information-extraction

Created 2022-12-20

873 commits to main branch, last one 21 days ago