Search Results - RepositoryStats

pdfplumber jsvine

726

7.6k

mit

93

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

pdf pdf-parsing table-extraction

Created 2015-08-24

754 commits to stable branch, last one 26 days ago

PyMuPDF pymupdf

594

7.0k

agpl-3.0

66

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

ocr pdf xps epub font mupdf python pymupdf tesseract data-science extract-data text-shaping pdf-documents text-processing table-extraction

Created 2012-10-06

2,848 commits to main branch, last one a day ago

table-transformer microsoft

282

2.6k

mit

38

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evalu...

table-detection table-extraction table-functional-analysis table-structure-recognition

Created 2021-05-17

195 commits to main branch, last one about a year ago

img2table xavctn

101

705

mit

10

img2table is a table identification and extraction Python Library for PDF and images, based on OpenCV image processing

opencv python image-processing table-extraction

Created 2022-03-21

166 commits to main branch, last one 2 months ago

DocumentLayoutAnalysis BobLd

67

612

unknown

34

Document Layout Analysis resource repository for development with PdfPig.

pdf tei alto hocr xycut csharp pdfpig xy-cut alto-xml docstrum page-xml hocr-documents layout-analysis recursive-xy-cut table-extraction page-segmentation document-layout-analysis

Created 2019-09-02

181 commits to master branch, last one about a year ago

ExtractTable-py ExtractTable

35

277

apache-2.0

6

Python library to extract tabular data from images and scanned PDFs

ocr extracttable tabular-data table-extraction pdf-table-extract image-table-recognition

Created 2019-10-06

53 commits to master branch, last one 2 years ago

Hyper-Table-OCR MrZilinXiao

45

178

unknown

2

A carefully designed OCR pipeline for universal bordered table recognition and reconstruction.

ocr table-ocr ocr-python deep-learning table-extraction

Created 2020-12-02

37 commits to main branch, last one 2 years ago

awesome-table-structure-recognition MathamPollard

9

177

apache-2.0

9

A curated list of awesome Table Structure Recognition (TSR) research, including models, papers, datasets, and code. Continuously updated.

table-detection table-extraction document-understanding table-functional-analysis table-structure-recognition

Created 2023-12-15

50 commits to main branch, last one 7 months ago

docxtractr hrbrmstr

29

176

other

13

Extract Tables from Microsoft Word Documents with R

r docx rstats extract-tables microsoft-word table-extraction

Created 2015-08-24

80 commits to master branch, last one 4 years ago

tabula-sharp BobLd

27

175

mit

9

Extract tables from PDF files (port of tabula-java)

pdfs table csharp dotnet pdfpig tabula extract pdfparser extraction netstandard tabula-java tabula-sharp extract-table table-extraction extracting-tables extraction-engine pdf-table-extract pdf-table-extraction

Created 2020-09-08

206 commits to master branch, last one about a month ago

PDFConverter houking-can

45

152

apache-2.0

3

Best PDF Converter! PDF to any format, pdf2word/excel/xml/html/txt...

docx pdf2img pdf2txt pdf2xls pdf2xml pdf2html pdf2word pdf2xlsx pdfconverter adobe-acrobat table-extraction

Created 2019-04-01

68 commits to master branch, last one 4 years ago

docext NanoNets

9

105

apache-2.0

1

An on-premises, OCR-free unstructured data extraction tool powered by vision language models.

nlp ocr rag llms vlms onprem llm-ocr document onpremise extraction onprem-ocr ocr-onpremise onprem-vision machine-learning table-extraction document-analysis unstructured-data document-data-extraction document-information-extraction

Created 2025-03-25

107 commits to main branch, last one 7 days ago

science-result-extractor IBM

17

91

apache-2.0

9

This repository has no description...

nlp ibm-research ibm-research-ai table-extraction scientific-papers information-extraction pdf-document-processor

This repository has been archived.

Created 2019-05-15

19 commits to master branch, last one 3 years ago

parsee-pdf-reader parsee-ai

6

58

mit

1

Parsee's PDF reader, specialized in the extraction of tables with numeric values and the accurate extraction and preservation of text paragraphs. Full support for scans and images.

pdf pdf-document table-extraction

Created 2024-02-14

56 commits to master branch, last one 2 months ago

TableExtraction abdullahibneat

12

57

unknown

2

A line-based framework to detect and extract tabular data in JSON format from raster images using computer vision and Tesseract OCR.

opencv flask-api tesseract-ocr table-extraction

Created 2021-06-10

143 commits to main branch, last one about a year ago

Go5-Project phamquiluan

12

36

unknown

4

Extracting Tabular Data from Images to Excel Files

excel-export image-processing table-extraction table-recognition

Created 2022-07-18

141 commits to main branch, last one 8 months ago

engineering-drawing-extractor Bakkopi

10

33

unknown

2

Automated data extraction from engineering blueprint images.

ocr opencv python openpyxl automation pytesseract image-analysis table-extraction digital-image-processing

Created 2023-08-26

4 commits to main branch, last one about a year ago

TableExtractor-Advanced-PDF-Table-Extraction Baskar-forever

6

28

mit

1

PDF Table Extractor is an innovative Python project designed to tackle the challenge of extracting tables from scanned PDF documents. Leveraging advanced optical character recognition (OCR) and image ...

ocr-python table-extraction scanedpdf-extraction table-extraction-python table-structure-recognition

Created 2024-03-20

2 commits to main branch, last one about a year ago