Search Results - RepositoryStats

pypdf py-pdf

1.5k

9.0k

other

147

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

pdf pypdf2 python pdf-parser help-wanted pdf-parsing pdf-documents pdf-manipulation

Created 2012-01-06

1,976 commits to main branch, last one 5 days ago

pdfplumber jsvine

726

7.6k

mit

93

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

pdf pdf-parsing table-extraction

Created 2015-08-24

754 commits to stable branch, last one 25 days ago

HummusJS galkahana

169

1.2k

other

31

Node.js module for high performance creation, modification and parsing of PDF files and streams

nodejs pdf-parsing pdf-generation pdf-manipulation pdf-modification

Created 2013-03-22

602 commits to master branch, last one 7 months ago

marker-api adithya-s-k

93

839

gpl-3.0

6

Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.

api marker fastapi rest-api pdf-files pdf-parser pdf-parsing pdf-converter

Created 2024-05-10

169 commits to master branch, last one 6 months ago

docling-api drmingler

56

543

mit

4

Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc) into Markdown. With support for both CPU and GPU processing, it is...

api fastapi pdf-parser pdf-chatbot pdf-parsing pdf-converter pdf-conversion markdown-parser pdf-to-markdown

Created 2024-11-05

45 commits to main branch, last one about a month ago

py-pdf-parser jstockwin

46

402

mit

6

A Python tool to help extract information from structured PDFs.

pdf parsing pdf-parsing py-pdf-parser

Created 2019-10-31

632 commits to master branch, last one 9 months ago

hummusRecipe chunyenHuang

91

346

mit

9

A powerful PDF tool for NodeJS based on HummusJS.

pdf nodejs pdf-files overlay-pdf pdf-parsing pdf-generation pdf-manipulation pdf-modification

Created 2017-07-18

398 commits to master branch, last one 3 years ago

traprange thoqbk

133

332

mit

33

(Java) A Method to Extract Tabular Content from PDF Files

pdf java parser pdfbox pdf-files pdf-parsing pdf-manipulation

Created 2014-09-08

54 commits to master branch, last one 2 years ago

pdf_parsing ck-unifr

32

195

unknown

2

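PDF parsing (text, sections, tables, images, references); PDF question answering, summarization, and information extraction based on large language models (ChatGLM2-6B, RWKV) + langchain + streamlit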

llm pdf rwkv python chatpdf langchain streamlit chatglm2-6b pdf-parsing information-extraction

Created 2023-09-08

42 commits to main branch, last one about a year ago

pdf-extractor ScientaNL

22

97

mit

9

Node.js module for rendering pdf pages to images, svgs, html files, text files and json metadata

pdfjs nodejs pdf-parsing html-generation image-generation

Created 2017-11-29

84 commits to master branch, last one about a year ago

pdf-to-markdown iamarunbrahma

7

73

mit

3

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced info...

rag python pdf-parsing pdf-converter pdf-extraction pdf-to-markdown text-extraction document-conversion document-processing information-retrieval retrieval-augmented-generation

Created 2024-09-10

26 commits to main branch, last one 5 months ago

pdf-table rostrovsky

13

72

mit

7

Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV

java8 table opencv pdfbox tables opencv3 pdf-parsing java-library

Created 2017-02-19

61 commits to master branch, last one 3 years ago

nextjs-pdf-parser tuffstuff9

6

59

unknown

1

Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

nextjs filepond pdf2json pdf-parse react-pdf nextjs-pdf pdf-parser pdf-upload pdf-parsing nextjs-pdf-parse react-pdf-parser nextjs-pdf-parser content-extraction nextjs-pdf-parsing

Created 2023-08-03

10 commits to main branch, last one about a year ago