17 results found

1.0k
13.6k
agpl-3.0
74
A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
Created 2024-02-29
1,654 commits to master branch, last one 20 hours ago
1.4k
8.3k
other
147
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Created 2012-01-06
1,770 commits to main branch, last one 2 days ago
182
846
mit
11
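A beautiful and powerful online design tool with poster design and image editing features: an open-source take on 稿定设计 (Gaoding Design), built on fabric.js. It suits many scenarios, such as poster generation, e-commerce product images, long article graphics, and video/WeChat Official Account cover editing.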
Created 2023-05-25
896 commits to main branch, last one 23 hours ago
Easily deployable 🚀 API to convert PDFs to Markdown quickly and with high accuracy.
Created 2024-05-10
169 commits to master branch, last one 22 days ago
Python PDF parser for scientific publications: content and figures
Created 2019-07-03
59 commits to master branch, last one 7 months ago
Analyze PDFs. With colors. And Yara.
Created 2022-09-14
216 commits to master branch, last one 11 days ago
A package for parsing PDFs and analyzing their content using LLMs.
Created 2024-07-26
28 commits to main branch, last one 3 months ago
19
173
apache-2.0
11
Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic...
Created 2020-12-07
295 commits to master branch, last one about a month ago
Parser for Consolidated Account Statements (CAS) generated by CAMS/Karvy/Kfintech
Created 2020-10-10
243 commits to main branch, last one 9 months ago
5
84
apache-2.0
4
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Created 2024-06-04
219 commits to main branch, last one 2 days ago
Sample code for the Datalogics C++, Java, and .NET interfaces of the Adobe PDF Library
Created 2017-03-28
247 commits to master branch, last one about a year ago
Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.
Created 2023-08-03
10 commits to main branch, last one 11 months ago
2
47
apache-2.0
1
PDF parsing toolkit for preparing academic text corpora
Created 2023-09-03
4 commits to main branch, last one 3 months ago
Fast and memory-efficient Python PDF parser based on the xpdf sources
Created 2020-03-28
318 commits to dev branch, last one 2 years ago