10 results found

185
1.6k
mit
44
node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
Created 2013-04-23
307 commits to master branch, last one 5 years ago
36
285
agpl-3.0
7
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Created 2020-05-23
86 commits to master branch, last one 3 years ago
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelin...
Created 2015-05-30
486 commits to master branch, last one about a year ago
74
195
apache-2.0
23
Use the Java Tika text extraction library on the .NET platform
Created 2010-07-02
187 commits to master branch, last one 4 years ago
Multiple and Large PDF Documents Text Extraction.
Created 2020-05-07
41 commits to master branch, last one 4 months ago
18
91
unlicense
5
Extract text from plaintext, .docx, .odt and .rtf files. Pure go.
Created 2019-03-02
81 commits to master branch, last one 8 months ago
4
56
unknown
4
R wrapper for antiword utility
Created 2017-04-22
62 commits to master branch, last one 2 months ago
8
54
apache-2.0
7
R Interface to Apache Tika
Created 2018-01-19
179 commits to master branch, last one about a year ago
Build search across multiple documents client-side in your file storage
Created 2020-07-09
52 commits to master branch, last one about a year ago