10 results found

186
1.6k
mit
44
Node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
Created 2013-04-23
307 commits to master branch, last one 5 years ago
40
298
agpl-3.0
7
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Created 2020-05-23
86 commits to master branch, last one 3 years ago
Python-based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelin...
Created 2015-05-30
486 commits to master branch, last one 2 years ago
72
197
apache-2.0
23
Use the Java Tika text extraction library on the .NET platform
Created 2010-07-02
187 commits to master branch, last one 4 years ago
Multiple and Large PDF Documents Text Extraction.
Created 2020-05-07
41 commits to master branch, last one 9 months ago
16
93
unlicense
5
Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.
Created 2019-03-02
81 commits to master branch, last one about a year ago
4
59
unknown
4
R wrapper for antiword utility
Created 2017-04-22
70 commits to master branch, last one about a month ago
8
54
apache-2.0
7
R Interface to Apache Tika
Created 2018-01-19
179 commits to master branch, last one about a year ago
Build search across multiple documents client-side in your file storage
Created 2020-07-09
52 commits to master branch, last one about a year ago