10 results found
Node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
Created
2013-04-23
307 commits to master branch, last one 5 years ago
🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Created
2020-05-23
86 commits to master branch, last one 3 years ago
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelin...
Created
2015-05-30
486 commits to master branch, last one 2 years ago
Use the Java Tika text extraction library on the .NET platform
Created
2010-07-02
187 commits to master branch, last one 4 years ago
Multiple and Large PDF Documents Text Extraction.
Created
2020-05-07
41 commits to master branch, last one 9 months ago
Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.
Created
2019-03-02
81 commits to master branch, last one about a year ago
C# and VB.NET samples for Docotic.Pdf library
Created
2017-12-13
555 commits to master branch, last one 15 days ago
R wrapper for antiword utility
Created
2017-04-22
70 commits to master branch, last one about a month ago
R Interface to Apache Tika
Created
2018-01-19
179 commits to master branch, last one about a year ago
Build search across multiple documents client-side in your file storage
Created
2020-07-09
52 commits to master branch, last one about a year ago