15 results found
- Filter by primary language:
  - Java (4)
  - JavaScript (4)
  - Jupyter Notebook (1)
  - PHP (1)
  - Python (1)
  - R (1)
  - Rich Text Format (1)
  - Rust (1)
  - Shell (1)
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Created 2009-05-21
9,116 commits to main branch, last one a day ago
Elasticsearch File System Crawler (FS Crawler)
Created 2012-06-08
2,950 commits to master branch, last one 6 hours ago
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Created 2016-05-25
726 commits to main branch, last one about a year ago
A cross-platform command-line tool for parallelised content extraction and analysis.
Created 2015-05-07
804 commits to master branch, last one 2 months ago
Use the Java Tika text-extraction library on the .NET platform.
Created 2010-07-02
187 commits to master branch, last one 4 years ago
pdf2html is a module that converts PDF files to HTML pages using Apache Tika. It can also generate thumbnail images for PDF files using Apache PDFBox.
Created 2019-08-16
109 commits to master branch, last one 10 months ago
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Created 2024-06-04
279 commits to main branch, last one 3 days ago
Code for Machine Learning with TensorFlow, 2nd Edition, published by Manning Publications.
Created 2019-04-01
289 commits to master branch, last one 3 years ago
Convenience Docker images for Apache Tika Server
Created 2020-01-08
127 commits to main branch, last one about a month ago
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
Created 2015-08-30
360 commits to master branch, last one 5 months ago
Tika-Similarity uses the Tika-Python package (a Python port of Apache Tika) to compute file similarity based on metadata features.
Created 2014-12-13
292 commits to master branch, last one about a year ago
Interactive image-similarity and visual search-and-retrieval application.
Created 2015-03-02
466 commits to master branch, last one about a year ago
R interface to Apache Tika.
Created 2018-01-19
179 commits to master branch, last one about a year ago
Quickly analyze and explore email with advanced analytics and visualization.
Created 2014-05-22
144 commits to update_ex_5.x branch, last one 4 years ago
Extract and visualize locations from any file.
Created 2015-08-04
373 commits to master branch, last one about a year ago