15 results found

783
2.5k
apache-2.0
98
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Created 2009-05-21
9,116 commits to main branch, last one a day ago
298
1.4k
apache-2.0
73
Elasticsearch File System Crawler (FS Crawler)
Created 2012-06-08
2,950 commits to master branch, last one 6 hours ago
141
410
apache-2.0
45
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Created 2016-05-25
726 commits to main branch, last one about a year ago
31
241
mit
21
A cross-platform command line tool for parallelised content extraction and analysis.
Created 2015-05-07
804 commits to master branch, last one 2 months ago
74
199
apache-2.0
23
Use the Java Tika text extraction library on the .NET platform
Created 2010-07-02
187 commits to master branch, last one 4 years ago
33
154
apache-2.0
3
pdf2html is a module that converts PDF files to HTML pages using Apache Tika. It also generates thumbnail images for PDF files using Apache PDFBox.
Created 2019-08-16
109 commits to master branch, last one 10 months ago
6
139
apache-2.0
6
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Created 2024-06-04
279 commits to main branch, last one 3 days ago
Code for Machine Learning with TensorFlow, 2nd Edition, published by Manning Publications
Created 2019-04-01
289 commits to master branch, last one 3 years ago
66
137
apache-2.0
16
Convenience Docker images for Apache Tika Server
Created 2020-01-08
127 commits to main branch, last one about a month ago
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
Created 2015-08-30
360 commits to master branch, last one 5 months ago
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on metadata features.
Created 2014-12-13
292 commits to master branch, last one about a year ago
Interactive image similarity, visual search, and retrieval application
Created 2015-03-02
466 commits to master branch, last one about a year ago
8
55
apache-2.0
7
R Interface to Apache Tika
Created 2018-01-19
179 commits to master branch, last one about a year ago
14
55
apache-2.0
22
Quickly analyze and explore email with advanced analytics and visualization.
Created 2014-05-22
144 commits to update_ex_5.x branch, last one 4 years ago
23
52
apache-2.0
9
Extract and visualize locations from any file
Created 2015-08-04
373 commits to master branch, last one about a year ago