15 results found

798
2.8k
apache-2.0
99
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Created 2009-05-21
9,274 commits to main branch, last one a day ago
298
1.4k
apache-2.0
72
Elasticsearch File System Crawler (FS Crawler)
Created 2012-06-08
3,021 commits to master branch, last one 4 days ago
35
946
apache-2.0
12
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Created 2024-06-04
305 commits to main branch, last one about a month ago
141
412
apache-2.0
44
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Created 2016-05-25
726 commits to main branch, last one about a year ago
31
242
mit
21
A cross-platform command line tool for parallelised content extraction and analysis.
Created 2015-05-07
806 commits to master branch, last one 2 months ago
76
204
apache-2.0
23
Use the Java Tika text extraction library on the .NET platform
Created 2010-07-02
187 commits to master branch, last one 5 years ago
34
161
apache-2.0
3
pdf2html is a module that converts PDF files to HTML pages using Apache Tika. It also generates thumbnail images for PDF files using Apache PDFBox.
Created 2019-08-16
113 commits to master branch, last one 2 months ago
68
155
apache-2.0
16
Convenience Docker images for Apache Tika Server
Created 2020-01-08
128 commits to main branch, last one 16 days ago
Code for Machine Learning with TensorFlow, 2nd Edition, published by Manning Publications
Created 2019-04-01
289 commits to master branch, last one 4 years ago
Apache Tika bindings for PHP: extract text and metadata from documents, images and other formats
Created 2015-08-30
360 commits to master branch, last one 8 months ago
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
Created 2014-12-13
292 commits to master branch, last one about a year ago
Interactive Image similarity and Visual Search and Retrieval application
Created 2015-03-02
466 commits to master branch, last one about a year ago
16
56
apache-2.0
22
Quickly analyze and explore email with advanced analytics and visualization.
Created 2014-05-22
144 commits to update_ex_5.x branch, last one 4 years ago
8
55
apache-2.0
7
R Interface to Apache Tika
Created 2018-01-19
179 commits to master branch, last one about a year ago
23
52
apache-2.0
9
Extract and Visualize location from any file
Created 2015-08-04
373 commits to master branch, last one about a year ago