31 results found

593
5.6k
mit
141
Extract keywords from sentences or replace keywords in sentences.
Created 2017-08-15
108 commits to master branch, last one 4 years ago
Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (f...
Created 2015-10-11
49 commits to master branch, last one 2 years ago
233
1.4k
apache-2.0
38
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Created 2017-07-13
6,411 commits to develop branch, last one about a year ago
📰 Let ChatGPT Summarize Hacker News for You
Created 2014-09-17
459 commits to master branch, last one 7 days ago
🚜 Parse text and tables from PDF files.
Created 2015-03-05
153 commits to master branch, last one 10 months ago
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
Created 2020-05-12
542 commits to master branch, last one about a year ago
120
421
other
38
A powerful Python library for getting rich data from the Vietnam Stock Market using just a few lines of code
Created 2022-02-27
228 commits to main branch, last one 9 days ago
Wikipedia information extraction library
Created 2015-06-15
340 commits to master branch, last one about a year ago
9
162
bsd-3-clause
7
Benchmarking PDF libraries
Created 2022-05-08
57 commits to main branch, last one 7 months ago
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
Created 2023-07-06
16 commits to master branch, last one 2 months ago
This repository provides usage examples for the Python module Newspaper3k.
Created 2020-10-11
73 commits to main branch, last one 5 months ago
14
118
apache-2.0
7
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Created 2020-04-22
195 commits to main branch, last one 2 months ago
Structured HTML table data extraction from URLs in Go, with almost no external dependencies
Created 2022-09-17
19 commits to main branch, last one 3 months ago
21
112
gpl-3.0
8
A Python utility to digitize plots.
Created 2018-07-12
131 commits to master branch, last one 6 months ago
Superpipe - optimized LLM pipelines for structured data
Created 2024-02-07
96 commits to main branch, last one 23 days ago
15
94
mit
5
High-performance trie and Aho-Corasick automaton (AC automaton) keyword match and replace tool for Python
Created 2019-02-21
38 commits to master branch, last one 10 months ago
Line segmentation algorithm for Google Vision API.
Created 2018-01-14
36 commits to master branch, last one about a year ago
Reduce HTML and XML to JSON from the command line, using an expressive query language inspired by CSS selectors.
Created 2020-07-20
43 commits to main branch, last one about a year ago
file metadata parsing, done cheap
Created 2017-12-08
418 commits to master branch, last one 8 months ago
12
54
other
5
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boo...
Created 2023-05-29
1,017 commits to master branch, last one 2 months ago
⚡️ Next-generation data transformation framework for TypeScript that puts developer experience first
Created 2022-03-23
66 commits to main branch, last one 2 years ago
15
52
agpl-3.0
27
Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery
Created 2016-06-24
818 commits to master branch, last one 2 years ago
3
51
apache-2.0
4
Domain-specific language for extracting structured data from HTML documents
Created 2016-03-03
1,562 commits to master branch, last one about a month ago
Refinery is a tool to extract and transform semi-structured data from Excel spreadsheets of different layouts in a declarative way.
Created 2021-11-01
142 commits to master branch, last one 10 months ago
GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.
Created 2023-07-27
268 commits to main branch, last one about a month ago
Extract receipt info
Created 2020-11-13
125 commits to master branch, last one about a year ago
Collection of data extracted from Minecraft.
Created 2021-06-18
8 commits to 1.20.4 branch, last one 4 months ago