31 results found
Extract keywords from sentences or replace keywords in sentences.
Created
2017-08-15
108 commits to master branch, last one 4 years ago
Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (f...
Created
2015-10-11
49 commits to master branch, last one 2 years ago
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Created
2017-07-13
6,411 commits to develop branch, last one about a year ago
📰 Let ChatGPT Summarize Hacker News for You
Created
2014-09-17
459 commits to master branch, last one 7 days ago
🚜 Parse text and tables from PDF files.
Created
2015-03-05
153 commits to master branch, last one 10 months ago
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
Created
2020-05-12
542 commits to master branch, last one about a year ago
A powerful Python library for getting rich data from the Vietnam Stock Market using just a few lines of code
Created
2022-02-27
228 commits to main branch, last one 9 days ago
Wikipedia information extraction library
Created
2015-06-15
340 commits to master branch, last one about a year ago
A Python client for the Sypht API
Created
2018-08-20
212 commits to master branch, last one 8 months ago
Benchmarking PDF libraries
Created
2022-05-08
57 commits to main branch, last one 7 months ago
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
Created
2023-07-06
16 commits to master branch, last one 2 months ago
This repository provides usage examples for the Python module Newspaper3k.
Created
2020-10-11
73 commits to main branch, last one 5 months ago
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Created
2020-04-22
195 commits to main branch, last one 2 months ago
Structured HTML table data extraction from URLs in Go that has almost no external dependencies
Created
2022-09-17
19 commits to main branch, last one 3 months ago
A Python utility to digitize plots.
Created
2018-07-12
131 commits to master branch, last one 6 months ago
Superpipe - optimized LLM pipelines for structured data
Created
2024-02-07
96 commits to main branch, last one 23 days ago
High-performance Trie and Aho-Corasick automata (AC automata) Keyword Match & Replace Tool for Python
Created
2019-02-21
38 commits to master branch, last one 10 months ago
Line segmentation algorithm for Google Vision API.
Created
2018-01-14
36 commits to master branch, last one about a year ago
A Java client for the Sypht API
Created
2019-04-05
85 commits to master branch, last one 3 years ago
Reduce HTML and XML to JSON from the command line, using an expressive query language inspired by CSS selectors.
Created
2020-07-20
43 commits to main branch, last one about a year ago
file metadata parsing, done cheap
Created
2017-12-08
418 commits to master branch, last one 8 months ago
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boo...
Created
2023-05-29
1,017 commits to master branch, last one 2 months ago
⚡️ Next-generation data transformation framework for TypeScript that puts developer experience first
Created
2022-03-23
66 commits to main branch, last one 2 years ago
Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery
Created
2016-06-24
818 commits to master branch, last one 2 years ago
Domain-specific language for extracting structured data from HTML documents
Created
2016-03-03
1,562 commits to master branch, last one about a month ago
Refinery is a tool to extract and transform semi-structured data from Excel spreadsheets of different layouts in a declarative way.
Created
2021-11-01
142 commits to master branch, last one 10 months ago
GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.
Created
2023-07-27
268 commits to main branch, last one about a month ago
Extract receipt info
Created
2020-11-13
125 commits to master branch, last one about a year ago
Collection of data extracted from Minecraft.
Created
2021-06-18
8 commits to 1.20.4 branch, last one 4 months ago
Google Maps scraper with GUI
Created
2023-06-19
30 commits to main branch, last one about a month ago