34 results found
- Filter by Primary Language:
- Python (16)
- Java (3)
- Ruby (3)
- Kotlin (2)
- C++ (2)
- Go (2)
- TypeScript (1)
- Cython (1)
- HTML (1)
- JavaScript (1)
- Jupyter Notebook (1)
Extract keywords from sentences or replace keywords in sentences.
Created
2017-08-15
108 commits to master branch, last one 4 years ago
Converts a PDF file into a text file while keeping the layout of the original PDF. Useful for extracting the content of a table in a PDF file, for instance. This is a subclass of PDFTextStripper class (f...
Created
2015-10-11
49 commits to master branch, last one 3 years ago
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Created
2017-07-13
6,411 commits to develop branch, last one about a year ago
Lightweight library for scraping websites with LLMs
Created
2024-08-12
96 commits to main branch, last one 14 hours ago
📰 Let ChatGPT Summarize Hacker News for You
Created
2014-09-17
464 commits to master branch, last one 29 days ago
🚜 Parse text and tables from PDF files.
Created
2015-03-05
161 commits to master branch, last one 3 days ago
A powerful Python library for getting rich data from the Vietnam Stock Market using just a few lines of code
Created
2022-02-27
289 commits to main branch, last one 5 days ago
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
Created
2020-05-12
542 commits to master branch, last one about a year ago
Benchmarking PDF libraries
Created
2022-05-08
57 commits to main branch, last one about a year ago
Wikipedia information extraction library
Created
2015-06-15
340 commits to master branch, last one about a year ago
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
Created
2023-07-06
16 commits to master branch, last one 7 months ago
A Python client for the Sypht API
Created
2018-08-20
212 commits to master branch, last one about a year ago
This repository provides usage examples for the Python module Newspaper3k.
Created
2020-10-11
73 commits to main branch, last one 10 months ago
Accurate, private and configurable document retrieval LLM
Created
2024-03-14
204 commits to main branch, last one 3 days ago
A Python utility to digitize plots.
Created
2018-07-12
157 commits to main branch, last one 2 months ago
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Created
2020-04-22
195 commits to main branch, last one 7 months ago
Structured HTML table data extraction from URLs in Go that has almost no external dependencies
Created
2022-09-17
19 commits to main branch, last one 8 months ago
Superpipe - optimized LLM pipelines for structured data
Created
2024-02-07
99 commits to main branch, last one 4 months ago
Line segmentation algorithm for Google Vision API.
Created
2018-01-14
36 commits to master branch, last one 2 years ago
High-performance trie and Aho-Corasick automata (AC automata) keyword match & replace tool for Python. Correct case-insensitive implementation!
Created
2019-02-21
38 commits to master branch, last one about a year ago
A Java client for the Sypht API
Created
2019-04-05
85 commits to master branch, last one 4 years ago
GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.
Created
2023-07-27
296 commits to main branch, last one about a month ago
Reduce HTML and XML to JSON from the command line, using an expressive query language inspired by CSS selectors.
Created
2020-07-20
45 commits to main branch, last one about a month ago
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boo...
Created
2023-05-29
1,188 commits to master branch, last one 22 days ago
file metadata parsing, done cheap
Created
2017-12-08
423 commits to master branch, last one about a month ago
Google Maps scraper with GUI
Created
2023-06-19
38 commits to main branch, last one 2 months ago
⚡️ Next-generation data transformation framework for TypeScript that puts developer experience first
This repository has been archived
Created
2022-03-23
66 commits to main branch, last one 2 years ago
Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery
Created
2016-06-24
818 commits to master branch, last one 2 years ago
Domain-specific language for extracting structured data from HTML documents
Created
2016-03-03
1,598 commits to master branch, last one 3 days ago
Refinery is a tool to extract and transform semi-structured data from Excel spreadsheets of different layouts in a declarative way.
Created
2021-11-01
142 commits to master branch, last one about a year ago