38 results found
- Filter by Primary Language:
- Python (17)
- Java (3)
- Ruby (3)
- TypeScript (2)
- Go (2)
- Jupyter Notebook (2)
- Kotlin (2)
- C++ (2)
- HTML (1)
- JavaScript (1)
- Cython (1)
- Scala (1)
🔥 Open-source no-code web data extraction platform. Turn websites to APIs and spreadsheets with no-code robots in minutes! [In Beta]
Created
2023-10-23
3,894 commits to develop branch, last one 12 hours ago
Extract keywords from sentences or replace keywords in sentences.
Created
2017-08-15
108 commits to master branch, last one 4 years ago
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
Created
2024-10-13
299 commits to main branch, last one 2 days ago
Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (f...
Created
2015-10-11
49 commits to master branch, last one 3 years ago
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Created
2017-07-13
6,411 commits to develop branch, last one about a year ago
Lightweight library for scraping websites with LLMs
Created
2024-08-12
110 commits to main branch, last one 12 days ago
📰 Let ChatGPT Summarize Hacker News for You
Created
2014-09-17
464 commits to master branch, last one 2 months ago
🚜 Parse text and tables from PDF files.
Created
2015-03-05
162 commits to master branch, last one 7 days ago
A powerful Python library for getting rich data from the Vietnam Stock Market using just a few lines of code
Created
2022-02-27
290 commits to main branch, last one about a month ago
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
Created
2020-05-12
542 commits to master branch, last one about a year ago
Benchmarking PDF libraries
Created
2022-05-08
57 commits to main branch, last one about a year ago
Wikipedia information extraction library
Created
2015-06-15
340 commits to master branch, last one about a year ago
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
Created
2023-07-06
16 commits to master branch, last one 9 months ago
A Python client for the Sypht API
Created
2018-08-20
212 commits to master branch, last one about a year ago
This repository provides usage examples for the Python module Newspaper3k.
Created
2020-10-11
73 commits to main branch, last one 11 months ago
Accurate, private and configurable document retrieval LLM
Created
2024-03-14
246 commits to main branch, last one 13 days ago
A Python utility to digitize plots.
Created
2018-07-12
157 commits to main branch, last one 4 months ago
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Created
2020-04-22
195 commits to main branch, last one 9 months ago
Structured HTML table data extraction from URLs in Go that has almost no external dependencies
Created
2022-09-17
20 commits to main branch, last one about a month ago
Superpipe - optimized LLM pipelines for structured data
Created
2024-02-07
99 commits to main branch, last one 6 months ago
High-performance Trie and Aho-Corasick automata (AC automata) keyword match & replace tool for Python. Correct case-insensitive implementation!
Created
2019-02-21
38 commits to master branch, last one about a year ago
Line segmentation algorithm for Google Vision API.
Created
2018-01-14
36 commits to master branch, last one 2 years ago
GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.
Created
2023-07-27
303 commits to main branch, last one 8 days ago
A Java client for the Sypht API
Created
2019-04-05
85 commits to master branch, last one 4 years ago
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boo...
Created
2023-05-29
1,231 commits to master branch, last one 16 days ago
Reduce HTML and XML to JSON from the command line, using an expressive query language inspired by CSS selectors.
Created
2020-07-20
45 commits to main branch, last one 2 months ago
Google Maps scraper with GUI
Created
2023-06-19
38 commits to main branch, last one 3 months ago
file metadata parsing, done cheap
Created
2017-12-08
424 commits to master branch, last one 10 days ago
Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery
Created
2016-06-24
818 commits to master branch, last one 3 years ago
⚡️ Next-generation data transformation framework for TypeScript that puts developer experience first
This repository has been archived
Created
2022-03-23
66 commits to main branch, last one 2 years ago