41 results found
🔥Open Source No Code Web Data Extraction Platform. Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes🔥
Created
2023-10-23
5,134 commits to develop branch, last one 2 days ago
Extract keywords from sentences or replace keywords in sentences.
Created
2017-08-15
108 commits to master branch, last one 4 years ago
🕷️ An undetectable, powerful, flexible, high-performance Python library that makes Web Scraping simple and easy again!
Created
2024-10-13
386 commits to main branch, last one 18 days ago
Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (f...
Created
2015-10-11
49 commits to master branch, last one 3 years ago
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Created
2017-07-13
6,411 commits to develop branch, last one about a year ago
Lightweight library for scraping websites with LLMs
Created
2024-08-12
125 commits to main branch, last one 16 days ago
📰 Let ChatGPT Summarize Hacker News for You
Created
2014-09-17
467 commits to master branch, last one 3 days ago
A powerful Python library for getting rich data from the Vietnam Stock Market using just a few lines of code
Created
2022-02-27
320 commits to main branch, last one 8 days ago
🚜 Parse text and tables from PDF files.
Created
2015-03-05
163 commits to master branch, last one 2 months ago
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
Created
2020-05-12
542 commits to master branch, last one about a year ago
Benchmarking PDF libraries
Created
2022-05-08
57 commits to main branch, last one about a year ago
Undetected Web-Scraping & Seamless HTML Parsing in Python!
Created
2024-08-04
49 commits to main branch, last one 2 months ago
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
Created
2023-07-06
16 commits to master branch, last one about a year ago
Wikipedia information extraction library
Created
2015-06-15
340 commits to master branch, last one about a year ago
A Python client for the Sypht API
Created
2018-08-20
212 commits to master branch, last one about a year ago
This repository provides usage examples for the Python module Newspaper3k.
Created
2020-10-11
73 commits to main branch, last one about a year ago
A Python utility to digitize plots.
Created
2018-07-12
157 commits to main branch, last one 7 months ago
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Created
2020-04-22
195 commits to main branch, last one about a year ago
Accurate, private and configurable document retrieval LLM
Created
2024-03-14
289 commits to main branch, last one 8 days ago
Structured HTML table data extraction from URLs in Go that has almost no external dependencies
Created
2022-09-17
20 commits to main branch, last one 4 months ago
A powerful Chrome extension for web scraping
Created
2024-12-06
98 commits to dev branch, last one 4 days ago
Superpipe - optimized LLM pipelines for structured data
Created
2024-02-07
99 commits to main branch, last one 9 months ago
Line segmentation algorithm for Google Vision API.
Created
2018-01-14
36 commits to master branch, last one 2 years ago
GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.
Created
2023-07-27
303 commits to main branch, last one 3 months ago
High-performance Trie and Aho-Corasick automaton (AC automaton) keyword match and replace tool for Python. Correct case-insensitive implementation!
Created
2019-02-21
38 commits to master branch, last one about a year ago
A Java client for the Sypht API
Created
2019-04-05
85 commits to master branch, last one 4 years ago
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boo...
Created
2023-05-29
1,351 commits to master branch, last one 10 days ago
Google Maps scraper with GUI
Created
2023-06-19
50 commits to main branch, last one 2 months ago
Reduce HTML and XML to JSON from the command line, using an expressive query language inspired by CSS selectors.
Created
2020-07-20
45 commits to main branch, last one 6 months ago