41 results found

778
9.7k
agpl-3.0
68
🔥Open Source No Code Web Data Extraction Platform. Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes🔥
Created 2023-10-23
5,134 commits to develop branch, last one 2 days ago
603
5.6k
mit
141
Extract keywords from a sentence or replace keywords in sentences.
Created 2017-08-15
108 commits to master branch, last one 4 years ago
186
2.8k
bsd-3-clause
25
🕷️ An undetectable, powerful, flexible, high-performance Python library that makes Web Scraping simple and easy again!
Created 2024-10-13
386 commits to main branch, last one 18 days ago
Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (f...
Created 2015-10-11
49 commits to master branch, last one 3 years ago
234
1.5k
apache-2.0
36
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Created 2017-07-13
6,411 commits to develop branch, last one about a year ago
63
1.1k
gpl-2.0
16
Lightweight library for scraping websites with LLMs
Created 2024-08-12
125 commits to main branch, last one 16 days ago
📰 Let ChatGPT Summarize Hacker News for You
Created 2014-09-17
467 commits to master branch, last one 3 days ago
168
699
other
48
A powerful Python library for getting rich data from the Vietnam Stock Market using just a few lines of code
Created 2022-02-27
320 commits to main branch, last one 8 days ago
🚜 Parse text and tables from PDF files.
Created 2015-03-05
163 commits to master branch, last one 2 months ago
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
Created 2020-05-12
542 commits to master branch, last one about a year ago
15
269
bsd-3-clause
5
Benchmarking PDF libraries
Created 2022-05-08
57 commits to main branch, last one about a year ago
Undetected Web-Scraping & Seamless HTML Parsing in Python!
Created 2024-08-04
49 commits to main branch, last one 2 months ago
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
Created 2023-07-06
16 commits to master branch, last one about a year ago
Wikipedia information extraction library
Created 2015-06-15
340 commits to master branch, last one about a year ago
This repository provides usage examples for the Python module Newspaper3k.
Created 2020-10-11
73 commits to main branch, last one about a year ago
23
138
gpl-3.0
9
A Python utility to digitize plots.
Created 2018-07-12
157 commits to main branch, last one 7 months ago
16
123
apache-2.0
5
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Created 2020-04-22
195 commits to main branch, last one about a year ago
11
121
unknown
3
Accurate, private and configurable document retrieval LLM
Created 2024-03-14
289 commits to main branch, last one 8 days ago
Structured HTML table data extraction from URLs in Go that has almost no external dependencies
Created 2022-09-17
20 commits to main branch, last one 4 months ago
A powerful Chrome extension for web scraping
Created 2024-12-06
98 commits to dev branch, last one 4 days ago
Superpipe - optimized LLM pipelines for structured data
Created 2024-02-07
99 commits to main branch, last one 9 months ago
Line segmentation algorithm for Google Vision API.
Created 2018-01-14
36 commits to master branch, last one 2 years ago
GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.
Created 2023-07-27
303 commits to main branch, last one 3 months ago
High performance Trie and Aho-Corasick automata (AC automata) Keyword Match & Replace Tool for Python. Correct case-insensitive implementation!
Created 2019-02-21
38 commits to master branch, last one about a year ago
18
80
other
6
DocWire SDK: Award-winning modern data processing in C++20. SourceForge Community Choice & Microsoft support. AI-driven processing. Supports nearly 100 data formats, including email boxes and OCR. Boo...
Created 2023-05-29
1,351 commits to master branch, last one 10 days ago
Reduce HTML and XML to JSON from the command line, using an expressive query language inspired by CSS selectors.
Created 2020-07-20
45 commits to main branch, last one 6 months ago
file metadata parsing, done cheap
Created 2017-12-08
424 commits to master branch, last one 3 months ago