Statistics for topic web-scraping
RepositoryStats tracks 579,129 Github repositories, of these 216 are tagged with the web-scraping topic. The most common primary language for repositories using this topic is Python (100). Other languages include: JavaScript (20), Jupyter Notebook (17), Go (11)
Stargazers over time for topic web-scraping
Most starred repositories for topic web-scraping (view more)
Trending repositories for topic web-scraping (view more)
Smart Scraper: An AI-powered web scraping framework that uses headless browsers, asynchronous programming, and adaptive parsing to extract data efficiently from diverse websites. Includes a user-frien...
Free, open-source no-code web data extraction platform. Build custom robots to automate data scraping [In Beta]
The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monito...
Python binding for curl-impersonate fork via cffi. A http client that can impersonate browser tls/ja3/http2 fingerprints.
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Free, open-source no-code web data extraction platform. Build custom robots to automate data scraping [In Beta]
AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing ...
Smart Scraper: An AI-powered web scraping framework that uses headless browsers, asynchronous programming, and adaptive parsing to extract data efficiently from diverse websites. Includes a user-frien...
Smart Scraper: An AI-powered web scraping framework that uses headless browsers, asynchronous programming, and adaptive parsing to extract data efficiently from diverse websites. Includes a user-frien...
Free, open-source no-code web data extraction platform. Build custom robots to automate data scraping [In Beta]
The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monito...
Python binding for curl-impersonate fork via cffi. A http client that can impersonate browser tls/ja3/http2 fingerprints.
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Free, open-source no-code web data extraction platform. Build custom robots to automate data scraping [In Beta]
A drop-in replacement for playwright-python patched with rebrowser-patches. It allows to pass modern automation detection tests.
AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing ...
Open source implementation of Sova - RAG-based Web search engine using power of LLMs. Using Langchain, Ollama, HuggingFace Embeddings and scraping google search results.
Modern tests to detect automated browser behavior. Cover most important leaks from Puppeteer and Playwright.
GroqCrawl is a powerful and user-friendly web crawling and scraping application built with Streamlit and powered by PocketGroq. It provides an intuitive interface for extracting LLM friendly AI consum...
Smart Scraper: An AI-powered web scraping framework that uses headless browsers, asynchronous programming, and adaptive parsing to extract data efficiently from diverse websites. Includes a user-frien...
Free, open-source no-code web data extraction platform. Build custom robots to automate data scraping [In Beta]
The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monito...
Scrapy, a fast high-level web crawling & scraping framework for Python.
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Free, open-source no-code web data extraction platform. Build custom robots to automate data scraping [In Beta]
GroqCrawl is a powerful and user-friendly web crawling and scraping application built with Streamlit and powered by PocketGroq. It provides an intuitive interface for extracting LLM friendly AI consum...
A drop-in replacement for playwright-python patched with rebrowser-patches. It allows to pass modern automation detection tests.
AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing ...
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
A guide for extracting titles, authors, and citations from Google Scholar using Python and Oxylabs SERP Scraper API.
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
AI agent that can SEE 👁️, control, navigate, & do stuff for you on your browser.
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monito...
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Scrapy, a fast high-level web crawling & scraping framework for Python.
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
AI agent that can SEE 👁️, control, navigate, & do stuff for you on your browser.
PocketGroq is a powerful Python library that simplifies integration with the Groq API, offering advanced features for natural language processing, web scraping, and autonomous agent capabilities. Key ...
Python quick start guides to get the most out of Oxylabs' Web Scraper API free trial.
A huge collection of awesome beginner-friendly Python projects starting from very basics to advance. Prefect repository for learning python and enhancing your python programming skills.