Trending repositories for topic web-scraping
🔥 Open Source No Code Web Data Extraction Platform. Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes 🔥
Claim a free proxy list with United States IP addresses and use it in your projects.
Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.
The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Restock Monitor, change detection. Designed for simplicity - Simply monito...
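🚀 "Douyin_TikTok_Download_API" is a ready-to-use, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili, supporting API calls and online batch parsing and downloading.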
Scrapy, a fast high-level web crawling & scraping framework for Python.
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser TLS/JA3/HTTP/2 fingerprints.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Undetected version of the Playwright testing and automation library.
Python APIs for web automation, testing, and bypassing bot-detection.
Python & command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Undetected Python version of the Playwright testing and automation library.
AgentQL is a suite of tools for connecting your AI to the web. Featuring a query language and Playwright integrations for interacting with elements and extracting data quickly, precisely, and at scale...
Free-trial Amazon Scraper API for extracting search, product, offer listing, review, question-and-answer, best-seller, and seller data.
🕷️ An undetectable, powerful, flexible, high-performance Python library that makes web scraping as easy and effortless as it should be!
🪞 PRIMP (Python Requests IMPersonate). The fastest Python HTTP client that can impersonate web browsers.
Automated Deep Research with LLMs, web search, paper parsing, and didactic summarization.
Model Context Protocol (MCP) Server for Graphlit Platform
Computational Thinking for Social Scientists book project
Fully automated and hands-free, accurately extracting and understanding web content — powered by machine learning agents.
A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Code for extracting best-selling items, search results, and currently available deals from Amazon using Python and the Oxylabs E-Commerce Scraper API.
Fetch a user's data across social media.
Undetected Node.js version of the Playwright testing and automation library.
A Playwright-based Node.js tool that bypasses search engine anti-scraping mechanisms to execute Google searches. Local alternative to SERP APIs with MCP server integration.
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Model Context Protocol server that integrates AgentQL's data extraction capabilities.
Provides a list of fresh, working proxy servers (HTTP, HTTPS, SOCKS4 & SOCKS5) with multiple formats available for download.
A collection of patches for Puppeteer and Playwright to avoid automation detection and leaks. Helps avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
A web scraper with a simple REST API that runs in Docker and uses a headless browser and Readability.js for parsing.
Modern tests to detect automated browser behavior. Covers the most important leaks from Puppeteer and Playwright.
🔍 Model Context Protocol (MCP) tool for parsing websites using the Jina.ai Reader
PocketGroq is a powerful Python library that simplifies integration with the Groq API, offering advanced features for natural language processing, web scraping, and autonomous agent capabilities. Key ...
Save web pages as Safari webarchive files from the command line
This tutorial shows how to automate your web scraping processes using AutoScraper, one of the available Python web scraping libraries.
A comprehensive tutorial with real code samples to learn how to bypass CAPTCHA with Puppeteer.
Learn to create ChatGPT prompts that generate web scraping code with proper CSS selectors.
HTTP(S)/SOCKS5 rotating residential proxies - code examples & general information.
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
A guide for extracting titles, authors, and citations from Google Scholar using Python and the Oxylabs SERP Scraper API.
A tutorial for collecting job postings from Indeed using Python and the Oxylabs Web Scraper API.
Undetected Web-Scraping & Seamless HTML Parsing in Python!
GroqCrawl is a powerful and user-friendly web crawling and scraping application built with Streamlit and powered by PocketGroq. It provides an intuitive interface for extracting LLM-friendly AI consum...
Automated Aviator Betting Bot for Betika, Spribe & Other Aviator-style sites 🎮✈️ | Node.js + Puppeteer | Smart bankroll & Martingale strategies | Real-time analytics