Statistics for topic scraping
RepositoryStats tracks 595,856 Github repositories, of these 357 are tagged with the scraping topic. The most common primary language for repositories using this topic is Python (190). Other languages include: TypeScript (35), JavaScript (34), Go (20), HTML (12), PHP (12)
Stargazers over time for topic scraping
Most starred repositories for topic scraping (view more)
Trending repositories for topic scraping (view more)
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
🕵️♂️ Collect a dossier on a person by username from thousands of sites
Auto_Jobs_Applier_AI_Agent aims to easy job hunt process by automating the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and p...
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Modern tests to detect automated browser behavior. Cover most important leaks from Puppeteer and Playwright.
Turn Webpage to LLM friendly input text. Similar to Jina Reader and Firecrawl API. Makes image & webpage links extraction easy for web scraping.
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
This Python application is an OSINT (Open Source Intelligence) tool called "Ominis OSINT - Web Hunter." It performs online information gathering by querying Google for search results related to a user...
BotBrowser modifies Chromium's native C++ core to emulate a real browser, bypassing advanced antibot systems like Shape, Cloudflare, DataDome, Akamai, Kasada, and similar CAPTCHA defenses.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
🕵️♂️ Collect a dossier on a person by username from thousands of sites
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Auto_Jobs_Applier_AI_Agent aims to easy job hunt process by automating the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and p...
Modern tests to detect automated browser behavior. Cover most important leaks from Puppeteer and Playwright.
A script that can be used to capture various porn novels for machine learning / 一个可以用于抓取各类色情小说用于机器学习的脚本
BotBrowser modifies Chromium's native C++ core to emulate a real browser, bypassing advanced antibot systems like Shape, Cloudflare, DataDome, Akamai, Kasada, and similar CAPTCHA defenses.
AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing ...
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/s...
🕵️♂️ Collect a dossier on a person by username from thousands of sites
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Auto_Jobs_Applier_AI_Agent aims to easy job hunt process by automating the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and p...
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/s...
Turn Webpage to LLM friendly input text. Similar to Jina Reader and Firecrawl API. Makes image & webpage links extraction easy for web scraping.
BotBrowser modifies Chromium's native C++ core to emulate a real browser, bypassing advanced antibot systems like Shape, Cloudflare, DataDome, Akamai, Kasada, and similar CAPTCHA defenses.
A script that can be used to capture various porn novels for machine learning / 一个可以用于抓取各类色情小说用于机器学习的脚本
Modern tests to detect automated browser behavior. Cover most important leaks from Puppeteer and Playwright.
Auto_Jobs_Applier_AI_Agent aims to easy job hunt process by automating the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and p...
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Auto_Jobs_Applier_AI_Agent aims to easy job hunt process by automating the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and p...
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
This Python application is an OSINT (Open Source Intelligence) tool called "Ominis OSINT - Web Hunter." It performs online information gathering by querying Google for search results related to a user...