Trending repositories for topic scraping
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
🕵️♂️ Collect a dossier on a person by username from thousands of sites
Auto_Jobs_Applier_AI_Agent aims to easy job hunt process by automating the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and p...
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Scrapy, a fast high-level web crawling & scraping framework for Python.
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)
This Python application is an OSINT (Open Source Intelligence) tool called "Ominis OSINT - Web Hunter." It performs online information gathering by querying Google for search results related to a user...
AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing ...
Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
OSINT cheat sheet, list OSINT tools, wiki, dataset, article, book , red team OSINT and OSINT tips
Tabula is a tool for liberating data tables trapped inside PDF files
A command-line utility for taking automated screenshots of websites
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Turn Webpage to LLM friendly input text. Similar to Jina Reader and Firecrawl API. Makes image & webpage links extraction easy for web scraping.
This Python application is an OSINT (Open Source Intelligence) tool called "Ominis OSINT - Web Hunter." It performs online information gathering by querying Google for search results related to a user...
AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing ...
A custom Firefox Selenium-based Webdriver. Passes all bot mitigation systems
Modern tests to detect automated browser behavior. Cover most important leaks from Puppeteer and Playwright.
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/s...
BotBrowser modifies Chromium's native C++ core to emulate a real browser, bypassing advanced antibot systems like Shape, Cloudflare, PerimeterX, Akamai, Kasada, and various reCAPTCHA-like defenses.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
List of anti-detect and humanizing tools and browsers, including captcha solvers and sms-activation.
OSINT cheat sheet, list OSINT tools, wiki, dataset, article, book , red team OSINT and OSINT tips
🕵️♂️ Collect a dossier on a person by username from thousands of sites
This script allows you to automate the creation of Gmail accounts using the Selenium automation framework with the Chrome WebDriver. It navigates through the Gmail sign-up process by filling in the re...
📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
自学入门 Python 优质中文资源索引,包含 书籍 / 文档 / 视频,适用于 爬虫 / Web / 数据分析 / 机器学习 方向
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
🕵️♂️ Collect a dossier on a person by username from thousands of sites
Auto_Jobs_Applier_AI_Agent aims to easy job hunt process by automating the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and p...
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
Scrapy, a fast high-level web crawling & scraping framework for Python.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)
Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing ...
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
This Python application is an OSINT (Open Source Intelligence) tool called "Ominis OSINT - Web Hunter." It performs online information gathering by querying Google for search results related to a user...
A script that can be used to capture various porn novels for machine learning / 一个可以用于抓取各类色情小说用于机器学习的脚本
Modern tests to detect automated browser behavior. Cover most important leaks from Puppeteer and Playwright.
Turn Webpage to LLM friendly input text. Similar to Jina Reader and Firecrawl API. Makes image & webpage links extraction easy for web scraping.
BotBrowser modifies Chromium's native C++ core to emulate a real browser, bypassing advanced antibot systems like Shape, Cloudflare, PerimeterX, Akamai, Kasada, and various reCAPTCHA-like defenses.
A simple Node.js code to get public information and media from every Instagram post or reel URL without API. Working 2024
AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing ...
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/s...
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
The Kemono and Coomer Downloader simplifies downloading posts from Kemono and Coomer websites, allowing users to download individual or multiple posts, including entire profiles. It offers advanced fe...
This Python application is an OSINT (Open Source Intelligence) tool called "Ominis OSINT - Web Hunter." It performs online information gathering by querying Google for search results related to a user...
A custom Firefox Selenium-based Webdriver. Passes all bot mitigation systems
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
🕵️♂️ Collect a dossier on a person by username from thousands of sites
A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze, and interact with the web in all its dimensions.
A drop-in replacement for playwright-python patched with rebrowser-patches. It allows to pass modern automation detection tests.
🕵️♂️ Collect a dossier on a person by username from thousands of sites
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Auto_Jobs_Applier_AI_Agent aims to easy job hunt process by automating the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and p...
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Scrapy, a fast high-level web crawling & scraping framework for Python.
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)
Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot
AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing ...
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
List of libraries, tools and APIs for web scraping and data processing.
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
Swiss-army tool for scraping and extracting data from online assets, made for hackers
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/s...
Turn Webpage to LLM friendly input text. Similar to Jina Reader and Firecrawl API. Makes image & webpage links extraction easy for web scraping.
A script that can be used to capture various porn novels for machine learning / 一个可以用于抓取各类色情小说用于机器学习的脚本
BotBrowser modifies Chromium's native C++ core to emulate a real browser, bypassing advanced antibot systems like Shape, Cloudflare, PerimeterX, Akamai, Kasada, and various reCAPTCHA-like defenses.
Modern tests to detect automated browser behavior. Cover most important leaks from Puppeteer and Playwright.
AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing ...
A custom Firefox Selenium-based Webdriver. Passes all bot mitigation systems
ScraperAI is an open-source, AI-powered tool designed to simplify web scraping for users of all skill levels.
The Kemono and Coomer Downloader simplifies downloading posts from Kemono and Coomer websites, allowing users to download individual or multiple posts, including entire profiles. It offers advanced fe...
🕵️♂️ Collect a dossier on a person by username from thousands of sites
Enhanced, ads-free and fast responsive interface to browse guitar tabs scraped from Ultimate Guitar.
A drop-in replacement for playwright-python patched with rebrowser-patches. It allows to pass modern automation detection tests.
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
A simple Node.js code to get public information and media from every Instagram post or reel URL without API. Working 2024
Auto_Jobs_Applier_AI_Agent aims to easy job hunt process by automating the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and p...
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Swiss-army tool for scraping and extracting data from online assets, made for hackers
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing ...
List of anti-detect and humanizing tools and browsers, including captcha solvers and sms-activation.
Python library for automated email account creation. Create multiple accounts easily with support for major email providers.
Tools to build web AI agents that can authenticate, interact with and extract data from any website.
Auto_Jobs_Applier_AI_Agent aims to easy job hunt process by automating the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and p...
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
🕵️♂️ Collect a dossier on a person by username from thousands of sites
Scrapy, a fast high-level web crawling & scraping framework for Python.
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)
Swiss-army tool for scraping and extracting data from online assets, made for hackers
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
HTTP(S)/SOCKS5 rotating residential proxies - code examples & general information.
List of libraries, tools and APIs for web scraping and data processing.
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
This Python application is an OSINT (Open Source Intelligence) tool called "Ominis OSINT - Web Hunter." It performs online information gathering by querying Google for search results related to a user...
Python quick start guides to get the most out of Oxylabs' Web Scraper API free trial.
📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.
Website scraper for getting invoices automagically as pdf (useful for taxes or DMS)
Hear local historical markers as you travel on your road-trip. 100% Shared Compose UI, Kotlin native cross-platform codebase. Includes Cocoapods, Google Maps, GPS Location, notifications, background l...
⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/s...
Site que agrega filmes em cartaz em algumas das diversas salas de cinema de Porto Alegre.
In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex.
A simple Node.js code to get public information and media from every Instagram post or reel URL without API. Working 2024
A better EcoleDirecte (unaffiliated): more pleasant, functional, and improved experience.
The Professional Telegram Bulk Members Adder & Scraper. More on doublegram.com
Enhanced, ads-free and fast responsive interface to browse guitar tabs scraped from Ultimate Guitar.