Trending repositories for topic scraping
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Auto_Jobs_Applier_AI_Agent by AIHawk is an AI Agent that automates the jobs application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and per...
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Scrapy, a fast high-level web crawling & scraping framework for Python.
Bypasses pay-walls and scrapes all the paid content on a creator's page.
AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing ...
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Twitter API Scraper | Without an API key | Twitter Internal API | Free | Twitter scraper | Twitter Bot
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Swiss-army tool for scraping and extracting data from online assets, made for hackers
List of libraries, tools and APIs for web scraping and data processing.
Tools to build web AI agents that can authenticate, interact with and extract data from any website.
Simple Node.js code to get public information and media from any Instagram post or reel URL without the API. Working 2024
🧰 A collection of automation tools for Instagram 📱| Written in Python 🐍 | Don't forget to ⭐ the repo !
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
Web data extraction tool implemented as chrome extension
This Python application is an OSINT (Open Source Intelligence) tool called "Ominis OSINT - Web Hunter." It performs online information gathering by querying Google for search results related to a user...
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
ScraperAI is an open-source, AI-powered tool designed to simplify web scraping for users of all skill levels.
Automate your LinkedIn job applications with AI! This bot utilizes GPT models such as GPT-4, GPT-3.5, and Google's Gemini Pro for Easy Apply form filling, customizable to your preferences and job sear...
Instagram Scraper. Scrape Instagram followers, following list, and post authors. Download CSV files with Instagram users from followers, following, tag and location pages.
A better EcoleDirecte (unaffiliated): more pleasant, functional, and improved experience.
Python library for automated email account creation. Create multiple accounts easily with support for major email providers.
This script allows you to automate the creation of Gmail accounts using the Selenium automation framework with the Chrome WebDriver. It navigates through the Gmail sign-up process by filling in the re...
A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze, and interact with the web in all its dimensions.
A drop-in replacement for playwright-python patched with rebrowser-patches. It allows scrapers to pass modern automation detection tests.
Modern tests to detect automated browser behavior. Cover most important leaks from Puppeteer and Playwright.
The Kemono and Coomer Downloader simplifies downloading posts from Kemono and Coomer websites, allowing users to download individual or multiple posts, including entire profiles. It offers advanced fe...
📜 Framework-agnostic API scraper to load items from any paginated JSON API into a Laravel lazy collection via async HTTP requests.
List of anti-detect and humanizing tools and browsers, including captcha solvers and sms-activation.
HTTP(S)/SOCKS5 rotating residential proxies - code examples & general information.
Python quick start guides to get the most out of Oxylabs' Web Scraper API free trial.
A command-line utility for taking automated screenshots of websites
📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.
Website scraper for getting invoices automagically as pdf (useful for taxes or DMS)
In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex ones.
Site that aggregates films currently showing in some of the many movie theaters in Porto Alegre.
Enhanced, ads-free and fast responsive interface to browse guitar tabs scraped from Ultimate Guitar.
Professional scrapers that provide full control to the users. Crawlee One builds on top of Crawlee and Apify and extends them with features for robust and highly configurable web scrapers.