Trending repositories for topic scraping

Last 3 days (new repositories)

no newly created repositories trending in the last 3 days

Last 3 days (absolute gain)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357 (+256)

agpl-3.0

soxoj/maigret

🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

13,686 (+122)

mit

AIHawk-FOSS/Auto_Jobs_Applier_AI_Agent

Auto_Jobs_Applier_AI_Agent aims to ease the job hunt by automating the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in an automated and p...

23,744 (+51)

agpl-3.0

gregpr07/browser-use

Make websites accessible for AI agents

3,201 (+50)

mit

apify/crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...

16,247 (+45)

apache-2.0

ScrapeGraphAI/Scrapegraph-ai

Python scraper based on AI

16,416 (+40)

mit

apify/crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

4,877 (+28)

apache-2.0

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

53,528 (+25)

bsd-3-clause

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+21)

bsd-3-clause

gocolly/colly

Elegant Scraper and Crawler Framework for Golang

23,469 (+16)

apache-2.0

ultrafunkamsterdam/undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)

10,262 (+15)

gpl-3.0

AnonCatalyst/Ominis-OSINT

This Python application is an OSINT (Open Source Intelligence) tool called "Ominis OSINT - Web Hunter." It performs online information gathering by querying Google for search results related to a user...

336 (+13)

mit

tinyfish-io/agentql

AgentQL is an AI-powered query language for web scraping and automation. It uses natural language selectors to find data on any page, including authenticated content. AgentQL queries are self-healing ...

357 (+13)

d60/twikit

1,666 (+11)

mit

rebrowser/rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...

417 (+10)

Jieyab89/OSINT-Cheat-sheet

OSINT cheat sheet, list OSINT tools, wiki, dataset, article, book, red team OSINT and OSINT tips

781 (+8)

daijro/camoufox

🦊 Anti-detect browser

834 (+8)

mpl-2.0

tabulapdf/tabula

Tabula is a tool for liberating data tables trapped inside PDF files

6,844 (+7)

mit

simonw/shot-scraper

A command-line utility for taking automated screenshots of websites

1,748 (+6)

apache-2.0

adbar/trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

3,747 (+5)

apache-2.0

Last 3 days (relative gain)

m92vyas/llm-reader

Turn a webpage into LLM-friendly input text. Similar to Jina Reader and Firecrawl API. Makes extracting image and webpage links easy for web scraping.

26 (+8%)

mit

AnonCatalyst/Ominis-OSINT

336 (+4%)

mit

tinyfish-io/agentql

357 (+4%)

bytexenon/undetected_geckodriver

A custom Firefox Selenium-based Webdriver. Passes all bot mitigation systems

29 (+4%)

mit

rebrowser/rebrowser-bot-detector

Modern tests to detect automated browser behavior. Covers the most important leaks from Puppeteer and Playwright.

41 (+3%)

rebrowser/rebrowser-patches

417 (+2%)

ArchiveBox/abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/s...

49 (+2%)

mit

drudge/n8n-nodes-puppeteer

n8n node for browser automation using Puppeteer

103 (+2%)

mit

gregpr07/browser-use

Make websites accessible for AI agents

3,201 (+2%)

mit

MiddleSchoolStudent/BotBrowser

BotBrowser modifies Chromium's native C++ core to emulate a real browser, bypassing advanced antibot systems like Shape, Cloudflare, PerimeterX, Akamai, Kasada, and various reCAPTCHA-like defenses.

69 (+1%)

mit

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357 (+1%)

agpl-3.0

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+1%)

bsd-3-clause

TheGP/untidetect-tools

List of anti-detect and humanizing tools and browsers, including captcha solvers and sms-activation.

352 (+1%)

Jieyab89/OSINT-Cheat-sheet

OSINT cheat sheet, list OSINT tools, wiki, dataset, article, book, red team OSINT and OSINT tips

781 (+1%)

daijro/browserforge

🎭 Intelligent browser header & fingerprint generator

299 (+1%)

apache-2.0

daijro/camoufox

🦊 Anti-detect browser

834 (+1.0%)

mpl-2.0

soxoj/maigret

🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

13,686 (+0.9%)

mit

khaouitiabdelhakim/Gmail-Creation-Automation-Python

This script allows you to automate the creation of Gmail accounts using the Selenium automation framework with the Chrome WebDriver. It navigates through the Gmail sign-up process by filling in the re...

129 (+0.8%)

AndyTheFactory/newspaper4k

📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.

529 (+0.8%)

mit

zkqiang/awesome-python-primer

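An index of high-quality Chinese-language resources for self-taught Python beginners, including books, documentation, and videos, covering web scraping, Web development, data analysis, and machine learning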

147 (+0.7%)

mit
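
The relative-gain percentages in these listings look consistent with dividing each absolute gain by the star count at the start of the period (current stars minus the gain). That formula is an inference from the published numbers, not something the page states, and `relative_gain` is a hypothetical helper name. A minimal Python sketch under that assumption:

```python
def relative_gain(current_stars: int, absolute_gain: int) -> float:
    """Percent growth relative to the star count at the start of the period.

    Assumption: baseline = current stars minus the absolute gain.
    """
    baseline = current_stars - absolute_gain
    if baseline <= 0:
        raise ValueError("baseline star count must be positive")
    return 100.0 * absolute_gain / baseline


# Star counts and 3-day absolute gains copied from the listings.
samples = [
    ("mendableai/firecrawl", 20357, 256),
    ("soxoj/maigret", 13686, 122),
    ("daijro/camoufox", 834, 8),
]

for name, stars, gain in samples:
    print(f"{name}: {relative_gain(stars, gain):.2f}%")
```

Under this reading, a small repository can rank high on relative gain with only a handful of new stars, which is why the two rankings differ so much.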

Last week (new repositories)

no newly created repositories trending in the last week

Last week (absolute gain)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357 (+683)

agpl-3.0

soxoj/maigret

🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

13,686 (+428)

mit

gregpr07/browser-use

Make websites accessible for AI agents

3,201 (+134)

mit

AIHawk-FOSS/Auto_Jobs_Applier_AI_Agent

23,744 (+130)

agpl-3.0

apify/crawlee

16,247 (+120)

apache-2.0

ScrapeGraphAI/Scrapegraph-ai

Python scraper based on AI

16,416 (+97)

mit

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+59)

bsd-3-clause

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

53,528 (+52)

bsd-3-clause

apify/crawlee-python

4,877 (+50)

apache-2.0

ultrafunkamsterdam/undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)

10,262 (+49)

gpl-3.0

d60/twikit

1,666 (+37)

mit

daijro/camoufox

🦊 Anti-detect browser

834 (+26)

mpl-2.0

gocolly/colly

Elegant Scraper and Crawler Framework for Golang

23,469 (+26)

apache-2.0

rebrowser/rebrowser-patches

417 (+25)

tinyfish-io/agentql

357 (+23)

adbar/trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

3,747 (+18)

apache-2.0

apify/fingerprint-suite

Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

1,090 (+17)

apache-2.0

daijro/browserforge

🎭 Intelligent browser header & fingerprint generator

299 (+15)

apache-2.0

alirezamika/autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

6,540 (+14)

mit

AnonCatalyst/Ominis-OSINT

336 (+13)

mit

Last week (relative gain)

ystemsrx/Porn-Novel-Scraper

A script that can be used to capture various porn novels for machine learning

33 (+18%)

mit

rebrowser/rebrowser-bot-detector

Modern tests to detect automated browser behavior. Covers the most important leaks from Puppeteer and Playwright.

41 (+14%)

m92vyas/llm-reader

Turn a webpage into LLM-friendly input text. Similar to Jina Reader and Firecrawl API. Makes extracting image and webpage links easy for web scraping.

26 (+8%)

mit

MiddleSchoolStudent/BotBrowser

BotBrowser modifies Chromium's native C++ core to emulate a real browser, bypassing advanced antibot systems like Shape, Cloudflare, PerimeterX, Akamai, Kasada, and various reCAPTCHA-like defenses.

69 (+8%)

mit

ahmedrangel/instagram-media-scraper

Simple Node.js code to get public information and media from any Instagram post or reel URL without the API. Working 2024

44 (+7%)

mit

tinyfish-io/agentql

357 (+7%)

ArchiveBox/abx-dl

49 (+7%)

mit

rebrowser/rebrowser-patches

417 (+6%)

e43b/Kemono-and-Coomer-Downloader

The Kemono and Coomer Downloader simplifies downloading posts from Kemono and Coomer websites, allowing users to download individual or multiple posts, including entire profiles. It offers advanced fe...

69 (+6%)

daijro/browserforge

🎭 Intelligent browser header & fingerprint generator

299 (+5%)

apache-2.0

gregpr07/browser-use

Make websites accessible for AI agents

3,201 (+4%)

mit

AnonCatalyst/Ominis-OSINT

336 (+4%)

mit

bytexenon/undetected_geckodriver

A custom Firefox Selenium-based Webdriver. Passes all bot mitigation systems

29 (+4%)

mit

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+4%)

bsd-3-clause

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357 (+3%)

agpl-3.0

soxoj/maigret

🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

13,686 (+3%)

mit

daijro/camoufox

🦊 Anti-detect browser

834 (+3%)

mpl-2.0

drudge/n8n-nodes-puppeteer

n8n node for browser automation using Puppeteer

103 (+3%)

mit

pzaino/thecrowler

A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze, and interact with the web in all its dimensions.

41 (+3%)

apache-2.0

rebrowser/rebrowser-playwright-python

A drop-in replacement for playwright-python patched with rebrowser-patches. It lets you pass modern automation detection tests.

42 (+2%)

apache-2.0

Last month (new repositories)

no newly created repositories trending in the last month

Last month (absolute gain)

soxoj/maigret

🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

13,686 (+3,413)

mit

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357 (+1,454)

agpl-3.0

AIHawk-FOSS/Auto_Jobs_Applier_AI_Agent

23,744 (+1,337)

agpl-3.0

gregpr07/browser-use

Make websites accessible for AI agents

3,201 (+937)

mit

ScrapeGraphAI/Scrapegraph-ai

Python scraper based on AI

16,416 (+562)

mit

apify/crawlee

16,247 (+527)

apache-2.0

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

53,528 (+303)

bsd-3-clause

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+258)

bsd-3-clause

apify/crawlee-python

4,877 (+242)

apache-2.0

ultrafunkamsterdam/undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)

10,262 (+208)

gpl-3.0

d60/twikit

1,666 (+189)

mit

daijro/camoufox

🦊 Anti-detect browser

834 (+146)

mpl-2.0

gocolly/colly

Elegant Scraper and Crawler Framework for Golang

23,469 (+124)

apache-2.0

tinyfish-io/agentql

357 (+110)

Benexl/FastAnime

Your browser anime experience from the terminal

386 (+103)

unlicense

apify/fingerprint-suite

Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.

1,090 (+98)

apache-2.0

adbar/trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

3,747 (+90)

apache-2.0

lorien/awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

6,780 (+75)

rebrowser/rebrowser-patches

417 (+73)

bjesus/pipet

Swiss-army tool for scraping and extracting data from online assets, made for hackers

1,926 (+65)

mit

Last month (relative gain)

ArchiveBox/abx-dl

49 (+104%)

mit

m92vyas/llm-reader

Turn a webpage into LLM-friendly input text. Similar to Jina Reader and Firecrawl API. Makes extracting image and webpage links easy for web scraping.

26 (+100%)

mit

ystemsrx/Porn-Novel-Scraper

A script that can be used to capture various porn novels for machine learning

33 (+65%)

mit

MiddleSchoolStudent/BotBrowser

BotBrowser modifies Chromium's native C++ core to emulate a real browser, bypassing advanced antibot systems like Shape, Cloudflare, PerimeterX, Akamai, Kasada, and various reCAPTCHA-like defenses.

69 (+60%)

mit

rebrowser/rebrowser-bot-detector

Modern tests to detect automated browser behavior. Covers the most important leaks from Puppeteer and Playwright.

41 (+52%)

tinyfish-io/agentql

357 (+45%)

gregpr07/browser-use

Make websites accessible for AI agents

3,201 (+41%)

mit

bytexenon/undetected_geckodriver

A custom Firefox Selenium-based Webdriver. Passes all bot mitigation systems

29 (+38%)

mit

Benexl/FastAnime

Your browser anime experience from the terminal

386 (+36%)

unlicense

scraperai/scraperai

ScraperAI is an open-source, AI-powered tool designed to simplify web scraping for users of all skill levels.

53 (+36%)

gpl-3.0

e43b/Kemono-and-Coomer-Downloader

69 (+35%)

soxoj/maigret

🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

13,686 (+33%)

mit

BenoitBellegarde/UltimateTab

Enhanced, ads-free and fast responsive interface to browse guitar tabs scraped from Ultimate Guitar.

42 (+24%)

mit

rebrowser/rebrowser-playwright-python

A drop-in replacement for playwright-python patched with rebrowser-patches. It lets you pass modern automation detection tests.

42 (+24%)

apache-2.0

daijro/camoufox

🦊 Anti-detect browser

834 (+21%)

mpl-2.0

rebrowser/rebrowser-patches

417 (+21%)

drudge/n8n-nodes-puppeteer

n8n node for browser automation using Puppeteer

103 (+20%)

mit

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+18%)

bsd-3-clause

daijro/browserforge

🎭 Intelligent browser header & fingerprint generator

299 (+17%)

apache-2.0

ahmedrangel/instagram-media-scraper

Simple Node.js code to get public information and media from any Instagram post or reel URL without the API. Working 2024

44 (+16%)

mit

Last 12-months (new repositories)

AIHawk-FOSS/Auto_Jobs_Applier_AI_Agent

23,744

agpl-3.0

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357

agpl-3.0

ScrapeGraphAI/Scrapegraph-ai

Python scraper based on AI

16,416

mit

apify/crawlee-python

4,877

apache-2.0

gregpr07/browser-use

Make websites accessible for AI agents

3,201

mit

bjesus/pipet

Swiss-army tool for scraping and extracting data from online assets, made for hackers

1,926

mit

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730

bsd-3-clause

d60/twikit

1,666

mit

raznem/parsera

Lightweight library for scraping web-sites with LLMs

930

gpl-2.0

daijro/camoufox

🦊 Anti-detect browser

834

mpl-2.0

rebrowser/rebrowser-patches

417

Benexl/FastAnime

Your browser anime experience from the terminal

386

unlicense

tinyfish-io/agentql

357

TheGP/untidetect-tools

List of anti-detect and humanizing tools and browsers, including captcha solvers and sms-activation.

352

daijro/browserforge

🎭 Intelligent browser header & fingerprint generator

299

apache-2.0

Aran404/SpotAPI

A python wrapper for the public & private Spotify API

209

gpl-3.0

david96182/ninjemail

Python library for automated email account creation. Create multiple accounts easily with support for major email providers.

113

mit

maxmindlin/scout-lang

A web crawling programming language

112

apache-2.0

dendrite-systems/dendrite-python-sdk

Tools to build web AI agents that can authenticate, interact with and extract data from any website.

105

mit

alexfazio/devdocs-to-llm

Turn any developer documentation into a GPT

mit

Last 12-months (absolute gain)

AIHawk-FOSS/Auto_Jobs_Applier_AI_Agent

23,744 (+23,742)

agpl-3.0

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

20,357 (+20,356)

agpl-3.0

ScrapeGraphAI/Scrapegraph-ai

Python scraper based on AI

16,416 (+16,403)

mit

apify/crawlee

16,247 (+5,276)

apache-2.0

apify/crawlee-python

4,877 (+4,876)

apache-2.0

soxoj/maigret

🕵️‍♂️ Collect a dossier on a person by username from thousands of sites

13,686 (+4,412)

mit

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

53,528 (+3,908)

bsd-3-clause

ultrafunkamsterdam/undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)

10,262 (+3,166)

gpl-3.0

gregpr07/browser-use

Make websites accessible for AI agents

3,201 (+3,025)

mit

gocolly/colly

Elegant Scraper and Crawler Framework for Golang

23,469 (+2,101)

apache-2.0

bjesus/pipet

Swiss-army tool for scraping and extracting data from online assets, made for hackers

1,926 (+1,925)

mit

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+1,728)

bsd-3-clause

d60/twikit

1,666 (+1,664)

mit

adbar/trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

3,747 (+1,415)

apache-2.0

raznem/parsera

Lightweight library for scraping web-sites with LLMs

930 (+892)

gpl-2.0

daijro/camoufox

🦊 Anti-detect browser

834 (+832)

mpl-2.0

alirezamika/autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

6,540 (+831)

mit

Smartproxy/Smartproxy

HTTP(S)/SOCKS5 rotating residential proxies - code examples & general information.

1,111 (+829)

mit

lorien/awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

6,780 (+695)

snooppr/snoop

Snoop: an intelligence-gathering tool based on open data (OSINT world)

3,036 (+621)

Last 12-months (relative gain)

ScrapeGraphAI/Scrapegraph-ai

Python scraper based on AI

16,416 (+126,177%)

mit

rebrowser/rebrowser-patches

417 (+5,857%)

AnonCatalyst/Ominis-OSINT

336 (+2,700%)

mit

raznem/parsera

Lightweight library for scraping web-sites with LLMs

930 (+2,347%)

gpl-2.0

daijro/browserforge

🎭 Intelligent browser header & fingerprint generator

299 (+2,200%)

apache-2.0

oxylabs/quick-start-guide

Python quick start guides to get the most out of Oxylabs' Web Scraper API free trial.

532 (+1,870%)

AndyTheFactory/newspaper4k

📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.

529 (+1,789%)

mit

karthikuj/sasori

Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.

132 (+1,786%)

mit

gregpr07/browser-use

Make websites accessible for AI agents

3,201 (+1,719%)

mit

Disane87/docudigger

Website scraper for getting invoices automagically as pdf (useful for taxes or DMS)

59 (+1,375%)

mit

realityexpander/FredsRoadtripStoryteller

Hear local historical markers as you travel on your road-trip. 100% Shared Compose UI, Kotlin native cross-platform codebase. Includes Cocoapods, Google Maps, GPS Location, notifications, background l...

126 (+1,300%)

ArchiveBox/abx-dl

49 (+880%)

mit

plabayo/rama

modular service framework to move and transform network packets

202 (+818%)

apache-2.0

cumbucadev/cinemaempoa

A site that aggregates films currently showing at some of the many movie theaters in Porto Alegre.

30 (+650%)

gpl-3.0

oxylabs/Python-Web-Scraping-Tutorial

In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex ones.

275 (+643%)

ahmedrangel/instagram-media-scraper

Simple Node.js code to get public information and media from any Instagram post or reel URL without the API. Working 2024

44 (+633%)

mit