Trending repositories for topic crawling

Last 3 days (new repositories)

no newly created repositories trending in the last 3 days

Last 3 days (absolute gain)

apify/crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...

16,247 (+45)

apache-2.0

apify/crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

4,877 (+28)

apache-2.0

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

53,528 (+25)

bsd-3-clause

janreges/siteone-crawler

SiteOne Crawler is a cross-platform website crawler and analyzer for SEO, security, accessibility, and performance optimization—ideal for developers, DevOps, QA engineers, and consultants. Supports Wi...

352 (+22)

mit

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+21)

bsd-3-clause

go-rod/rod

A Chrome DevTools Protocol driver for web automation and scraping.

5,537 (+17)

mit

gocolly/colly

Elegant Scraper and Crawler Framework for Golang

23,469 (+16)

apache-2.0

rebrowser/rebrowser-patches

Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...

417 (+10)

ai-robots-txt/ai.robots.txt

A list of AI agents and robots to block.

1,330 (+9)

mit

codelucas/newspaper

newspaper3k is a news, full-text, and article metadata extraction library in Python 3. Advanced docs:

14,232 (+5)

mit

edoardottt/cariddi

Take a list of domains, crawl URLs and scan for endpoints, secrets, API keys, file extensions, tokens and more

1,557 (+4)

gpl-3.0

lorien/awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

6,780 (+4)

NateScarlet/holiday-cn

📅🇨🇳 Chinese statutory public holiday data, scraped automatically every day from State Council announcements

1,319 (+2)

mit

ArchiveBox/abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/s...

49 (+1)

mit

transitive-bullshit/awesome-puppeteer

A curated list of awesome puppeteer resources.

2,417 (+1)

Last 3 days (relative gain)

janreges/siteone-crawler

352 (+7%)

mit

rebrowser/rebrowser-patches

417 (+2%)

ArchiveBox/abx-dl

49 (+2%)

mit

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+1%)

bsd-3-clause

ai-robots-txt/ai.robots.txt

A list of AI agents and robots to block.

1,330 (+0.7%)

mit

apify/crawlee-python

4,877 (+0.6%)

apache-2.0

go-rod/rod

A Chrome DevTools Protocol driver for web automation and scraping.

5,537 (+0.3%)

mit

apify/crawlee

16,247 (+0.3%)

apache-2.0

edoardottt/cariddi

Take a list of domains, crawl URLs and scan for endpoints, secrets, API keys, file extensions, tokens and more

1,557 (+0.3%)

gpl-3.0

NateScarlet/holiday-cn

📅🇨🇳 Chinese statutory public holiday data, scraped automatically every day from State Council announcements

1,319 (+0.2%)

mit

gocolly/colly

Elegant Scraper and Crawler Framework for Golang

23,469 (+0.1%)

apache-2.0

lorien/awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

6,780 (+0.1%)

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

53,528 (+0.0%)

bsd-3-clause

transitive-bullshit/awesome-puppeteer

A curated list of awesome puppeteer resources.

2,417 (+0.0%)

codelucas/newspaper

newspaper3k is a news, full-text, and article metadata extraction library in Python 3. Advanced docs:

14,232 (+0.0%)

mit
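The relative-gain lists appear to divide the absolute gain by the star count at the start of the window (current stars minus gain). The sketch below assumes exactly that, plus a display rule inferred from the listings (whole percent with thousands separators from 1% up, one decimal place below 1%). Under those assumptions it reproduces the figures shown, but the site's actual method is not documented here, and the function names relative_gain and format_gain are hypothetical.

```python
def relative_gain(current_stars: int, gain: int) -> float:
    """Percentage gain over a window.

    Assumption: the baseline is the star count at the start of the
    window, i.e. current_stars - gain.
    """
    baseline = current_stars - gain
    if baseline <= 0:
        # Brand-new repositories have no positive baseline.
        raise ValueError("no positive baseline; relative gain is undefined")
    return 100.0 * gain / baseline


def format_gain(pct: float) -> str:
    # Inferred display rule: whole percent (with thousands separators)
    # from 1% up, one decimal place below 1%.
    if pct >= 1:
        return f"+{pct:,.0f}%"
    return f"+{pct:.1f}%"


if __name__ == "__main__":
    # Star counts and 3-day absolute gains copied from the lists above.
    samples = [
        ("janreges/siteone-crawler", 352, 22),
        ("rebrowser/rebrowser-patches", 417, 10),
        ("apify/crawlee", 16_247, 45),
    ]
    for repo, stars, gain in samples:
        print(f"{repo}: {stars:,} ({format_gain(relative_gain(stars, gain))})")
```

Run as a script, this prints lines such as `janreges/siteone-crawler: 352 (+7%)` and `apify/crawlee: 16,247 (+0.3%)`, matching the 3-day relative-gain entries.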

Last week (new repositories)

no newly created repositories trending in the last week

Last week (absolute gain)

apify/crawlee

16,247 (+120)

apache-2.0

janreges/siteone-crawler

352 (+88)

mit

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+59)

bsd-3-clause

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

53,528 (+52)

bsd-3-clause

apify/crawlee-python

4,877 (+50)

apache-2.0

go-rod/rod

A Chrome DevTools Protocol driver for web automation and scraping.

5,537 (+31)

mit

gocolly/colly

Elegant Scraper and Crawler Framework for Golang

23,469 (+26)

apache-2.0

rebrowser/rebrowser-patches

417 (+25)

ai-robots-txt/ai.robots.txt

A list of AI agents and robots to block.

1,330 (+16)

mit

codelucas/newspaper

newspaper3k is a news, full-text, and article metadata extraction library in Python 3. Advanced docs:

14,232 (+14)

mit

lorien/awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

6,780 (+9)

RevoltSecurities/SpideyX

SpideyX: a multipurpose web penetration testing tool with asynchronous, concurrent performance and multiple modes and configurations.

128 (+4)

mit

edoardottt/cariddi

Take a list of domains, crawl URLs and scan for endpoints, secrets, API keys, file extensions, tokens and more

1,557 (+4)

gpl-3.0

ArchiveBox/abx-dl

49 (+3)

mit

lorey/mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

1,333 (+3)

yujiosaka/headless-chrome-crawler

Distributed crawler powered by Headless Chrome

5,537 (+3)

mit

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for 40+ popular domains

342 (+2)

josephlimtech/linkedin-profile-scraper-api

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.

575 (+2)

mit

needleworm/bhban_rpa

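Example code for the book 6개월 치 업무를 하루 만에 끝내는 업무 자동화 (roughly, "Work automation that finishes six months of work in a single day"; Saengneung Publishing, 2020). The examples are written for readers who have never learned Python and cover a range of work-automation topics, from Excel to design, macros, and crawling.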

1,073 (+2)

NateScarlet/holiday-cn

📅🇨🇳 Chinese statutory public holiday data, scraped automatically every day from State Council announcements

1,319 (+2)

mit

Last week (relative gain)

janreges/siteone-crawler

352 (+33%)

mit

ArchiveBox/abx-dl

49 (+7%)

mit

rebrowser/rebrowser-patches

417 (+6%)

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+4%)

bsd-3-clause

RevoltSecurities/SpideyX

SpideyX: a multipurpose web penetration testing tool with asynchronous, concurrent performance and multiple modes and configurations.

128 (+3%)

mit

pzaino/thecrowler

A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze, and interact with the web in all its dimensions.

41 (+3%)

apache-2.0

DBeath/feedsearch-crawler

Crawl sites for RSS, Atom, and JSON feeds.

65 (+2%)

mit

ai-robots-txt/ai.robots.txt

A list of AI agents and robots to block.

1,330 (+1%)

mit

apify/crawlee-python

4,877 (+1%)

apache-2.0

apify/crawlee

16,247 (+0.7%)

apache-2.0

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for 40+ popular domains

342 (+0.6%)

go-rod/rod

A Chrome DevTools Protocol driver for web automation and scraping.

5,537 (+0.6%)

mit

StJudeWasHere/seonaut

Open source SEO auditing tool.

270 (+0.4%)

mit

josephlimtech/linkedin-profile-scraper-api

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON.

575 (+0.3%)

mit

crwlrsoft/crawler

Library for Rapid (Web) Crawler and Scraper Development

344 (+0.3%)

mit

mhmdiaa/second-order

Second-order subdomain takeover scanner

380 (+0.3%)

mit

edoardottt/cariddi

Take a list of domains, crawl URLs and scan for endpoints, secrets, API keys, file extensions, tokens and more

1,557 (+0.3%)

gpl-3.0

l4rm4nd/LinkedInDumper

Python 3 script to dump/scrape/extract company employees from the LinkedIn API

404 (+0.2%)

lorey/mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

1,333 (+0.2%)

needleworm/bhban_rpa

1,073 (+0.2%)

Last month (new repositories)

no newly created repositories trending in the last month

Last month (absolute gain)

apify/crawlee

16,247 (+527)

apache-2.0

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

53,528 (+303)

bsd-3-clause

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+258)

bsd-3-clause

apify/crawlee-python

4,877 (+242)

apache-2.0

gocolly/colly

Elegant Scraper and Crawler Framework for Golang

23,469 (+124)

apache-2.0

go-rod/rod

A Chrome DevTools Protocol driver for web automation and scraping.

5,537 (+122)

mit

janreges/siteone-crawler

352 (+98)

mit

ai-robots-txt/ai.robots.txt

A list of AI agents and robots to block.

1,330 (+89)

mit

lorien/awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

6,780 (+75)

rebrowser/rebrowser-patches

417 (+73)

codelucas/newspaper

newspaper3k is a news, full-text, and article metadata extraction library in Python 3. Advanced docs:

14,232 (+47)

mit

ArchiveBox/abx-dl

49 (+25)

mit

hakluke/hakrawler

Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application

4,529 (+25)

gpl-3.0

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

678 (+24)

agpl-3.0

NateScarlet/holiday-cn

📅🇨🇳 Chinese statutory public holiday data, scraped automatically every day from State Council announcements

1,319 (+23)

mit

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for 40+ popular domains

342 (+21)

RevoltSecurities/SpideyX

SpideyX: a multipurpose web penetration testing tool with asynchronous, concurrent performance and multiple modes and configurations.

128 (+19)

mit

lorey/mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

1,333 (+19)

edoardottt/cariddi

Take a list of domains, crawl URLs and scan for endpoints, secrets, API keys, file extensions, tokens and more

1,557 (+18)

gpl-3.0

apache/nutch

Apache Nutch is an extensible and scalable web crawler

2,944 (+16)

apache-2.0

Last month (relative gain)

ArchiveBox/abx-dl

49 (+104%)

mit

janreges/siteone-crawler

352 (+39%)

mit

rebrowser/rebrowser-patches

417 (+21%)

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+18%)

bsd-3-clause

RevoltSecurities/SpideyX

SpideyX: a multipurpose web penetration testing tool with asynchronous, concurrent performance and multiple modes and configurations.

128 (+17%)

mit

ai-robots-txt/ai.robots.txt

A list of AI agents and robots to block.

1,330 (+7%)

mit

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for 40+ popular domains

342 (+7%)

apify/crawlee-python

4,877 (+5%)

apache-2.0

ArchiveTeam/wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

107 (+5%)

gpl-3.0

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

678 (+4%)

agpl-3.0

apify/crawlee

16,247 (+3%)

apache-2.0

mishakorzik/Infect

Create your virus in Termux!

192 (+3%)

gpl-3.0

DBeath/feedsearch-crawler

Crawl sites for RSS, Atom, and JSON feeds.

65 (+3%)

mit

StJudeWasHere/seonaut

Open source SEO auditing tool.

270 (+3%)

mit

amerkurev/scrapper

Web scraper with a simple REST API living in Docker and using a headless browser and Readability.js for parsing.

183 (+3%)

apache-2.0

JaehyoJJAng/Coupang-Review-Crawling

Coupang review crawling

37 (+3%)

mit

Symbolexe/Raven

Raven is a powerful and customizable web crawler written in Go.

40 (+3%)

mit

pzaino/thecrowler

A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze, and interact with the web in all its dimensions.

41 (+3%)

apache-2.0

crwlrsoft/crawler

Library for Rapid (Web) Crawler and Scraper Development

344 (+2%)

mit

go-rod/rod

A Chrome DevTools Protocol driver for web automation and scraping.

5,537 (+2%)

mit

Last 12 months (new repositories)

apify/crawlee-python

4,877

apache-2.0

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730

bsd-3-clause

ai-robots-txt/ai.robots.txt

A list of AI agents and robots to block.

1,330

mit

rebrowser/rebrowser-patches

417

RevoltSecurities/SpideyX

SpideyX: a multipurpose web penetration testing tool with asynchronous, concurrent performance and multiple modes and configurations.

128

mit

alexfazio/devdocs-to-llm

Turn any developer documentation into a GPT

mit

ArchiveBox/abx-dl

mit

Symbolexe/Raven

Raven is a powerful and customizable web crawler written in Go.

mit

sabber-slt/NetExtract

NetExtract: Efficiently extract core content from any webpage and convert it to clean, LLM-optimized Markdown with a simple API.

mit

Last 12 months (absolute gain)

apify/crawlee

16,247 (+5,276)

apache-2.0

apify/crawlee-python

4,877 (+4,876)

apache-2.0

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

53,528 (+3,908)

bsd-3-clause

gocolly/colly

Elegant Scraper and Crawler Framework for Golang

23,469 (+2,101)

apache-2.0

D4Vinci/Scrapling

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1,730 (+1,728)

bsd-3-clause

ai-robots-txt/ai.robots.txt

A list of AI agents and robots to block.

1,330 (+1,276)

mit

go-rod/rod

A Chrome DevTools Protocol driver for web automation and scraping.

5,537 (+1,115)

mit

codelucas/newspaper

newspaper3k is a news, full-text, and article metadata extraction library in Python 3. Advanced docs:

14,232 (+877)

mit

lorien/awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

6,780 (+695)

hakluke/hakrawler

Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application

4,529 (+474)

gpl-3.0

hardkoded/puppeteer-sharp

Headless Chrome .NET API

3,450 (+470)

mit

rebrowser/rebrowser-patches

417 (+410)

NateScarlet/holiday-cn

📅🇨🇳 Chinese statutory public holiday data, scraped automatically every day from State Council announcements

1,319 (+398)

mit

edoardottt/cariddi

Take a list of domains, crawl URLs and scan for endpoints, secrets, API keys, file extensions, tokens and more

1,557 (+342)

gpl-3.0

janreges/siteone-crawler

352 (+330)

mit

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for 40+ popular domains

342 (+278)

MontFerret/ferret

Declarative web scraping

5,761 (+230)

apache-2.0

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

678 (+224)

agpl-3.0

apache/nutch

Apache Nutch is an extensible and scalable web crawler

2,944 (+216)

apache-2.0

elixir-crawly/crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

996 (+206)

apache-2.0

Last 12 months (relative gain)

rebrowser/rebrowser-patches

417 (+5,857%)

ai-robots-txt/ai.robots.txt

A list of AI agents and robots to block.

1,330 (+2,363%)

mit

karthikuj/sasori

Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.

132 (+1,786%)

mit

janreges/siteone-crawler

352 (+1,500%)

mit

RevoltSecurities/SpideyX

SpideyX: a multipurpose web penetration testing tool with asynchronous, concurrent performance and multiple modes and configurations.

128 (+1,322%)

mit

ArchiveBox/abx-dl

49 (+880%)

mit

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for 40+ popular domains

342 (+434%)

StJudeWasHere/seonaut

Open source SEO auditing tool.

270 (+214%)

mit

amerkurev/scrapper

Web scraper with a simple REST API living in Docker and using a headless browser and Readability.js for parsing.

183 (+158%)

apache-2.0

adbar/courlan

Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters

127 (+127%)

apache-2.0

mike-gee/webtranspose

Web scraping API for building AI applications.

41 (+64%)

mishakorzik/Infect

Create your virus in Termux!

192 (+61%)

gpl-3.0

JaehyoJJAng/Coupang-Review-Crawling

Coupang review crawling

37 (+54%)

mit

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

678 (+49%)

agpl-3.0

apify/crawlee

16,247 (+48%)

apache-2.0

l4rm4nd/XingDumper

Python 3 script to dump/scrape/extract company employees from the XING API

37 (+48%)

KoreanThinker/billboard-json

🎧 Get the Billboard Hot 100 chart as JSON

39 (+44%)

gpl-3.0

18520339/facebook-data-extraction

Experience in effectively fetching Facebook data by querying the Graph API with account-based tokens and operating undetectable scraping bots to extract client-side and server-side rendered content

172 (+43%)

mit

NateScarlet/holiday-cn

📅🇨🇳 Chinese statutory public holiday data, scraped automatically every day from State Council announcements

1,319 (+43%)

mit

ArchiveTeam/wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

107 (+43%)

gpl-3.0