Trending repositories for topic web-crawler

Last 3 days (new repositories)

no newly created repositories trending in the last 3 days

Last 3 days (absolute gain)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

35,259 (+578)

agpl-3.0

mendableai/firecrawl-mcp-server

Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.

2,405 (+89)

mit

mendableai/firecrawl-app-examples

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

294 (+30)

graphlit/graphlit-mcp-server

Model Context Protocol (MCP) Server for Graphlit Platform

183 (+24)

mit

apify/crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...

17,408 (+22)

apache-2.0

apify/crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

5,507 (+14)

apache-2.0

adithya-s-k/omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

6,472 (+9)

gpl-3.0

crawlab-team/crawlab

Distributed web crawler admin platform for spiders management regardless of languages and frameworks.

11,702 (+7)

bsd-3-clause

gildas-lormeau/single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

765 (+6)

agpl-3.0

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

748 (+4)

agpl-3.0

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for +40 popular domains

471 (+3)

jasonxtn/Argus

The Ultimate Information Gathering Toolkit

1,932 (+3)

FunnySaltyFish/bilibili_comments_crawl

Builds a dialogue dataset for training large language models from Bilibili comment-section data

47 (+1)

spider-rs/spider-py

Spider ported to Python

76 (+1)

mit

jpjacobpadilla/Stealth-Requests

Undetected Web-Scraping & Seamless HTML Parsing in Python!

234 (+1)

mit

lewisdonovan/google-news-scraper

Lightweight scraper for Google News

310 (+1)

lefterisloukas/edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files.

365 (+1)

gpl-3.0

cxcscmu/Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

608 (+1)

mit

Algebra-FUN/WeReadScan

Crawler that scans purchased books on WeRead (微信读书) and downloads them as local PDFs

939 (+1)

BruceDone/awesome-crawler

A collection of awesome web crawler,spider in different languages

6,710 (+0)

mit

Last 3 days (relative gain)

graphlit/graphlit-mcp-server

Model Context Protocol (MCP) Server for Graphlit Platform

183 (+15%)

mit

mendableai/firecrawl-app-examples

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

294 (+11%)

mendableai/firecrawl-mcp-server

Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.

2,405 (+4%)

mit

FunnySaltyFish/bilibili_comments_crawl

Builds a dialogue dataset for training large language models from Bilibili comment-section data

47 (+2%)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

35,259 (+2%)

agpl-3.0

spider-rs/spider-py

Spider ported to Python

76 (+1%)

mit

gildas-lormeau/single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

765 (+0.8%)

agpl-3.0

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for +40 popular domains

471 (+0.6%)

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

748 (+0.5%)

agpl-3.0

jpjacobpadilla/Stealth-Requests

Undetected Web-Scraping & Seamless HTML Parsing in Python!

234 (+0.4%)

mit

lewisdonovan/google-news-scraper

Lightweight scraper for Google News

310 (+0.3%)

lefterisloukas/edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files.

365 (+0.3%)

gpl-3.0

apify/crawlee-python

5,507 (+0.3%)

apache-2.0

cxcscmu/Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

608 (+0.2%)

mit

jasonxtn/Argus

The Ultimate Information Gathering Toolkit

1,932 (+0.2%)

adithya-s-k/omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

6,472 (+0.1%)

gpl-3.0

apify/crawlee

17,408 (+0.1%)

apache-2.0

Algebra-FUN/WeReadScan

Crawler that scans purchased books on WeRead (微信读书) and downloads them as local PDFs

939 (+0.1%)

crawlab-team/crawlab

Distributed web crawler admin platform for spiders management regardless of languages and frameworks.

11,702 (+0.1%)

bsd-3-clause

BruceDone/awesome-crawler

A collection of awesome web crawler,spider in different languages

6,710 (+0.0%)

mit

Last week (new repositories)

no newly created repositories trending in the last week

Last week (absolute gain)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

35,259 (+1,099)

agpl-3.0

mendableai/firecrawl-mcp-server

Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.

2,405 (+250)

mit

mendableai/firecrawl-app-examples

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

294 (+78)

apify/crawlee

17,408 (+58)

apache-2.0

graphlit/graphlit-mcp-server

Model Context Protocol (MCP) Server for Graphlit Platform

183 (+40)

mit

apify/crawlee-python

5,507 (+24)

apache-2.0

adithya-s-k/omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

6,472 (+18)

gpl-3.0

crawlab-team/crawlab

Distributed web crawler admin platform for spiders management regardless of languages and frameworks.

11,702 (+15)

bsd-3-clause

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

748 (+9)

agpl-3.0

jasonxtn/Argus

The Ultimate Information Gathering Toolkit

1,932 (+7)

ssssssss-team/spider-flow

A next-generation crawler platform that defines crawl flows graphically, so crawlers can be built without writing code.

9,905 (+7)

mit

lewisdonovan/google-news-scraper

Lightweight scraper for Google News

310 (+6)

gildas-lormeau/single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

765 (+6)

agpl-3.0

jpjacobpadilla/Stealth-Requests

Undetected Web-Scraping & Seamless HTML Parsing in Python!

234 (+5)

mit

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for +40 popular domains

471 (+5)

internetarchive/Zeno

State-of-the-art web crawler 🔱

145 (+4)

agpl-3.0

lefterisloukas/edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files.

365 (+4)

gpl-3.0

BruceDone/awesome-crawler

A collection of awesome web crawler,spider in different languages

6,710 (+3)

mit

cxcscmu/Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

608 (+3)

mit

FunnySaltyFish/bilibili_comments_crawl

Builds a dialogue dataset for training large language models from Bilibili comment-section data

47 (+2)

Last week (relative gain)

mendableai/firecrawl-app-examples

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

294 (+36%)

graphlit/graphlit-mcp-server

Model Context Protocol (MCP) Server for Graphlit Platform

183 (+28%)

mit

mendableai/firecrawl-mcp-server

Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.

2,405 (+12%)

mit

FunnySaltyFish/bilibili_comments_crawl

Builds a dialogue dataset for training large language models from Bilibili comment-section data

47 (+4%)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

35,259 (+3%)

agpl-3.0

internetarchive/Zeno

State-of-the-art web crawler 🔱

145 (+3%)

agpl-3.0

jpjacobpadilla/Stealth-Requests

Undetected Web-Scraping & Seamless HTML Parsing in Python!

234 (+2%)

mit

lewisdonovan/google-news-scraper

Lightweight scraper for Google News

310 (+2%)

spider-rs/spider-py

Spider ported to Python

76 (+1%)

mit

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

748 (+1%)

agpl-3.0

lefterisloukas/edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files.

365 (+1%)

gpl-3.0

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for +40 popular domains

471 (+1%)

gildas-lormeau/single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

765 (+0.8%)

agpl-3.0

cxcscmu/Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

608 (+0.5%)

mit

apify/crawlee-python

5,507 (+0.4%)

apache-2.0

jasonxtn/Argus

The Ultimate Information Gathering Toolkit

1,932 (+0.4%)

s0rg/crawley

The unix-way web crawler

292 (+0.3%)

mit

apify/crawlee

17,408 (+0.3%)

apache-2.0

commoncrawl/news-crawl

News crawling with StormCrawler - stores content as WARC

341 (+0.3%)

apache-2.0

TurnerSoftware/InfinityCrawler

A simple but powerful web crawler library for .NET

251 (+0.0%)

mit

Last month (new repositories)

no newly created repositories trending in the last month

Last month (absolute gain)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

35,259 (+4,328)

agpl-3.0

mendableai/firecrawl-mcp-server

Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.

2,405 (+1,548)

mit

apify/crawlee

17,408 (+286)

apache-2.0

mendableai/firecrawl-app-examples

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

294 (+153)

graphlit/graphlit-mcp-server

Model Context Protocol (MCP) Server for Graphlit Platform

183 (+133)

mit

apify/crawlee-python

5,507 (+108)

apache-2.0

adithya-s-k/omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

6,472 (+104)

gpl-3.0

ssssssss-team/spider-flow

A next-generation crawler platform that defines crawl flows graphically, so crawlers can be built without writing code.

9,905 (+60)

mit

crawlab-team/crawlab

Distributed web crawler admin platform for spiders management regardless of languages and frameworks.

11,702 (+52)

bsd-3-clause

BruceDone/awesome-crawler

A collection of awesome web crawler,spider in different languages

6,710 (+43)

mit

jasonxtn/Argus

The Ultimate Information Gathering Toolkit

1,932 (+42)

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for +40 popular domains

471 (+37)

MarginaliaSearch/MarginaliaSearch

Internet search engine for text-oriented websites. Indexing the small, old and weird web.

1,307 (+33)

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

748 (+30)

agpl-3.0

cxcscmu/Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

608 (+28)

mit

gildas-lormeau/single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

765 (+28)

agpl-3.0

internetarchive/Zeno

State-of-the-art web crawler 🔱

145 (+20)

agpl-3.0

lefterisloukas/edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files.

365 (+16)

gpl-3.0

apache/nutch

Apache Nutch is an extensible and scalable web crawler

3,000 (+13)

apache-2.0

Madi-S/Lead-Generation

Python script, which empowers people with no programming background to generate robust leads on a mass scale. This repo will be compiled of various versatile techniques used in lead generation.

155 (+11)

mit

Last month (relative gain)

graphlit/graphlit-mcp-server

Model Context Protocol (MCP) Server for Graphlit Platform

183 (+266%)

mit

mendableai/firecrawl-mcp-server

Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.

2,405 (+181%)

mit

mendableai/firecrawl-app-examples

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

294 (+109%)

FunnySaltyFish/bilibili_comments_crawl

Builds a dialogue dataset for training large language models from Bilibili comment-section data

47 (+18%)

internetarchive/Zeno

State-of-the-art web crawler 🔱

145 (+16%)

agpl-3.0

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

35,259 (+14%)

agpl-3.0

spider-rs/spider-py

Spider ported to Python

76 (+10%)

mit

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for +40 popular domains

471 (+9%)

Madi-S/Lead-Generation

Python script, which empowers people with no programming background to generate robust leads on a mass scale. This repo will be compiled of various versatile techniques used in lead generation.

155 (+8%)

mit

dream-num/univer-clipsheet

A powerful Chrome extension for web scraping

111 (+6%)

apache-2.0

cxcscmu/Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

608 (+5%)

mit

lefterisloukas/edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files.

365 (+5%)

gpl-3.0

jgravelle/groqcrawl

GroqCrawl is a powerful and user-friendly web crawling and scraping application built with Streamlit and powered by PocketGroq. It provides an intuitive interface for extracting LLM friendly AI consum...

74 (+4%)

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

748 (+4%)

agpl-3.0

amalrajan/learncpp-download

Multi-threaded web scraper to download all the tutorials from www.learncpp.com and convert them to PDF files concurrently.

81 (+4%)

agpl-3.0

gildas-lormeau/single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

765 (+4%)

agpl-3.0

lewisdonovan/google-news-scraper

Lightweight scraper for Google News

310 (+3%)

jpjacobpadilla/Stealth-Requests

Undetected Web-Scraping & Seamless HTML Parsing in Python!

234 (+3%)

mit

MarginaliaSearch/MarginaliaSearch

Internet search engine for text-oriented websites. Indexing the small, old and weird web.

1,307 (+3%)

jasonxtn/Argus

The Ultimate Information Gathering Toolkit

1,932 (+2%)

Last 12 months (new repositories)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

35,259

agpl-3.0

adithya-s-k/omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

6,472

gpl-3.0

mendableai/firecrawl-mcp-server

Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.

2,405

mit

jasonxtn/Argus

The Ultimate Information Gathering Toolkit

1,932

cxcscmu/Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

608

mit

mendableai/firecrawl-app-examples

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

294

jpjacobpadilla/Stealth-Requests

Undetected Web-Scraping & Seamless HTML Parsing in Python!

234

mit

graphlit/graphlit-mcp-server

Model Context Protocol (MCP) Server for Graphlit Platform

183

mit

dream-num/univer-clipsheet

A powerful Chrome extension for web scraping

111

apache-2.0

jgravelle/groqcrawl

Last 12 months (absolute gain)

mendableai/firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

35,259 (+35,258)

agpl-3.0

adithya-s-k/omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

6,472 (+6,471)

gpl-3.0

apify/crawlee-python

5,507 (+5,503)

apache-2.0

apify/crawlee

17,408 (+5,446)

apache-2.0

mendableai/firecrawl-mcp-server

Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.

2,405 (+2,404)

mit

jasonxtn/Argus

The Ultimate Information Gathering Toolkit

1,932 (+1,930)

crawlab-team/crawlab

Distributed web crawler admin platform for spiders management regardless of languages and frameworks.

11,702 (+937)

bsd-3-clause

ssssssss-team/spider-flow

A next-generation crawler platform that defines crawl flows graphically, so crawlers can be built without writing code.

9,905 (+889)

mit

BruceDone/awesome-crawler

A collection of awesome web crawler,spider in different languages

6,710 (+662)

mit

MarginaliaSearch/MarginaliaSearch

Internet search engine for text-oriented websites. Indexing the small, old and weird web.

1,307 (+495)

cxcscmu/Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

608 (+492)

mit

Algebra-FUN/WeReadScan

Crawler that scans purchased books on WeRead (微信读书) and downloads them as local PDFs

939 (+387)

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for +40 popular domains

471 (+347)

gildas-lormeau/single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

765 (+300)

agpl-3.0

mendableai/firecrawl-app-examples

🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.

294 (+291)

jpjacobpadilla/Stealth-Requests

Undetected Web-Scraping & Seamless HTML Parsing in Python!

234 (+221)

mit

webrecorder/browsertrix-crawler

Run a high-fidelity browser-based web archiving crawler in a single Docker container

748 (+219)

agpl-3.0

apache/nutch

Apache Nutch is an extensible and scalable web crawler

3,000 (+194)

apache-2.0

graphlit/graphlit-mcp-server

Model Context Protocol (MCP) Server for Graphlit Platform

183 (+164)

mit

lefterisloukas/edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files.

365 (+141)

gpl-3.0

Last 12 months (relative gain)

apify/crawlee-python

5,507 (+137,575%)

apache-2.0

jpjacobpadilla/Stealth-Requests

Undetected Web-Scraping & Seamless HTML Parsing in Python!

234 (+1,700%)

mit

jgravelle/groqcrawl

74 (+1,133%)

qfcy/Python

This repository stores Python source code for more than 40 Python projects spanning many fields, including web crawlers, algorithms, OpenGL, tkinter, and object-oriented programming.

61 (+1,120%)

gpl-3.0

graphlit/graphlit-mcp-server

Model Context Protocol (MCP) Server for Graphlit Platform

183 (+863%)

mit

FunnySaltyFish/bilibili_comments_crawl

Builds a dialogue dataset for training large language models from Bilibili comment-section data

47 (+683%)

spider-rs/spider-py

Spider ported to Python

76 (+533%)

mit

cxcscmu/Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

608 (+424%)

mit

internetarchive/Zeno

State-of-the-art web crawler 🔱

145 (+418%)

agpl-3.0

scrapfly/scrapfly-scrapers

Scalable Python web scraping scripts for +40 popular domains

471 (+280%)

Madi-S/Lead-Generation

Python script, which empowers people with no programming background to generate robust leads on a mass scale. This repo will be compiled of various versatile techniques used in lead generation.

155 (+128%)

mit

tech-engine/goscrapy

GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.

95 (+121%)

gosom/scrapemate

Golang Crawling and scraping framework

109 (+98%)

mit

anlp-team/LTI_Neural_Navigator

"Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases" by Jiarui Li and Ye Yuan and Zehua Zhang

43 (+87%)

mit

lewisdonovan/google-news-scraper

Lightweight scraper for Google News

310 (+76%)

Algebra-FUN/WeReadScan

Crawler that scans purchased books on WeRead (微信读书) and downloads them as local PDFs

939 (+70%)

amalrajan/learncpp-download

Multi-threaded web scraper to download all the tutorials from www.learncpp.com and convert them to PDF files concurrently.

81 (+65%)

agpl-3.0

gildas-lormeau/single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

765 (+65%)

agpl-3.0

lefterisloukas/edgar-crawler

The only open-source toolkit that can download SEC EDGAR financial reports and extract textual data from specific item sections into nice & clean structured JSON files.

365 (+63%)

gpl-3.0

MarginaliaSearch/MarginaliaSearch

Internet search engine for text-oriented websites. Indexing the small, old and weird web.

1,307 (+61%)