Trending repositories for topic crawler
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Crawly, a high-level web crawling & scraping framework for Elixir.
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Scrapy, a fast high-level web crawling & scraping framework for Python.
🚀🚀🚀feapder is an easy to use, powerful crawler framework | feapder是一款上手简单,功能强大的Python爬虫框架。内置AirSpider、Spider、TaskSpider、BatchSpider四种爬虫解决不同场景的需求。且支持断点续爬、监控报警、浏览器渲染、海量数据去重等功能。更有功能强大的爬虫管理系统feaplat为其提...
实战🐍多种网站、电商数据爬虫🕷。包含🕸:淘宝商品、微信公众号、大众点评、企查查、招聘网站、闲鱼、阿里任务、博客园、微博、百度贴吧、豆瓣电影、包图网、全景网、豆瓣音乐、某省药监局、搜狐新闻、机器学习文本采集、fofa资产采集、汽车之家、国家统计局、百度关键词收录数、蜘蛛泛目录、今日头条、豆瓣影评、携程、小米应用商店、安居客、途家民宿❤️❤️❤️。微信爬虫展示项目:
Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Crawly, a high-level web crawling & scraping framework for Elixir.
SecretScraper is a web scraper that crawl through target websites, scrape from http response and extract secret information via regular expression.
News crawling with StormCrawler - stores content as WARC
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
自学入门 Python 优质中文资源索引,包含 书籍 / 文档 / 视频,适用于 爬虫 / Web / 数据分析 / 机器学习 方向
A Simple Mihomo/Clash.Meta/Clash GUI.一个简易的 Mihomo/Clash.Meta/Clash 图形操作界面。
Golang短视频去水印:抖音,皮皮虾,火山,微视,最右,快手,全民小视频,皮皮搞笑,西瓜视频,虎牙,梨视频,acfun,好看视频...
爬虫逆向案例,已完成:TLS指纹|瑞数|震坤行 | 网易易盾 | 微信小程序反编译逆向(百达星系) | 同花顺 | rpc解密 | 加速乐 | 极验滑块验证码 | 巨量算数 | Boss直聘 | 企查查 | 中国五矿 | qq音乐 | 产业政策大数据平台 | 企知道 | 雪球网(acw_sc__v2) | 1688 | 七麦数据 | whggzy | 企名科技 | mohurd | 艺恩数据 | ...
Python API for JMComic | 提供Python API访问禁漫天堂,同时支持网页端和移动端 | 禁漫天堂GitHub Actions下载器🚀
小红书数据采集、网站图片、视频资源批量下载工具,颜值超高的数据采集工具(批量下载,视频提取,图片,去水印等)Telegram:https://t.me/+ZtLSwuIKTo44MDY1
Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Scrapy, a fast high-level web crawling & scraping framework for Python.
Crawly, a high-level web crawling & scraping framework for Elixir.
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
🚀🚀🚀feapder is an easy to use, powerful crawler framework | feapder是一款上手简单,功能强大的Python爬虫框架。内置AirSpider、Spider、TaskSpider、BatchSpider四种爬虫解决不同场景的需求。且支持断点续爬、监控报警、浏览器渲染、海量数据去重等功能。更有功能强大的爬虫管理系统feaplat为其提...
爬虫逆向案例,已完成:TLS指纹|瑞数|震坤行 | 网易易盾 | 微信小程序反编译逆向(百达星系) | 同花顺 | rpc解密 | 加速乐 | 极验滑块验证码 | 巨量算数 | Boss直聘 | 企查查 | 中国五矿 | qq音乐 | 产业政策大数据平台 | 企知道 | 雪球网(acw_sc__v2) | 1688 | 七麦数据 | whggzy | 企名科技 | mohurd | 艺恩数据 | ...
实战🐍多种网站、电商数据爬虫🕷。包含🕸:淘宝商品、微信公众号、大众点评、企查查、招聘网站、闲鱼、阿里任务、博客园、微博、百度贴吧、豆瓣电影、包图网、全景网、豆瓣音乐、某省药监局、搜狐新闻、机器学习文本采集、fofa资产采集、汽车之家、国家统计局、百度关键词收录数、蜘蛛泛目录、今日头条、豆瓣影评、携程、小米应用商店、安居客、途家民宿❤️❤️❤️。微信爬虫展示项目:
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
爬虫逆向案例,已完成:TLS指纹|瑞数|震坤行 | 网易易盾 | 微信小程序反编译逆向(百达星系) | 同花顺 | rpc解密 | 加速乐 | 极验滑块验证码 | 巨量算数 | Boss直聘 | 企查查 | 中国五矿 | qq音乐 | 产业政策大数据平台 | 企知道 | 雪球网(acw_sc__v2) | 1688 | 七麦数据 | whggzy | 企名科技 | mohurd | 艺恩数据 | ...
Crawly, a high-level web crawling & scraping framework for Elixir.
Python API for JMComic | 提供Python API访问禁漫天堂,同时支持网页端和移动端 | 禁漫天堂GitHub Actions下载器🚀
SecretScraper is a web scraper that crawl through target websites, scrape from http response and extract secret information via regular expression.
News crawling with StormCrawler - stores content as WARC
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
🔥 PHP library to warm up caches of URLs located in XML sitemaps
A simple, open-source, easy to use, and free download manager for malware samples.
小红书数据采集、网站图片、视频资源批量下载工具,颜值超高的数据采集工具(批量下载,视频提取,图片,去水印等)Telegram:https://t.me/+ZtLSwuIKTo44MDY1
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Scrapy, a fast high-level web crawling & scraping framework for Python.
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
Distributed web crawler admin platform for spiders management regardless of languages and frameworks. 分布式爬虫管理平台,支持任何语言和框架
Collection of China illegal cases about web crawler 本项目用来整理所有中国大陆爬虫开发者涉诉与违规相关的新闻、资料与法律法规。致力于帮助在中国大陆工作的爬虫行业从业者了解我国相关法律,避免触碰数据合规红线。 [AD]中文知识图谱门户
🚀🚀🚀feapder is an easy to use, powerful crawler framework | feapder是一款上手简单,功能强大的Python爬虫框架。内置AirSpider、Spider、TaskSpider、BatchSpider四种爬虫解决不同场景的需求。且支持断点续爬、监控报警、浏览器渲染、海量数据去重等功能。更有功能强大的爬虫管理系统feaplat为其提...
Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
SecretScraper is a web scraper that crawl through target websites, scrape from http response and extract secret information via regular expression.
A Simple Mihomo/Clash.Meta/Clash GUI.一个简易的 Mihomo/Clash.Meta/Clash 图形操作界面。
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
Download HD images from pinterest by your favorite keywords
Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.
Python API for JMComic | 提供Python API访问禁漫天堂,同时支持网页端和移动端 | 禁漫天堂GitHub Actions下载器🚀
爬虫逆向案例,已完成:TLS指纹|瑞数|震坤行 | 网易易盾 | 微信小程序反编译逆向(百达星系) | 同花顺 | rpc解密 | 加速乐 | 极验滑块验证码 | 巨量算数 | Boss直聘 | 企查查 | 中国五矿 | qq音乐 | 产业政策大数据平台 | 企知道 | 雪球网(acw_sc__v2) | 1688 | 七麦数据 | whggzy | 企名科技 | mohurd | 艺恩数据 | ...
Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters
Python-based web crawling script with randomized intervals, user-agent rotation, and proxy server IP rotation to outsmart website bots and prevent blocking.
Golang短视频去水印:抖音,皮皮虾,火山,微视,最右,快手,全民小视频,皮皮搞笑,西瓜视频,虎牙,梨视频,acfun,好看视频...
🔥 PHP library to warm up caches of URLs located in XML sitemaps
🤖Free ChatGPT Line Bot with Horoscope, Music Broadcast, Google Image Search...
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
爬虫逆向案例,已完成:TLS指纹|瑞数|震坤行 | 网易易盾 | 微信小程序反编译逆向(百达星系) | 同花顺 | rpc解密 | 加速乐 | 极验滑块验证码 | 巨量算数 | Boss直聘 | 企查查 | 中国五矿 | qq音乐 | 产业政策大数据平台 | 企知道 | 雪球网(acw_sc__v2) | 1688 | 七麦数据 | whggzy | 企名科技 | mohurd | 艺恩数据 | ...
📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
Prying Deep - An OSINT tool to collect intelligence on the dark web.
An advanced [Finder | Checker | Server] tool for proxy servers, supporting both HTTP(S) and SOCKS protocols. 🎭
Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.
🤖 A curated list of websites that restrict access to AI Agents, AI crawlers and GPTs
🤖Free ChatGPT Line Bot with Horoscope, Music Broadcast, Google Image Search...
SiteOne Crawler is a website analyzer and exporter you'll ♥ as a Dev/DevOps, QA engineer, website owner or consultant. Works on all popular platforms - Windows, macOS and Linux (x64 and arm64 too).
ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to re...
A Python crawler tool that can automatically simulate browser operations to crawl all users' tweet content and save all static resources (videos, pictures) locally without calling the Twitter API. Use...
🧩 / 🕸 WebsiteCrawler - This plugin automatically crawls the main content of a specified URL webpage and uses it as context input.
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
👾 Fast and simple video download library and CLI tool written in Go
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Scrapy, a fast high-level web crawling & scraping framework for Python.
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
一些非常有趣的python爬虫例子,对新手比较友好,主要爬取淘宝、天猫、微信、微信读书、豆瓣、QQ等网站。(Some interesting examples of python crawlers that are friendly to beginners. )
Distributed web crawler admin platform for spiders management regardless of languages and frameworks. 分布式爬虫管理平台,支持任何语言和框架
newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
🚀🚀🚀feapder is an easy to use, powerful crawler framework | feapder是一款上手简单,功能强大的Python爬虫框架。内置AirSpider、Spider、TaskSpider、BatchSpider四种爬虫解决不同场景的需求。且支持断点续爬、监控报警、浏览器渲染、海量数据去重等功能。更有功能强大的爬虫管理系统feaplat为其提...
一个好用的哔哩哔哩漫画下载器,拥有图形界面,支持关键词搜索漫画和二维码登入,黑科技下载未解锁章节,多线程下载,多种保存格式,本地漫画管理,一键检查更新!
A command-line interface (CLI) based utility to recursively crawl webpages. It is designed to systematically browse webpages' URLs and follow links to discover linked webpages' URLs.
Python API for JMComic | 提供Python API访问禁漫天堂,同时支持网页端和移动端 | 禁漫天堂GitHub Actions下载器🚀
js cookie逆向利器:js cookie变动监控可视化工具 & js cookie hook打条件断点
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.
🤖Free ChatGPT Line Bot with Horoscope, Music Broadcast, Google Image Search...
In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex.
Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.
Jie stands out as a comprehensive security assessment and exploitation tool meticulously crafted for web applications. Its robust suite of features encompasses vulnerability scanning, information gath...
SecretScraper is a web scraper that crawl through target websites, scrape from http response and extract secret information via regular expression.
Python-based web crawling script with randomized intervals, user-agent rotation, and proxy server IP rotation to outsmart website bots and prevent blocking.
Nodejs library that provides an Api for obtaining the movies information from FlixHQ website.
ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to re...