Trending repositories for topic crawler
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Scrapy, a fast high-level web crawling & scraping framework for Python.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
🏳️🌈 Media downloader from any sites, including Twitter, Reddit, Instagram, Threads, Facebook, OnlyFans, YouTube, Pinterest, PornHub, XHamster, XVIDEOS, ThisVid etc.
Download comics novels 小说漫画下载工具 小説漫画のダウンローダ 小說漫畫下載:腾讯漫画 大角虫漫画 有妖气 咪咕 SF漫画 哦漫画 看漫画 漫画柜 汗汗酷漫 動漫伊甸園 快看漫画 微博动漫 733动漫网 大古漫画网 漫画DB 無限動漫 動漫狂 卡推漫画 动漫之家 动漫屋 古风漫画网 36漫画网 亲亲漫画网 乙女漫画 webtoons 咚漫 ニコニコ静画 ComicWalke...
CLI tool for fetching URLs from Wayback Machine, Common Crawl, and VirusTotal.
CLI tool for fetching URLs from Wayback Machine, Common Crawl, and VirusTotal.
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
SpideyX a multipurpose Web Penetration Testing tool with asynchronous concurrent performance with multiple mode and configurations.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
爬虫逆向案例,已完成:TLS指纹|瑞数|震坤行 | 网易易盾 | 微信小程序反编译逆向(百达星系) | 同花顺 | rpc解密 | 加速乐 | 极验滑块验证码 | 巨量算数 | Boss直聘 | 企查查 | 中国五矿 | qq音乐 | 产业政策大数据平台 | 企知道 | 雪球网(acw_sc__v2) | 1688 | 七麦数据 | whggzy | 企名科技 | mohurd | 艺恩数据 | ...
A Telegram crawler made in Python to automatically search groups and channels and collect any type of data from them (+ dataset included).
A multi-arch image provides one HTTP proxy endpoint with many concurrent tunnels to the Tor network.
🏳️🌈 Media downloader from any sites, including Twitter, Reddit, Instagram, Threads, Facebook, OnlyFans, YouTube, Pinterest, PornHub, XHamster, XVIDEOS, ThisVid etc.
Crawl and extract (regular or onion) webpages through TOR network
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.
Golang短视频去水印:抖音,皮皮虾,火山,微视,最右,快手,全民小视频,皮皮搞笑,西瓜视频,虎牙,梨视频,acfun,好看视频...
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Scrapy, a fast high-level web crawling & scraping framework for Python.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
🏳️🌈 Media downloader from any sites, including Twitter, Reddit, Instagram, Threads, Facebook, OnlyFans, YouTube, Pinterest, PornHub, XHamster, XVIDEOS, ThisVid etc.
CLI tool for fetching URLs from Wayback Machine, Common Crawl, and VirusTotal.
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
ScraperAI is an open-source, AI-powered tool designed to simplify web scraping for users of all skill levels.
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
Midnight Sea: navigating in the waters of dark web markets
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
保存百度贴吧帖子到本地,并且支持图片, 视频, 语音等内容。与本项目配套的阅读器 TiebaReader(https://github.com/Sorceresssis/TiebaReader)
SpideyX a multipurpose Web Penetration Testing tool with asynchronous concurrent performance with multiple mode and configurations.
A Content Discovery and Development Platform. Empowering Cybersecurity, AI, Marketing, and Finance professionals and researchers to discover, analyze, and interact with the web in all its dimensions.
Search for documents in a domain through Search Engines (Google, Bing and Baidu). The objective is to extract metadata
A multi-arch image provides one HTTP proxy endpoint with many concurrent tunnels to the Tor network.
🏳️🌈 Media downloader from any sites, including Twitter, Reddit, Instagram, Threads, Facebook, OnlyFans, YouTube, Pinterest, PornHub, XHamster, XVIDEOS, ThisVid etc.
High-performance asynchronous Douyin(抖音) TikTok Xiaohongshu(小红书) Kuaishou(快手) Weibo(微博) Instagram YouTube(油管) Twitter(X) Captcha Solver(验证码解决器) Temp Mail(临时邮箱) API(接口).
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
使用python+copymanga API来下载copymanga(拷贝漫画)中的漫画,支持批量+选话下载和获取您收藏的漫画并下载!(全平台支持)
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
CLI tool for fetching URLs from Wayback Machine, Common Crawl, and VirusTotal.
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Scrapy, a fast high-level web crawling & scraping framework for Python.
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
ScraperAI is an open-source, AI-powered tool designed to simplify web scraping for users of all skill levels.
Updated lists of IP addresses/whitelists of good bots and crawlers. Includes GoogleBot, BingBot, DuckDuckBot, etc.
SpideyX a multipurpose Web Penetration Testing tool with asynchronous concurrent performance with multiple mode and configurations.
Free Facebook pages MetaData Scraping Library - Unlimited Calls
Midnight Sea: navigating in the waters of dark web markets
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
SiteOne Crawler GUI is a cross-platform website crawler and analyzer for SEO, security, accessibility, and performance optimization—ideal for developers, DevOps, QA engineers, and consultants. Support...
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
Golang短视频去水印:抖音,皮皮虾,火山,微视,最右,快手,全民小视频,皮皮搞笑,西瓜视频,虎牙,梨视频,acfun,好看视频...
保存百度贴吧帖子到本地,并且支持图片, 视频, 语音等内容。与本项目配套的阅读器 TiebaReader(https://github.com/Sorceresssis/TiebaReader)
使用python+copymanga API来下载copymanga(拷贝漫画)中的漫画,支持批量+选话下载和获取您收藏的漫画并下载!(全平台支持)
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Leaked GPTs Prompts Bypass the 25 message limit or to try out GPTs without a Plus subscription.
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
Hydra九头龙,保姆级为您打造属于您的造跨平台TB-PB级别专属搜索引擎、专属上帝之眼。Hydra-面向云计算、多任务调度、服务通信、数仓、微服务化、抽象化分布式操作系统——以实现小型爬虫搜索引擎为例。
SpideyX a multipurpose Web Penetration Testing tool with asynchronous concurrent performance with multiple mode and configurations.
⚡ 一款用于自动语音识别 (ASR)、翻译的高性能异步 API。不需要购买Whisper API,使用本地运行的Whisper模型进行推理,并支持多GPU并发,针对分布式部署进行设计。还内置了包括TikTok、抖音等社交媒体平台的爬虫,可实现来自多个社交平台的无缝媒体处理,为媒体内容数据自动化处理提供了强大且可扩展的解决方案。
🤖 A curated list of websites that restrict access to AI Agents, AI crawlers and GPTs
ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to re...
SecretScraper is a web scraper that crawl through target websites, scrape from http response and extract secret information via regular expression.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
👾 Fast and simple video download library and CLI tool written in Go
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Scrapy, a fast high-level web crawling & scraping framework for Python.
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
Leaked GPTs Prompts Bypass the 25 message limit or to try out GPTs without a Plus subscription.
Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Distributed web crawler admin platform for spiders management regardless of languages and frameworks. 分布式爬虫管理平台,支持任何语言和框架
newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
一些非常有趣的python爬虫例子,对新手比较友好,主要爬取淘宝、天猫、微信、微信读书、豆瓣、QQ等网站。(Some interesting examples of python crawlers that are friendly to beginners. )
📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
Leaked GPTs Prompts Bypass the 25 message limit or to try out GPTs without a Plus subscription.
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
SiteOne Crawler is a cross-platform website crawler and analyzer for SEO, security, accessibility, and performance optimization—ideal for developers, DevOps, QA engineers, and consultants. Supports Wi...
Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.
SpideyX a multipurpose Web Penetration Testing tool with asynchronous concurrent performance with multiple mode and configurations.
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
SecretScraper is a web scraper that crawl through target websites, scrape from http response and extract secret information via regular expression.
In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex.
🧩 / 🕸 WebsiteCrawler - This plugin automatically crawls the main content of a specified URL webpage and uses it as context input.
ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to re...
Python API for JMComic | 提供Python API访问禁漫天堂,同时支持网页端和移动端 | 禁漫天堂GitHub Actions下载器🚀
Professional scrapers that provide full control to the users. Crawlee One builds on top of Crawlee and Apify and extends them with features for robust and highly configurable web scrapers.
Midnight Sea: navigating in the waters of dark web markets