Trending repositories for topic crawler
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
小红书数据采集、网站图片、视频资源批量下载工具,颜值超高的数据采集工具(批量下载,视频提取,图片,去水印等)Telegram:https://t.me/+ZtLSwuIKTo44MDY1
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
High-performance asynchronous Douyin(抖音) TikTok Xiaohongshu(小红书) Kuaishou(快手) Weibo(微博) Instagram YouTube(油管) Twitter(X) Captcha Solver(验证码解决器) Temp Mail(临时邮箱) API(接口).
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️♂️ when scraping the web?
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
High-performance asynchronous Douyin(抖音) TikTok Xiaohongshu(小红书) Kuaishou(快手) Weibo(微博) Instagram YouTube(油管) Twitter(X) Captcha Solver(验证码解决器) Temp Mail(临时邮箱) API(接口).
小红书数据采集、网站图片、视频资源批量下载工具,颜值超高的数据采集工具(批量下载,视频提取,图片,去水印等)Telegram:https://t.me/+ZtLSwuIKTo44MDY1
Hydra九头龙,保姆级为您打造属于您的造跨平台TB-PB级别专属搜索引擎、专属上帝之眼。Hydra-面向云计算、多任务调度、服务通信、数仓、微服务化、抽象化分布式操作系统——以实现小型爬虫搜索引擎为例。
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
SiteOne Crawler is a website analyzer and exporter you'll ♥ as a Dev/DevOps, QA engineer, website owner or consultant. Works on all popular platforms - Windows, macOS and Linux (x64 and arm64 too).
Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
使用python编译exe/bash/命令行参数来下载copymanga(拷贝漫画)中的漫画,支持批量+选话下载和获取您收藏的漫画并下载!(windows&linux支持,MacOS代码支持)
📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Scrapy, a fast high-level web crawling & scraping framework for Python.
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
小红书数据采集、网站图片、视频资源批量下载工具,颜值超高的数据采集工具(批量下载,视频提取,图片,去水印等)Telegram:https://t.me/+ZtLSwuIKTo44MDY1
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
一些非常有趣的python爬虫例子,对新手比较友好,主要爬取淘宝、天猫、微信、微信读书、豆瓣、QQ等网站。(Some interesting examples of python crawlers that are friendly to beginners. )
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
Hydra九头龙,保姆级为您打造属于您的造跨平台TB-PB级别专属搜索引擎、专属上帝之眼。Hydra-面向云计算、多任务调度、服务通信、数仓、微服务化、抽象化分布式操作系统——以实现小型爬虫搜索引擎为例。
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
SiteOne Crawler is a website analyzer and exporter you'll ♥ as a Dev/DevOps, QA engineer, website owner or consultant. Works on all popular platforms - Windows, macOS and Linux (x64 and arm64 too).
小红书数据采集、网站图片、视频资源批量下载工具,颜值超高的数据采集工具(批量下载,视频提取,图片,去水印等)Telegram:https://t.me/+ZtLSwuIKTo44MDY1
High-performance asynchronous Douyin(抖音) TikTok Xiaohongshu(小红书) Kuaishou(快手) Weibo(微博) Instagram YouTube(油管) Twitter(X) Captcha Solver(验证码解决器) Temp Mail(临时邮箱) API(接口).
分布式爬虫项目,本项目支持个性化定制页面解析器二次开发,项目整体采用微服务架构,通过消息队列实现消息的异步发送,使用到的框架包括:redigo, gorm, goquery, easyjson, viper, amqp, zap, go-micro,并通过Docker实现容器化部署,中间爬虫节点支持水平拓展。
Spiderbuf 是一个python爬虫学习及练习网站: 保姆式引导关卡 + 免费在线视频教程,从Python环境的搭建到最简单的网页爬取,让零基础的小白也能获得成就感。 在已经入门的基础上强化练习,在矛与盾的攻防中不断提高技术水平,通过大量的模仿练习掌握常见的爬与反爬套路。 以闯关的形式挑战各个关卡任务,验证自身实力的时候到了。
📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
Python API for JMComic | 提供Python API访问禁漫天堂,同时支持网页端和移动端 | 禁漫天堂GitHub Actions下载器🚀
A Python crawler tool that can automatically simulate browser operations to crawl all users' tweet content and save all static resources (videos, pictures) locally without calling the Twitter API. Use...
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
Scrapy, a fast high-level web crawling & scraping framework for Python.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
Python API for JMComic | 提供Python API访问禁漫天堂,同时支持网页端和移动端 | 禁漫天堂GitHub Actions下载器🚀
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
SiteOne Crawler is a website analyzer and exporter you'll ♥ as a Dev/DevOps, QA engineer, website owner or consultant. Works on all popular platforms - Windows, macOS and Linux (x64 and arm64 too).
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Hydra九头龙,保姆级为您打造属于您的造跨平台TB-PB级别专属搜索引擎、专属上帝之眼。Hydra-面向云计算、多任务调度、服务通信、数仓、微服务化、抽象化分布式操作系统——以实现小型爬虫搜索引擎为例。
High-performance asynchronous Douyin(抖音) TikTok Xiaohongshu(小红书) Kuaishou(快手) Weibo(微博) Instagram YouTube(油管) Twitter(X) Captcha Solver(验证码解决器) Temp Mail(临时邮箱) API(接口).
Python API for JMComic | 提供Python API访问禁漫天堂,同时支持网页端和移动端 | 禁漫天堂GitHub Actions下载器🚀
Spiderbuf 是一个python爬虫学习及练习网站: 保姆式引导关卡 + 免费在线视频教程,从Python环境的搭建到最简单的网页爬取,让零基础的小白也能获得成就感。 在已经入门的基础上强化练习,在矛与盾的攻防中不断提高技术水平,通过大量的模仿练习掌握常见的爬与反爬套路。 以闯关的形式挑战各个关卡任务,验证自身实力的时候到了。
小红书数据采集、网站图片、视频资源批量下载工具,颜值超高的数据采集工具(批量下载,视频提取,图片,去水印等)Telegram:https://t.me/+ZtLSwuIKTo44MDY1
📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
Prying Deep - An OSINT tool to collect intelligence on the dark web.
An advanced [Finder | Checker | Server] tool for proxy servers, supporting both HTTP(S) and SOCKS protocols. 🎭
SiteOne Crawler is a website analyzer and exporter you'll ♥ as a Dev/DevOps, QA engineer, website owner or consultant. Works on all popular platforms - Windows, macOS and Linux (x64 and arm64 too).
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
Hydra九头龙,保姆级为您打造属于您的造跨平台TB-PB级别专属搜索引擎、专属上帝之眼。Hydra-面向云计算、多任务调度、服务通信、数仓、微服务化、抽象化分布式操作系统——以实现小型爬虫搜索引擎为例。
Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.
🤖Free ChatGPT Line Bot with Horoscope, Music Broadcast, Google Image Search...
🤖 A curated list of websites that restrict access to AI Agents, AI crawlers and GPTs
ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to re...
A Python crawler tool that can automatically simulate browser operations to crawl all users' tweet content and save all static resources (videos, pictures) locally without calling the Twitter API. Use...
A visual no-code/code-free web crawler/spider易采集:一个可视化浏览器自动化测试/数据采集/爬虫软件,可以无代码图形化的设计和执行爬虫任务。别名:ServiceWrapper面向Web应用的智能化服务封装系统。
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
🚀「Douyin_TikTok_Download_API」是一个开箱即用的高性能异步抖音、快手、TikTok、Bilibili数据爬取工具,支持API调用,在线批量解析及下载。
👾 Fast and simple video download library and CLI tool written in Go
Scrapy, a fast high-level web crawling & scraping framework for Python.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
一些非常有趣的python爬虫例子,对新手比较友好,主要爬取淘宝、天猫、微信、微信读书、豆瓣、QQ等网站。(Some interesting examples of python crawlers that are friendly to beginners. )
Distributed web crawler admin platform for spiders management regardless of languages and frameworks. 分布式爬虫管理平台,支持任何语言和框架
newspaper3k is a news, full-text, and article metadata extraction in Python 3. Advanced docs:
Collection of patches for puppeteer and playwright to avoid automation detection and leaks. Helps to avoid Cloudflare and DataDome CAPTCHA pages. Easy to patch/unpatch, can be enabled/disabled on dema...
Sasori is a dynamic web crawler powered by Puppeteer, designed for lightning-fast endpoint discovery.
🤖Free ChatGPT Line Bot with Horoscope, Music Broadcast, Google Image Search...
lianjia / beike estate crawler/analysis 2024
🧩 / 🕸 WebsiteCrawler - This plugin automatically crawls the main content of a specified URL webpage and uses it as context input.
小红书数据采集、网站图片、视频资源批量下载工具,颜值超高的数据采集工具(批量下载,视频提取,图片,去水印等)Telegram:https://t.me/+ZtLSwuIKTo44MDY1
In this Python Web Scraping Tutorial, we will outline everything needed to get started with web scraping. We will begin with simple examples and move on to relatively more complex.
一个基于✨HOOK机制的微信机器人,支持🌱安全新闻定时推送【FreeBuf,先知,安全客,奇安信攻防社区】,👯Kfc文案,⚡备案查询,⚡手机号归属地查询,⚡WHOIS信息查询,🎉星座查询,⚡天气查询,🌱摸鱼日历,⚡微步威胁情报查询, 🐛美女视频,⚡美女图片,👯帮助菜单。📫 支持积分功能,⚡支持自动拉人,⚡检测广告,🌱自动群发,👯Ai回复,😄自定义程度丰富,小白也可轻松上手!
SecretScraper is a web scraper that crawl through target websites, scrape from http response and extract secret information via regular expression.
Python API for JMComic | 提供Python API访问禁漫天堂,同时支持网页端和移动端 | 禁漫天堂GitHub Actions下载器🚀
爬虫逆向案例,已完成:TLS指纹|瑞数|震坤行 | 网易易盾 | 微信小程序反编译逆向(百达星系) | 同花顺 | rpc解密 | 加速乐 | 极验滑块验证码 | 巨量算数 | Boss直聘 | 企查查 | 中国五矿 | qq音乐 | 产业政策大数据平台 | 企知道 | 雪球网(acw_sc__v2) | 1688 | 七麦数据 | whggzy | 企名科技 | mohurd | 艺恩数据 | ...
A web crawling framework implemented in Golang, it is simple to write and delivers powerful performance. It comes with a wide range of practical middleware and supports various parsing and storage met...
Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.
Jie stands out as a comprehensive security assessment and exploitation tool meticulously crafted for web applications. Its robust suite of features encompasses vulnerability scanning, information gath...
ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to re...