Trending repositories for topic web-crawler
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and o...
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...
Distributed web crawler admin platform for managing spiders regardless of language or framework.
Internet search engine for text-oriented websites. Indexing the small, old and weird web.
A collection of awesome web crawlers and spiders in different languages
CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)
Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.
Run a high-fidelity browser-based web archiving crawler in a single Docker container
GroqCrawl is a powerful and user-friendly web crawling and scraping application built with Streamlit and powered by PocketGroq. It provides an intuitive interface for extracting LLM friendly AI consum...
PulsarRPA Pro Edition: Empower Your Workflows with AI-Driven Web Data Extraction.
Undetected Web-Scraping & Seamless HTML Parsing in Python!
Automate webpages at scale, scrape web data completely and accurately with high performance, distributed AI-RPA.
🔥 This repository contains complete application examples, including websites and other projects, developed using Firecrawl.
A Python script that empowers people with no programming background to generate robust leads at scale. This repo will compile various versatile techniques used in lead generation.
A machine learning project implemented from scratch, involving web scraping, data engineering, exploratory data analysis, and machine learning to predict housing prices in the New York Tri-State Area.
"Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases" by Jiarui Li and Ye Yuan and Zehua Zhang
News crawling with StormCrawler - stores content as WARC
GoScrapy: Harnessing Go's power for blazingly fast web scraping, inspired by Python's Scrapy framework.
Multi-threaded web scraper to download all the tutorials from www.learncpp.com and convert them to PDF files concurrently.