29 results found
- Filter by Primary Language:
- Python (8)
- Go (5)
- JavaScript (4)
- PHP (2)
- TypeScript (2)
- Java (1)
- HTML (1)
- C++ (1)
- CSS (1)
- DIGITAL Command Language (1)
- C# (1)
Polite, slim and concurrent web crawler.
Created
2012-09-19
220 commits to master branch, last one 3 years ago
advertools - online marketing productivity and analysis tools
Created
2017-05-14
1,431 commits to master branch, last one 3 months ago
A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
Created
2014-01-06
83 commits to master branch, last one 3 years ago
Tame the robots crawling and indexing your Nuxt site.
Created
2017-10-24
276 commits to main branch, last one 9 days ago
The robots.txt exclusion protocol implementation for Go language
Created
2010-07-12
67 commits to master branch, last one 2 years ago
A simple but powerful web crawler library for .NET
Created
2018-12-28
355 commits to main branch, last one about a year ago
A set of reusable Java components that implement functionality common to any web crawler
Created
2015-04-09
653 commits to master branch, last one 9 days ago
Determine if a page may be crawled from robots.txt, robots meta tags and robot headers
Created
2018-02-26
161 commits to main branch, last one about a month ago
Opt-Out tool to check Copyright reservations in a way that even machines can understand.
Created
2023-07-19
73 commits to main branch, last one 10 months ago
Ultimate Website Sitemap Parser
Created
2018-11-27
84 commits to develop branch, last one 4 years ago
Open-Source Python Based SEO Web Crawler
Created
2020-05-09
751 commits to master branch, last one about a year ago
NodeJS robots.txt parser with support for wildcard (*) matching.
Created
2014-09-27
90 commits to master branch, last one about a month ago
Known tags and settings suggested to opt out of having your content used for AI training.
Created
2023-02-05
29 commits to main branch, last one 5 months ago
Makes it easy to add robots.txt, sitemap and web app manifest during build to your Astro app.
Created
2022-04-26
377 commits to main branch, last one about a year ago
grobotstxt is a native Go port of Google's robots.txt parser and matcher library.
Created
2020-04-21
70 commits to main branch, last one 2 years ago
Gatsby plugin that automatically creates robots.txt for your site
Created
2018-04-22
405 commits to main branch, last one 2 years ago
Simple robots.txt template. Keeps unwanted robots out (disallow) and whitelists (allow) legitimate user agents. Useful for all websites.
Created
2016-01-08
303 commits to master branch, last one 9 months ago
PHP class for parsing robots.txt
Created
2013-01-21
366 commits to master branch, last one 2 years ago
🤖 A curated list of websites that restrict access to AI Agents, AI crawlers and GPTs
Created
2024-02-13
23 commits to main branch, last one about a month ago
ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to re...
Created
2024-02-13
22 commits to main branch, last one 9 months ago
Python-based web crawling script with randomized intervals, user-agent rotation, and proxy server IP rotation to outsmart website bots and prevent blocking.
Created
2020-07-12
44 commits to master branch, last one 4 months ago
Privacy Web Search Engine (not meta, own crawler)
Created
2021-12-14
103 commits to master branch, last one about a year ago
robots.txt parser for Node.js
Created
2011-08-04
97 commits to master branch, last one 6 years ago
robots.txt generator for Node.js
Created
2014-11-07
173 commits to master branch, last one 4 years ago
Dark Web information gathering, footprinting, scanner and recon tool release. Dark Web is an information gathering tool I made in Python 3. To run, Dark Web needs only a domain or IP. Dark Web can wo...
Created
2022-10-06
10 commits to main branch, last one about a year ago
A pure-Python robots.txt parser with support for modern conventions.
Created
2019-06-19
134 commits to master branch, last one 18 days ago
An Astro project template for decent projects: auth, i18next, Bootstrap, sitemap, webworker, robots.txt, preact, react, endpoints, endpoint clients, OAuth, various Astro features and data loading prec...
Created
2023-02-14
16 commits to main branch, last one about a year ago
List of useful links, tools and resources
Created
2020-06-18
95 commits to master branch, last one 2 years ago
Enumerate old versions of robots.txt paths using Wayback Machine for content discovery
Created
2023-04-11
2 commits to main branch, last one about a year ago