29 results found

196
2.0k
bsd-3-clause
108
Polite, slim and concurrent web crawler.
Created 2012-09-19
220 commits to master branch, last one 3 years ago
advertools - online marketing productivity and analysis tools
Created 2017-05-14
1,369 commits to master branch, last one a day ago
95
782
bsd-3-clause
34
A simple and flexible web crawler that follows the robots.txt policies and crawl delays.
Created 2014-01-06
83 commits to master branch, last one 3 years ago
NuxtJS module for robots.txt
Created 2017-10-24
151 commits to main branch, last one 8 months ago
55
266
mit
10
The robots.txt exclusion protocol implementation for the Go language
Created 2010-07-12
67 commits to master branch, last one about a year ago
A simple but powerful web crawler library for .NET
Created 2018-12-28
355 commits to main branch, last one 9 months ago
A set of reusable Java components that implement functionality common to any web crawler
Created 2015-04-09
629 commits to master branch, last one about a month ago
Determine if a page may be crawled from robots.txt, robots meta tags and robot headers
Created 2018-02-26
144 commits to main branch, last one about a month ago
1
192
other
3
Opt-Out tool to check Copyright reservations in a way that even machines can understand.
Created 2023-07-19
73 commits to main branch, last one 5 months ago
Ultimate Website Sitemap Parser
Created 2018-11-27
84 commits to develop branch, last one 3 years ago
15
147
gpl-3.0
4
Open-Source Python-Based SEO Web Crawler
Created 2020-05-09
751 commits to master branch, last one 11 months ago
NodeJS robots.txt parser with support for wildcard (*) matching.
Created 2014-09-27
86 commits to master branch, last one 9 days ago
Gatsby plugin that automatically creates robots.txt for your site
Created 2018-04-22
405 commits to main branch, last one about a year ago
6
100
apache-2.0
4
grobotstxt is a native Go port of Google's robots.txt parser and matcher library.
Created 2020-04-21
70 commits to main branch, last one 2 years ago
Makes it easy to add robots.txt, sitemap and web app manifest during build to your Astro app.
Created 2022-04-26
377 commits to main branch, last one 9 months ago
Simple robots.txt template. Keep unwanted robots out (disallow). Whitelist (allow) legitimate user-agents. Useful for all websites.
Created 2016-01-08
303 commits to master branch, last one 4 months ago
PHP class for parsing robots.txt
Created 2013-01-21
366 commits to master branch, last one 2 years ago
🤖 A curated list of websites that restrict access to AI Agents, AI crawlers and GPTs
Created 2024-02-13
17 commits to main branch, last one 12 days ago
Known tags and settings suggested to opt out of having your content used for AI training.
Created 2023-02-05
29 commits to main branch, last one 8 days ago
Parser for robots.txt for node.js
Created 2011-08-04
97 commits to master branch, last one 5 years ago
robots.txt generator for Node.js
Created 2014-11-07
173 commits to master branch, last one 4 years ago
4
64
agpl-3.0
3
Privacy Web Search Engine (not meta, own crawler)
Created 2021-12-14
103 commits to master branch, last one about a year ago
ScrapeGPT is a RAG-based Telegram bot designed to scrape and analyze websites, then answer questions based on the scraped content. The bot utilizes Retrieval Augmented Generation and webscraping to re...
Created 2024-02-13
22 commits to main branch, last one 4 months ago
13
61
unknown
2
Python-based web crawling script with randomized intervals, user-agent rotation, and proxy server IP rotation to outsmart website bots and prevent blocking.
Created 2020-07-12
38 commits to master branch, last one 8 months ago
Dark Web Information Gathering, Footprinting, Scanner and Recon Tool Release. Dark Web is an information gathering tool I made in Python 3. To run Dark Web, it only needs a domain or IP. Dark Web can wo...
Created 2022-10-06
10 commits to main branch, last one about a year ago
26
52
bsd-3-clause
9
A pure-Python robots.txt parser with support for modern conventions.
Created 2019-06-19
126 commits to master branch, last one about a month ago
List of useful links, tools and resources
Created 2020-06-18
95 commits to master branch, last one 2 years ago
An Astro project template for decent projects: auth, i18next, Bootstrap, sitemap, webworker, robots.txt, preact, react, endpoints, endpoint clients, OAuth, various Astro features and data loading prec...
Created 2023-02-14
16 commits to main branch, last one about a year ago
Enumerate old versions of robots.txt paths using Wayback Machine for content discovery
Created 2023-04-11
2 commits to main branch, last one 9 months ago