9 results found

An Awesome List for getting started with web archiving
Created 2017-06-16
149 commits to main branch, last one a day ago
34
499
mit
10
Wayback Machine API interface & a command-line tool
Created 2020-05-02
497 commits to master branch, last one 2 years ago
WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections.
Created 2023-10-23
213 commits to main branch, last one 21 days ago
15
109
gpl-3.0
20
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Created 2019-09-07
4,429 commits to v1.21.3-at branch, last one 2 months ago
Parse and create Web ARChive (WARC) files with Node.js
Created 2017-05-21
116 commits to master branch, last one a day ago
A list of things related to software, literature, and other content for 🕣 Memento
Created 2016-09-16
64 commits to main branch, last one 8 months ago
9
57
gpl-3.0
6
A Dockerized, queued, high-fidelity web archiver based on Squidwarc
Created 2018-07-21
34 commits to master branch, last one 6 months ago
9
49
apache-2.0
18
Various Jupyter notebooks about Common Crawl data
Created 2019-07-19
23 commits to main branch, last one 2 years ago
Quick Cache and Archive search buttons
Created 2021-07-10
77 commits to main branch, last one 8 months ago