Search Results - RepositoryStats

1.3k

23.7k

mit

176

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Created 2017-05-05

4,643 commits to dev branch, last one about a month ago

conifer Rhizome-Conifer

122

1.5k

apache-2.0

51

Collect and revisit web pages.

pywb warc docker python wayback archives webrecorder web-archiving

Created 2015-05-13

1,965 commits to main branch, last one about a year ago

pywb webrecorder

228

1.5k

gpl-3.0

60

Core Python Web Archiving Toolkit for replay and recording of web archives

pywb python wayback web-archives web-archiving

Created 2013-12-09

2,320 commits to main branch, last one 10 hours ago

archiveweb.page webrecorder

69

990

agpl-3.0

20

A High-Fidelity Web Archiving Extension for Chrome and Chromium based browsers!

wacz warc chromium archiving extension webrecorder web-archiving browser-extension

Created 2020-02-10

248 commits to main branch, last one 3 months ago

web-archive Ray-D-Song

287

819

gpl-3.0

7

Free web archiving and sharing service based on Cloudflare.

d1 free hono cloudflare serverless self-hosted web-archive web-archiving cloudflare-pages

Created 2024-10-22

194 commits to main branch, last one 7 days ago

replayweb.page webrecorder

68

787

agpl-3.0

16

Serverless replay of web archives directly in the browser

wacz warc web-replay web-archive web-archiving service-worker replay-web-page wayback-machine

Created 2019-12-09

506 commits to main branch, last one about a month ago

single-file-cli gildas-lormeau

77

783

agpl-3.0

11

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

cli deno nodejs crawler archiving dockerfile single-file web-crawler web-scraper web-scraping web-archiving scraping-websites

Created 2022-05-31

766 commits to master branch, last one 24 days ago

browsertrix-crawler webrecorder

101

756

agpl-3.0

23

Run a high-fidelity browser-based web archiving crawler in a single Docker container

wacz warc crawler crawling web-crawler webrecorder web-archiving

Created 2020-11-02

514 commits to main branch, last one 14 days ago

auto-archiver bellingcat

75

690

mit

23

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

docker python archive service scraping web-archiving open-source-research

Created 2021-01-15

1,293 commits to main branch, last one 17 days ago

ipwb oduwsdl

41

630

mit

22

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

ipfs warc docker python memento wayback memento-rfc web-archiving service-worker

Created 2016-03-04

1,620 commits to master branch, last one 2 months ago

waybackpy akamhy

35

522

mit

10

Wayback Machine API interface & a command-line tool

osint cdx-api savepagenow webarchiving web-archiving archive-webpage wayback-machine archive-webpages internet-archive internet-archiving wayback-machine-api wayback-machine-python

Created 2020-05-02

497 commits to master branch, last one 2 years ago

perma harvard-lil

76

467

unknown

25

Indelible links

libraries web-archiving

Created 2013-05-06

9,136 commits to develop branch, last one 8 days ago

archivenow oduwsdl

41

419

mit

20

A Tool To Push Web Resources Into Web Archives

web-archiving internet-archive

Created 2017-02-09

186 commits to master branch, last one about a year ago

warcio webrecorder

61

410

apache-2.0

21

Streaming WARC/ARC library for fast web archive IO

pywb warc python web-archives web-archiving

Created 2017-03-06

157 commits to master branch, last one 4 months ago

WarcDB Florents-Tselai

11

397

apache-2.0

9

WarcDB: Web crawl data as SQLite databases.

cli warc sqlite crawling database web-data web-archiving

Created 2022-05-29

73 commits to main branch, last one 9 months ago

wail machawk1

37

373

mit

13

🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation

gui warc python wayback heritrix openwayback pyinstaller web-archiving

Created 2013-03-20

864 commits to main branch, last one about a month ago

archivebox-browser-extension ArchiveBox

30

308

mit

9

Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

svelte digipres archiving archivebox web-archiving chrome-extension browser-extension firefox-extension internet-archiving digital-preservation

Created 2021-06-30

154 commits to master branch, last one about a month ago

browsertrix webrecorder

49

265

agpl-3.0

11

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

wacz warc cloud archiving kubernetes web-archive webrecorder web-archiving

Created 2021-06-28

1,605 commits to main branch, last one 14 hours ago

warcreate machawk1

14

220

mit

16

Chrome extension to "Create WARC files from any webpage"

warc web-archiving chrome-extension

Created 2013-03-20

181 commits to main branch, last one about a year ago

electron-archivebox ArchiveBox

15

179

gpl-3.0

7

Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)

gui linux macos docker desktop windows digipres electron archivebox web-archiving desktop-electron internet-archiving

Created 2020-11-23

58 commits to main branch, last one 2 years ago

cdx_toolkit cocrawler

31

169

apache-2.0

10

A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine

cdx warc python cdx-api commoncrawl web-archives web-archiving

Created 2018-03-03

259 commits to main branch, last one 7 months ago

sfm-ui gwu-libraries

25

155

mit

27

Social Feed Manager user interface application.

code4lib social-media web-archiving social-feed-manager

Created 2015-07-27

880 commits to master branch, last one about a year ago

ArchiveSpark helgeho

19

150

mit

15

An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.

warc spark webarchive archivespark web-archiving spark-framework internet-archive

Created 2015-08-06

148 commits to master branch, last one 15 days ago

ph-submissions programminghistorian

115

143

unknown

46

The repository and website hosting the peer review process for new Programming Historian lessons

dh api python mapping pedagogy r-studio open-source web-scraping multi-lingual web-archiving data-management digital-history distant-reading linked-open-data network-analysis digital-humanities programming-historian open-educational-resources

Created 2016-01-15

9,437 commits to gh-pages branch, last one 14 hours ago

fatcat internetarchive

18

120

other

16

Perpetual Access To The Scholarly Record

rust python postgresql open-access web-archiving digital-library scholarly-communication

Created 2018-09-06

3,268 commits to master branch, last one about a year ago

warc-parquet maxcountryman

0

110

mit

4

🗄️ A simple CLI for converting WARC to Parquet.

warc duckdb parquet crawling web-archiving

Created 2022-06-20

72 commits to main branch, last one 5 months ago

node-warc N0taN3rd

22

98

mit

8

Parse And Create Web ARChive (WARC) files with node.js

warc pupeteer warc-files webarchive web-archives webarchiving web-archiving chrome-remote-interface

Created 2017-05-21

116 commits to master branch, last one 2 months ago

warrick oduwsdl

10

89

unknown

8

Recover lost websites from the Web Infrastructure

memento recovery memento-rfc web-archiving

Created 2015-02-18

17 commits to master branch, last one 4 years ago

Collect xarantolus

13

86

mit

5

A server to collect & archive websites that also supports video downloads

archive self-hosted webinterface web-archiving website-archive website-scraper video-downloader

Created 2018-01-03

371 commits to master branch, last one 3 years ago

hoardy-web Own-Data-Privateer

7

76

gpl-3.0

2

Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, replay, mirroring, data scraping, and/or indexing. Your own p...

cli archive backups archiver internet snapshot archiving auto-save self-hosted web-archive web-browsing web-archiving offline-reading wayback-machine website-archive browser-extension internet-archiving

Created 2023-08-20

1,254 commits to master branch, last one 21 days ago