centic9 / CommonCrawlDocumentDownload

A small tool that uses the CommonCrawl URL Index to download documents with certain file types or MIME types. It is used for mass-testing of frameworks like Apache POI and Apache Tika.

Date Created 2015-04-22 (9 years ago)
Commits 286 (last one 15 days ago)
Stargazers 59 (0 this week)
Watchers 13 (0 this week)
Forks 20
License bsd-2-clause
Ranking

RepositoryStats indexes 534,880 repositories; of these, centic9/CommonCrawlDocumentDownload is ranked #387,298 (28th percentile) for total stargazers and #161,173 for total watchers. GitHub reports the primary language for this repository as Java; among repositories using this language, it is ranked #21,139/26,728.

centic9/CommonCrawlDocumentDownload is also tagged with popular topics; for these it is ranked: java (#5,977/7,316)
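
The percentile quoted above can be reproduced from the rank and the total number of indexed repositories. The following is a minimal sketch, assuming the percentile is the share of repositories ranked below this one, rounded to the nearest whole number; RepositoryStats does not document its formula here, so the class name `RankPercentile` and the method `percentile` are hypothetical and only illustrate the arithmetic.

```java
// Hypothetical helper: not part of RepositoryStats or of the repository itself.
class RankPercentile {

    // Assumed formula: percentage of indexed repositories ranked below
    // the given rank, rounded to the nearest integer.
    static long percentile(long rank, long total) {
        if (total <= 0 || rank < 1 || rank > total) {
            throw new IllegalArgumentException("rank must be within 1..total");
        }
        return Math.round(100.0 * (total - rank) / total);
    }

    public static void main(String[] args) {
        // Stargazer rank #387,298 out of 534,880 indexed repositories.
        System.out.println(percentile(387_298, 534_880)); // prints 28
    }
}
```

Under this assumption, the stargazer rank yields the 28th percentile stated on the page; the same method can be applied to the per-language and per-topic ranks.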

Other Information

There have been 6 releases; the latest, named 1.0.0.10, was published on 2023-01-15 (about a year ago).

Star History

GitHub stargazers over time

Watcher History

GitHub watchers over time (collection started in '23)

Recent Commit History

79 commits on the default branch (master) since Jan '22

Yearly Commits

Commits to the default branch (master) per year

Issue History

Languages

The primary language is Java, but there are others as well...

updated: 2024-06-17 @ 07:20am, id: 34407138 / R_kgDOAg0C4g