commoncrawl / cc-pyspark

Process Common Crawl data with Python and Spark

Date Created 2017-04-12 (7 years ago)
Commits 115 (last one 23 days ago)
Stargazers 422 (1 this week)
Watchers 20 (0 this week)
Forks 88
License MIT
Ranking

RepositoryStats indexes 623,448 repositories. Of these, commoncrawl/cc-pyspark is ranked #106,808 (83rd percentile) for total stargazers and #112,153 for total watchers. GitHub reports the primary language for this repository as Python; among repositories using this language, it is ranked #18,175/126,565.

commoncrawl/cc-pyspark is also tagged with popular topics, where it is ranked as follows: spark (#169/549), pyspark (#31/115).

Other Information

commoncrawl/cc-pyspark has 2 open pull requests on GitHub; 12 pull requests have been merged over the lifetime of the repository.

GitHub issues are enabled; there are 3 open issues and 23 closed issues.

Star History

GitHub stargazers over time

[Chart: stargazers over time; y-axis 0 to 450, x-axis 2018 to 2025]

Watcher History

GitHub watchers over time, collection started in '23

[Chart: watchers over time; y-axis 14 to 24, x-axis 2023 to Feb '25]

Recent Commit History

38 commits on the default branch (main) since Jan '22

[Chart: commits over time; y-axis 0 to 40, x-axis Jul '22 to 2025]

Yearly Commits

Commits to the default branch (main) per year

[Chart: commits per year; y-axis 0 to 25, x-axis 2017 to 2024]

Issue History

[Chart: total, open, and closed issues over time; y-axis 0 to 30, x-axis 2019 to 2025]

Languages

The primary language is Python; the repository also contains Shell code.

updated: 2025-03-03 @ 07:19pm, id: 88059195 / R_kgDOBT-tOw