bigscience-workshop / data-preparation

Code used for sourcing and cleaning the BigScience ROOTS corpus

Date Created 2022-04-25 (2 years ago)
Commits 50 (last one about a year ago)
Stargazers 307 (0 this week)
Watchers 24 (0 this week)
Forks 40
License apache-2.0

Ranking

RepositoryStats indexes 609,829 repositories. Of these, bigscience-workshop/data-preparation is ranked #134,381 (78th percentile) for total stargazers and #92,771 for total watchers. GitHub reports the primary language for this repository as Jupyter Notebook; among repositories using this language, it is ranked #2,996/18,117.
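The percentile figure above can be reproduced from the rank and the index size. The following is a minimal sketch that assumes the percentile is the share of indexed repositories ranked below this one, rounded to the nearest whole number; RepositoryStats does not state its formula here, and the function name `rank_percentile` is hypothetical.

```python
def rank_percentile(rank: int, total: int) -> int:
    """Return the percentile for a 1-based rank among `total` items.

    Assumption (not confirmed by the source): percentile is the share of
    items ranked below this one, rounded to the nearest integer.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * (1 - rank / total))


# Stargazer rank and index size quoted in the paragraph above.
print(rank_percentile(134_381, 609_829))  # 78
```

Under this assumption the result matches the 78th percentile quoted for total stargazers.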

bigscience-workshop/data-preparation is also tagged with popular topics. For these it is ranked: dataset (#256/1191), large-language-models (#381/1131)

Other Information

bigscience-workshop/data-preparation has GitHub issues enabled. There are 9 open issues and 3 closed issues.

Homepage URL: https://bigscience.huggingface.co/

Star History

GitHub stargazers over time

Watcher History

GitHub watchers over time (collection started in '23)

Recent Commit History

50 commits on the default branch (main) since Jan '22

Yearly Commits

Commits to the default branch (main) per year

Issue History

Languages

The primary language is Jupyter Notebook, but there are also others...

updated: 2025-01-24 @ 07:39am, id: 485318608 / R_kgDOHO1f0A