NVIDIA / NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs

Date Created 2024-03-14 (11 months ago)
Commits 289 (last one 2 days ago)
Stargazers 794 (9 this week)
Watchers 13 (0 this week)
Forks 110
License apache-2.0
Ranking

RepositoryStats indexes 618,350 repositories. Of these, NVIDIA/NeMo-Curator is ranked #65,172 (89th percentile) for total stargazers and #168,220 for total watchers. GitHub reports the primary language for this repository as Jupyter Notebook; among repositories using this language it is ranked #1,400/18,432.

NVIDIA/NeMo-Curator is also tagged with popular topics, for which it is ranked: python (#3,539/22,999), llm (#589/3,282), large-language-models (#208/1,166), data (#177/1,031), fine-tuning (#46/207).
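
The percentile quoted for stargazers can be checked against the rank and the index size. The sketch below is only an illustration: the function name `percentile_from_rank` is hypothetical, and the formula (share of indexed repositories ranked below this one, rounded down) is an assumption about how such a figure is commonly derived, not a documented RepositoryStats method.

```python
def percentile_from_rank(rank: int, total: int) -> int:
    """Return a whole-number percentile for a 1-based rank among `total` items.

    Assumption: rank 1 is the best position, and the percentile is the share
    of items ranked below this one, rounded down.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    # Integer arithmetic avoids floating-point rounding surprises.
    return (total - rank) * 100 // total


# Figures taken from the ranking text: rank #65,172 of 618,350 repositories.
stars_percentile = percentile_from_rank(65_172, 618_350)
print(stars_percentile)
```

Under this assumption the result agrees with the 89th percentile stated for total stargazers.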

Other Information

NVIDIA/NeMo-Curator has 27 open pull requests on GitHub; 283 pull requests have been merged over the lifetime of the repository.

GitHub issues are enabled; there are 71 open issues and 107 closed issues.

There have been 11 releases; the latest, named NVIDIA NeMo Curator 0.6.0, was published on 2025-01-07 (about a month ago).

Star History

GitHub stargazers over time

[Chart: stargazer count (0 to 800) by month, Apr '24 to Feb '25]

Watcher History

GitHub watchers over time, collection started in '23

[Chart: watcher count (2 to 16) by month, Apr '24 to Feb '25]

Recent Commit History

289 commits on the default branch (main) since Jan '22

[Chart: commit count (0 to 300) by month, Apr '24 to Feb '25]

Yearly Commits

Commits to the default branch (main) per year

[Chart: commits per year (0 to 250), 2024]

Issue History

[Chart: total, open, and closed issues (0 to 180) by month, Apr '24 to Feb '25]

Languages

The primary language is Jupyter Notebook, but the repository also contains Python, Shell, and Dockerfile code.

updated: 2025-02-21 @ 11:42pm, id: 772255271 / R_kgDOLgeuJw