jiangnanboy / llm_corpus_quality

Chinese corpus cleaning and quality assessment for large model pre-training

Date Created 2023-12-12 (about a year ago)
Commits 23 (last one 4 months ago)
Stargazers 43 (1 this week)
Watchers 1 (0 this week)
Forks 6
License unknown
Ranking

RepositoryStats indexes 595,856 repositories. Of these, jiangnanboy/llm_corpus_quality is ranked #505,506 (15th percentile) for total stargazers and #544,643 for total watchers. GitHub reports the primary language for this repository as Java; among repositories using this language, it is ranked #25,835/28,578.

jiangnanboy/llm_corpus_quality is also tagged with popular topics. For these, it is ranked: java (#7,135/7,759), llm (#2,313/2,913)

Other Information

jiangnanboy/llm_corpus_quality has GitHub issues enabled. There is 1 open issue and 0 closed issues.

Star History

GitHub stargazers over time

Watcher History

GitHub watchers over time (collection started in '23)

Recent Commit History

23 commits on the default branch (master) since Jan '22

Yearly Commits

Commits to the default branch (master) per year

Issue History

Languages

The only known language in this repository is Java

updated: 2024-12-21 @ 12:09pm, id: 730673895 / R_kgDOK40y5w