huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Date Created 2019-11-01 (5 years ago)
Commits 1,860 (last one 10 days ago)
Stargazers 9,539 (29 this week)
Watchers 121 (0 this week)
Forks 870
License apache-2.0
Ranking

RepositoryStats indexes 632,869 repositories. Of these, huggingface/tokenizers is ranked #4,479 (99th percentile) for total stargazers and #14,048 for total watchers. GitHub reports the primary language for this repository as Rust; among repositories using this language, it is ranked #168 of 18,447.

huggingface/tokenizers is also tagged with popular topics. Its rank within each topic is: nlp (#44/2520), natural-language-processing (#38/1472), gpt (#41/1235), transformers (#18/900), bert (#10/578), language-model (#21/544), natural-language-understanding (#2/125).

Other Information

huggingface/tokenizers has 17 open pull requests on GitHub; 530 pull requests have been merged over the lifetime of the repository.

GitHub issues are enabled; there are 65 open issues and 978 closed issues.

There have been 94 releases; the latest, v0.21.1, was published on 2025-03-13 (16 days ago).

Homepage URL: https://huggingface.co/docs/tokenizers

Star History

GitHub stargazers over time

[Chart: y-axis 0 to 10k stargazers; x-axis 2020 to 2025]

Watcher History

GitHub watchers over time; collection started in 2023

[Chart: y-axis 100 to 125 watchers; x-axis 2023 to 2025]

Recent Commit History

356 commits on the default branch (main) since Jan '22

[Chart: y-axis 0 to 400 commits; x-axis ticks from Jul '22 to 2025]

Yearly Commits

Commits to the default branch (main) per year

[Chart: y-axis 0 to 1.4k commits; x-axis ticks 2019, 2020, 2021, 2022, 2024]

Issue History

[Chart: Total Issues, Open Issues, and Closed Issues over time; y-axis 0 to 1.2k issues; x-axis 2020 to 2025]

Languages

The primary language is Rust, but the repository also contains other languages:

Rust, Python, Jupyter Notebook, TypeScript, JavaScript, CSS, Makefile, HTML

updated: 2025-03-29 @ 10:14am, id: 219035799 / R_kgDODQ44lw