nanowell / Differential-Transformer-PyTorch
PyTorch implementation of the Differential Transformer architecture for sequence modeling, tailored as a decoder-only model in the style of large language models (LLMs). The architecture combines a novel Differential Attention mechanism with a multi-head structure, RMSNorm, and SwiGLU.
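The core idea of Differential Attention is to compute two softmax attention maps from split query/key projections and subtract one from the other (scaled by a learnable lambda), which cancels common-mode attention noise. A minimal single-head sketch of that idea is below; the function and weight names are illustrative, and it omits the multi-head split, per-head normalization, and the lambda re-parameterization used in the full architecture.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def differential_attention(x, w_q, w_k, w_v, lam):
    """Single-head sketch of Differential Attention (names are illustrative)."""
    B, T, _ = x.shape
    q1, q2 = (x @ w_q).chunk(2, dim=-1)   # two query projections
    k1, k2 = (x @ w_k).chunk(2, dim=-1)   # two key projections
    v = x @ w_v
    d = q1.shape[-1]
    # causal mask for decoder-only modeling
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)

    def attn(q, k):
        scores = (q @ k.transpose(-1, -2)) / math.sqrt(d)
        scores = scores.masked_fill(mask, float("-inf"))
        return F.softmax(scores, dim=-1)

    # difference of the two attention maps weights the values
    diff_map = attn(q1, k1) - lam * attn(q2, k2)
    return diff_map @ v

# toy usage with hypothetical dimensions
d_model, d_head = 16, 8
x = torch.randn(2, 5, d_model)
w_q = torch.randn(d_model, 2 * d_head)
w_k = torch.randn(d_model, 2 * d_head)
w_v = torch.randn(d_model, d_head)
out = differential_attention(x, w_q, w_k, w_v, lam=0.8)
print(out.shape)  # torch.Size([2, 5, 8])
```

Note that with the causal mask, the first position attends only to itself, so its two attention weights are both 1 and its output is simply `(1 - lam)` times its own value vector.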
RepositoryStats indexes 585,332 repositories; among these, nanowell/Differential-Transformer-PyTorch is ranked #498,403 (15th percentile) by total stargazers and #373,585 by total watchers. GitHub reports the primary language for this repository as Python; among repositories using this language it is ranked #96,202 of 116,570.
nanowell/Differential-Transformer-PyTorch has GitHub issues enabled, with 1 open issue and 1 closed issue.
Star History: GitHub stargazers over time
Watcher History: GitHub watchers over time (collection started in '23)
Recent Commit History: 7 commits on the default branch (main) since Jan '22
Yearly Commits: commits to the default branch (main) per year
Issue History
Languages: the only known language in this repository is Python
updated: 2024-11-23 @ 01:53am, id: 869556248 / R_kgDOM9RgGA