nanowell / Differential-Transformer-PyTorch

PyTorch implementation of the Differential Transformer architecture for sequence modeling, built as a decoder-only model in the style of large language models (LLMs). The architecture combines a novel Differential Attention mechanism with a multi-head structure, RMSNorm, and SwiGLU.
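
A minimal, self-contained sketch may help make that description concrete. The single-head PyTorch module below is illustrative only, not the repository's actual code: it implements differential attention as the difference of two softmax attention maps, with a plain learnable scalar λ (the DIFF Transformer paper reparameterizes λ via learned vectors) and a causal mask for decoder-only modeling. Class and parameter names here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentialAttention(nn.Module):
    """Illustrative single-head differential attention (not the repo's code).

    Computes softmax(Q1 K1^T / sqrt(d)) - lambda * softmax(Q2 K2^T / sqrt(d)),
    so attention noise common to both maps cancels out.
    """

    def __init__(self, embed_dim: int, head_dim: int, lambda_init: float = 0.8):
        super().__init__()
        # Two query/key projections (split into Q1, Q2 and K1, K2), one value projection.
        self.q_proj = nn.Linear(embed_dim, 2 * head_dim, bias=False)
        self.k_proj = nn.Linear(embed_dim, 2 * head_dim, bias=False)
        self.v_proj = nn.Linear(embed_dim, 2 * head_dim, bias=False)
        # Learnable subtraction weight; the paper derives lambda from
        # exponentials of learned vectors, a plain scalar is used here.
        self.lmbda = nn.Parameter(torch.tensor(lambda_init))
        self.head_dim = head_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim)
        q1, q2 = self.q_proj(x).chunk(2, dim=-1)
        k1, k2 = self.k_proj(x).chunk(2, dim=-1)
        v = self.v_proj(x)
        scale = self.head_dim ** -0.5
        # Causal mask: each token may only attend to itself and the past.
        seq_len = x.size(1)
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), 1
        )
        a1 = (q1 @ k1.transpose(-2, -1) * scale).masked_fill(causal, float("-inf"))
        a2 = (q2 @ k2.transpose(-2, -1) * scale).masked_fill(causal, float("-inf"))
        # Differential attention: subtract the second attention map.
        attn = F.softmax(a1, dim=-1) - self.lmbda * F.softmax(a2, dim=-1)
        return attn @ v  # (batch, seq_len, 2 * head_dim)

x = torch.randn(2, 16, 64)              # batch 2, 16 tokens, dim 64
out = DifferentialAttention(64, 32)(x)
print(out.shape)                        # torch.Size([2, 16, 64])
```

In the full architecture each head's output is additionally normalized before the output projection, and the attention blocks are stacked with the RMSNorm and SwiGLU feed-forward layers noted above.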

Date Created: 2024-10-08 (about a month ago)
Commits: 7 (last one 26 days ago)
Stargazers: 44 (5 this week)
Watchers: 4 (0 this week)
Forks: 5
License: MIT
Ranking

RepositoryStats indexes 585,332 repositories; of these, nanowell/Differential-Transformer-PyTorch is ranked #498,403 (15th percentile) for total stargazers and #373,585 for total watchers. GitHub reports the primary language for this repository as Python; among repositories using this language it is ranked #96,202/116,570.

nanowell/Differential-Transformer-PyTorch is also tagged with popular topics, for which it is ranked: machine-learning (#7,080/7,939), pytorch (#5,248/5,954), large-language-models (#856/1,049).

Other Information

nanowell/Differential-Transformer-PyTorch has GitHub issues enabled; there is 1 open issue and 1 closed issue.

Star History

GitHub stargazers over time

Watcher History

GitHub watchers over time; collection started in 2023

Recent Commit History

7 commits on the default branch (main) since Jan 2022

Yearly Commits

Commits to the default branch (main) per year

Issue History

Languages

The only known language in this repository is Python

Updated: 2024-11-23 @ 01:53am, ID: 869556248 / R_kgDOM9RgGA