Bruce-Lee-LY / decoding_attention

Decoding Attention is specially optimized for multi-head attention (MHA), using CUDA cores for the decoding stage of LLM inference.

Date Created 2024-08-14 (3 months ago)
Commits 1 (last one 18 days ago)
Stargazers 26 (3 this week)
Watchers 2 (0 this week)
Forks 1
License BSD-3-Clause
Ranking

RepositoryStats indexes 585,332 repositories. Of these, Bruce-Lee-LY/decoding_attention is ranked #575,246 (2nd percentile) for total stargazers and #479,233 for total watchers. GitHub reports the primary language for this repository as C++; among repositories using this language, it is ranked #30,899/31,307.

Bruce-Lee-LY/decoding_attention is also tagged with popular topics. For these it is ranked: llm (#2,644/2,744), gpu (#896/903), cuda (#629/638), nvidia (#304/307), inference (#298/303).

Star History

GitHub stargazers over time

Watcher History

GitHub watchers over time; collection started in '23

Recent Commit History

1 commit on the default branch (master) since Jan '22

Yearly Commits

Commits to the default branch (master) per year

Issue History

No issues have been posted

Languages

The primary language is C++, but there are also others...

updated: 2024-11-22 @ 05:30am, id: 842468267 / R_kgDOMjcLqw