Bruce-Lee-LY / decoding_attention

Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference.

Date Created 2024-08-14 (7 months ago)

Commits 2 (last one 16 days ago)

Stargazers 35 (0 this week)

Watchers 2 (0 this week)

Forks 2

License bsd-3-clause

Ranking

RepositoryStats indexes 631,351 repositories; of these, Bruce-Lee-LY/decoding_attention is ranked #573,066 (9th percentile) for total stargazers and #480,831 for total watchers. GitHub reports the primary language for this repository as C++; among repositories using this language, it is ranked #31,142/33,671.

Bruce-Lee-LY/decoding_attention is also tagged with popular topics, for which it is ranked: llm (#3,026/3,543), gpu (#917/965), cuda (#643/686), nvidia (#318/331), inference (#311/330)

All Topics

gpu gqa llm mha mla mqa cuda nvidia flashmla cuda-core inference flashinfer flash-attention decoding-attention large-language-model multi-head-attention

Star History

GitHub stargazers over time

Watcher History

GitHub watchers over time, collection started in '23

Recent Commit History

2 commits on the default branch (master) since Jan '22

Yearly Commits

Commits to the default branch (master) per year

Issue History

No issues have been posted

Languages

The primary language is C++, but there are also others...

updated: 2025-03-16 @ 10:16pm, id: 842468267 / R_kgDOMjcLqw