Bruce-Lee-LY / flash_attention_inference

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
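
For context (a standard definition, not taken from the repository itself): Flash Attention and Flash Attention v2 are tiled, memory-efficient kernels that compute exact scaled dot-product attention without materializing the full attention matrix in GPU global memory. The quantity whose inference performance is measured here is, in standard notation,

  Attention(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

where Q, K, and V are the query, key, and value matrices and d_k is the per-head dimension.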

Date Created 2023-08-16 (about a year ago)
Commits 1 (last one 2 months ago)
Stargazers 29 (0 this week)
Watchers 1 (0 this week)
Forks 3
License BSD-3-Clause
Ranking

RepositoryStats indexes 584,353 repositories; of these, Bruce-Lee-LY/flash_attention_inference is ranked #564,412 (3rd percentile) for total stargazers and #535,930 for total watchers. GitHub reports the primary language for this repository as C++; among repositories using this language it is ranked #30,467/31,270.

Bruce-Lee-LY/flash_attention_inference is also tagged with popular topics; for these it is ranked: llm (#2,514/2,726), gpu (#887/900), cuda (#620/635), nvidia (#301/305), inference (#293/301).

Other Information

Bruce-Lee-LY/flash_attention_inference has GitHub issues enabled; there is 1 open issue and 2 closed issues.

Star History

GitHub stargazers over time

Watcher History

GitHub watchers over time; collection started in 2023

Recent Commit History

1 commit on the default branch (master) since January 2022

Yearly Commits

Commits to the default branch (master) per year

Issue History

Languages

The primary language is C++, but other languages are present as well.

Updated: 2024-11-14 @ 07:48 PM, id: 679281575 / R_kgDOKH0Dpw