Bruce-Lee-LY / flash_attention_inference

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
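
For context (a standard definition, not taken from the repository itself): Flash Attention and Flash Attention v2 are tiled, memory-efficient kernels that compute exact scaled dot-product attention without materializing the full attention matrix in GPU global memory. The quantity whose inference performance is measured here is, in standard notation,

  Attention(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

where Q, K, and V are the query, key, and value matrices and d_k is the per-head dimension.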

Date Created 2023-08-16 (about a year ago)
Commits 1 (last one 2 months ago)
Stargazers 29 (0 this week)
Watchers 1 (0 this week)
Forks 3
License BSD-3-Clause
Ranking

RepositoryStats indexes 584,353 repositories; of these, Bruce-Lee-LY/flash_attention_inference is ranked #564,412 (3rd percentile) for total stargazers and #535,930 for total watchers. GitHub reports the primary language for this repository as C++; among repositories using this language it is ranked #30,467/31,270.

Bruce-Lee-LY/flash_attention_inference is also tagged with popular topics; for these it is ranked: llm (#2,514/2,726), gpu (#887/900), cuda (#620/635), nvidia (#301/305), inference (#293/301).

Other Information

Bruce-Lee-LY/flash_attention_inference has GitHub issues enabled; there is 1 open issue and 2 closed issues.

Star History

GitHub stargazers over time

Watcher History

GitHub watchers over time; collection started in 2023

Recent Commit History

1 commit on the default branch (master) since January 2022

Yearly Commits

Commits to the default branch (master) per year

Issue History

Languages

The primary language is C++, but other languages are present as well.

Updated: 2024-11-14 @ 07:48 PM, id: 679281575 / R_kgDOKH0Dpw