Bruce-Lee-LY / flash_attention_inference

Benchmarks the performance of the C++ interface of Flash Attention and Flash Attention v2 in large language model (LLM) inference scenarios.
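As a rough illustration of what such a benchmark involves, the sketch below times repeated calls to an attention forward pass with CUDA events. It is only a minimal, hypothetical harness: the function name flash_attention_fwd, its signature, and the tensor shapes are assumptions made here for illustration and are not the repository's actual API.

// Hypothetical benchmark harness; flash_attention_fwd is a placeholder
// standing in for a fused flash attention v1/v2 forward call, not the
// repository's real entry point.
#include <cuda_runtime.h>
#include <cstdio>

// Placeholder attention forward pass (q, k, v, o are device buffers).
void flash_attention_fwd(const float* q, const float* k, const float* v,
                         float* o, int batch, int heads, int seq_len, int head_dim) {
    // A real implementation would launch a fused attention kernel here.
}

int main() {
    const int batch = 1, heads = 32, seq_len = 1024, head_dim = 128;
    const size_t elems = static_cast<size_t>(batch) * heads * seq_len * head_dim;

    float *q, *k, *v, *o;
    cudaMalloc(&q, elems * sizeof(float));
    cudaMalloc(&k, elems * sizeof(float));
    cudaMalloc(&v, elems * sizeof(float));
    cudaMalloc(&o, elems * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int iters = 100;
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) {
        flash_attention_fwd(q, k, v, o, batch, heads, seq_len, head_dim);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("avg latency: %.3f ms per call\n", ms / iters);

    cudaFree(q); cudaFree(k); cudaFree(v); cudaFree(o);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}

The same loop can be pointed at different attention implementations (naive, flash attention, flash attention v2) to compare average latency under identical shapes.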

Date Created 2023-08-16 (about a year ago)
Commits 1 (last one 3 months ago)
Stargazers 32 (0 this week)
Watchers 1 (0 this week)
Forks 3
License BSD-3-Clause
Ranking

RepositoryStats indexes 595,856 repositories; of these, Bruce-Lee-LY/flash_attention_inference is ranked #560,208 (6th percentile) for total stargazers and #544,643 for total watchers. GitHub reports the primary language for this repository as C++; among repositories using this language it is ranked #30,347/31,836.

Bruce-Lee-LY/flash_attention_inference is also tagged with popular topics; for these it is ranked: llm (#2,595/2,913), gpu (#891/918), cuda (#624/652), nvidia (#301/312), inference (#297/309)

Other Information

Bruce-Lee-LY/flash_attention_inference has GitHub issues enabled; there are 2 open issues and 2 closed issues.

Star History

GitHub stargazers over time

Watcher History

GitHub watchers over time; collection started in 2023

Recent Commit History

1 commit on the default branch (master) since January 2022

Yearly Commits

Commits to the default branch (master) per year

Issue History

Languages

The primary language is C++, but other languages are also present.

Updated: 2024-12-18 @ 11:33 PM, id: 679281575 / R_kgDOKH0Dpw