4 results found

📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
Created 2023-08-27
428 commits to main branch, last one 13 days ago
Shush is an app that deploys a WhisperV3 model with Flash Attention v2 on Modal and makes requests to it via a NextJS app.
Created 2023-11-18
64 commits to main branch, last one 6 months ago
Triton implementation of FlashAttention2 that adds custom masks.
Created 2024-07-20
18 commits to main branch, last one 4 months ago
Benchmarks of the C++ interface of FlashAttention and FlashAttention v2 in large language model (LLM) inference scenarios.
Created 2023-08-16
1 commits to master branch, last one 3 months ago