6 results found

1. Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper
   Created 2025-02-19 · 209 commits to main branch, last one 2 days ago
   19 · 365 · apache-2.0 · 5
2. SpargeAttention: A training-free sparse attention that can accelerate any model inference.
   Created 2025-02-25 · 26 commits to main branch, last one 14 days ago
   7 · 153 · apache-2.0 · 3
3. ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
   Created 2024-10-22 · 13 commits to main branch, last one 5 months ago
4. Efficient Triton implementation of Native Sparse Attention.
   Created 2025-02-24 · 54 commits to main branch, last one 3 days ago
   6 · 122 · mit · 6
5. The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression"
   Created 2024-06-19 · 31 commits to master branch, last one 3 months ago
   4 · 66 · apache-2.0 · 1
6. Code for the paper "FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference" (ICLR 2025 Oral)
   Created 2025-02-18 · 8 commits to main branch, last one 18 hours ago