6 results found (primary language: Python 5, CUDA 1)
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper.
Created 2025-02-19 · 209 commits to main branch, last one 2 days ago
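The NSA pattern combines three attention branches per query (compressed tokens, selected token blocks, and a sliding window), mixed by learned per-head gates. Below is a minimal PyTorch sketch of that gated combination; the tensor shapes and the single-linear gate are illustrative assumptions, not this repository's actual module.

```python
# Minimal sketch of NSA's gated three-branch mix (illustrative, not the repo's code).
import torch
import torch.nn as nn

class NSAGate(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        self.heads = heads
        # One sigmoid gate per head for each branch: compress / select / window.
        self.to_gates = nn.Linear(dim, heads * 3)

    def forward(self, x, out_cmp, out_slc, out_win):
        # x: (batch, seq, dim); out_*: (batch, heads, seq, head_dim)
        g = torch.sigmoid(self.to_gates(x))        # (b, s, h*3)
        g = g.view(*x.shape[:2], self.heads, 3)    # (b, s, h, 3)
        g = g.permute(0, 2, 1, 3)                  # (b, h, s, 3)
        g_cmp, g_slc, g_win = g.unbind(dim=-1)
        return (g_cmp[..., None] * out_cmp
                + g_slc[..., None] * out_slc
                + g_win[..., None] * out_win)
```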
SpargeAttention: a training-free sparse attention method that can accelerate inference for any model.
Created 2025-02-25 · 26 commits to main branch, last one 14 days ago
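SpargeAttention's training-free speedup comes from deciding, before the attention kernel runs, which blocks of the attention map are worth computing. The sketch below illustrates the general block-pruning idea with mean-pooled block scores and a top-k keep rule; the block size, keep ratio, and pooling choice are assumptions, not the repository's actual selection criterion.

```python
# Rough block-sparsity sketch: score key blocks against query blocks via
# block means, keep only the top fraction (all parameters are assumptions).
import torch

def block_sparse_mask(q, k, block=64, keep_ratio=0.3):
    # q, k: (batch, heads, seq, head_dim); seq assumed divisible by block.
    b, h, s, d = q.shape
    qb = q.view(b, h, s // block, block, d).mean(dim=3)   # block-mean queries
    kb = k.view(b, h, s // block, block, d).mean(dim=3)   # block-mean keys
    scores = qb @ kb.transpose(-1, -2)                    # (b, h, nq, nk)
    n_keep = max(1, int(scores.shape[-1] * keep_ratio))
    top = scores.topk(n_keep, dim=-1).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(-1, top, True)                          # True = compute this block
    return mask
```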
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Created 2024-10-22 · 13 commits to main branch, last one 5 months ago
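ShadowKV keeps a low-rank representation of the pre-RoPE key cache on GPU and offloads the full value cache to CPU, fetching only sparsely selected entries at decode time. Below is a minimal sketch of the low-rank key factorization via a plain SVD; the rank and shapes are illustrative, and the repository's chunk selection and CPU-GPU fetch path are omitted.

```python
# Minimal sketch of a low-rank key cache (illustrative rank and shapes).
import torch

def compress_keys(k, rank=160):
    # k: (seq, dim) flattened pre-RoPE key cache for one layer.
    u, s, vt = torch.linalg.svd(k.float(), full_matrices=False)
    a = u[:, :rank] * s[:rank]   # (seq, rank), kept on GPU
    b = vt[:rank]                # (rank, dim), kept on GPU
    return a, b                  # reconstruct selected rows with a[rows] @ b

k_cache = torch.randn(8192, 1024)
a, b = compress_keys(k_cache)
v_cache_cpu = torch.randn(8192, 1024).to("cpu")  # full values offloaded to CPU
```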
Efficient Triton implementation of Native Sparse Attention.
Created 2025-02-24 · 54 commits to main branch, last one 3 days ago
The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression"
Created 2024-06-19 · 31 commits to master branch, last one 3 months ago
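MoA replaces a uniform attention span with heterogeneous sliding windows, automatically searched per head and layer. The sketch below only shows how per-head causal window masks could be built; the window sizes are placeholders, not a profile produced by MoA's search.

```python
# Sketch of heterogeneous per-head sliding-window masks (window sizes are placeholders).
import torch

def per_head_window_masks(seq_len, window_sizes):
    # window_sizes: one causal window length per head.
    i = torch.arange(seq_len)
    masks = []
    for w in window_sizes:
        # Causal band: each token attends to the last w tokens, including itself.
        m = (i[None, :] <= i[:, None]) & (i[:, None] - i[None, :] < w)
        masks.append(m)
    return torch.stack(masks)  # (heads, seq, seq), True = attend

masks = per_head_window_masks(1024, [64, 128, 256, 1024])
```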
Code for the paper "FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference" (ICLR 2025 Oral)
Created 2025-02-18 · 8 commits to main branch, last one 18 hours ago
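FlexPrefill picks both the sparse pattern and the sparsity budget per head at prefill time, based on the input itself. One hedged reading of the budget side is sketched below: keep the smallest set of pooled key blocks whose estimated attention mass reaches a coverage threshold. The pooling, block size, and gamma value are assumptions, not the paper's exact procedure.

```python
# Sketch: smallest set of key blocks reaching attention coverage gamma
# (pooling, block size, and gamma are assumptions).
import torch

def blocks_for_coverage(q, k, block=64, gamma=0.95):
    # q: (heads, d) representative queries; k: (heads, seq, d), seq divisible by block.
    h, s, d = k.shape
    kb = k.view(h, s // block, block, d).mean(dim=2)      # pooled key blocks
    p = torch.softmax((q[:, None, :] * kb).sum(-1) / d**0.5, dim=-1)
    sp, idx = p.sort(dim=-1, descending=True)
    keep = (sp.cumsum(dim=-1) - sp) < gamma               # stop once coverage >= gamma
    return [idx[hd][keep[hd]] for hd in range(h)]         # kept block ids per head

q = torch.randn(8, 64)
k = torch.randn(8, 4096, 64)
kept = blocks_for_coverage(q, k)
```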