2 results found
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Created 2024-10-22; 13 commits to main branch, last one 24 days ago
The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression"
Created 2024-06-19; 26 commits to master branch, last one 11 days ago