5 results found
📖 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
Created 2023-08-27 · 451 commits to main branch, last one 4 days ago
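Of the techniques this list indexes, weight-only quantization (WINT8) is the easiest to show in a few lines. Below is a minimal NumPy sketch of per-channel symmetric INT8 weight quantization; the function names and shapes are illustrative and not taken from any repo in these results.

```python
import numpy as np

def quantize_wint8(w: np.ndarray):
    """Per-output-channel, symmetric weight-only INT8 quantization."""
    # Map the max |w| in each row to 127; guard against all-zero rows.
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-8) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Weight-only scheme: weights come back to float for the matmul,
    # activations are never quantized.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_wint8(w)
print(np.abs(dequantize(q, s) - w).max())  # small reconstruction error
```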
Light-field imaging application for plenoptic cameras
Created 2019-03-30 · 1,555 commits to master branch, last one 11 months ago
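For context on what a plenoptic pipeline does: each microlens covers a small block of sensor pixels, and gathering the same in-block offset across all lenslets yields one sub-aperture view. A minimal NumPy sketch of that decoding step, assuming an idealized, rectified lenslet image (this is not the repo's code):

```python
import numpy as np

def subaperture_views(lenslet_img: np.ndarray, u: int, v: int) -> np.ndarray:
    """Split a raw plenoptic (lenslet) image into sub-aperture views.

    lenslet_img: (H*u, W*v) array, each microlens covering a u x v pixel block.
    Returns an array of shape (u, v, H, W): one H x W view per angular sample.
    """
    H, W = lenslet_img.shape[0] // u, lenslet_img.shape[1] // v
    # Split rows into (lenslet index, in-block offset), same for columns,
    # then move the angular offsets to the front.
    return lenslet_img.reshape(H, u, W, v).transpose(1, 3, 0, 2)

img = np.random.rand(300, 400)              # synthetic lenslet image
views = subaperture_views(img, u=10, v=10)
print(views.shape)                           # (10, 10, 30, 40)
```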
TransMLA: Multi-Head Latent Attention Is All You Need
Created 2025-01-02 · 14 commits to main branch, last one 4 days ago
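A toy PyTorch sketch of the latent-KV idea behind multi-head latent attention: the cache stores one small latent per token and expands it to per-head K/V only at attention time. Dimensions, names, and structure are illustrative; RoPE handling and causal masking are omitted, and this is not TransMLA's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMLA(nn.Module):
    """Toy multi-head latent attention: the KV cache holds one small
    latent per token, expanded to per-head K/V at attention time."""
    def __init__(self, d_model=256, n_heads=8, d_latent=32):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compress tokens to latents
        self.k_up = nn.Linear(d_latent, d_model)     # expand latents to keys
        self.v_up = nn.Linear(d_latent, d_model)     # expand latents to values
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        B, T, _ = x.shape
        latent = self.kv_down(x)                     # (B, T, d_latent)
        if latent_cache is not None:                 # prepend cached tokens
            latent = torch.cat([latent_cache, latent], dim=1)
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        o = F.scaled_dot_product_attention(q, k, v)  # causal mask omitted
        y = self.out(o.transpose(1, 2).reshape(B, T, -1))
        return y, latent  # the latent tensor is the entire KV cache

x = torch.randn(2, 5, 256)
y, cache = ToyMLA()(x)
print(y.shape, cache.shape)  # (2, 5, 256) and (2, 5, 32): 32 floats cached
                             # per token instead of 2 * 256 for full K and V
```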
📚 FFPA (Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for headdim > 256, 1.8x~3x↑ 🎉 vs SDPA EA.
Created 2024-11-29 · 242 commits to main branch, last one a day ago
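For reference, the "SDPA EA" baseline that taglines like this benchmark against can be pinned in PyTorch as below. This assumes PyTorch >= 2.3 and a CUDA GPU; shapes are illustrative, and the FFPA kernel itself is a separate CUDA extension not shown here.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Pin SDPA to the efficient-attention (EA) backend at headdim > 256,
# the regime this repo's tagline targets.
B, H, T, D = 1, 8, 4096, 320  # headdim 320: beyond FlashAttention-2's 256 limit
q, k, v = (torch.randn(B, H, T, D, device="cuda", dtype=torch.half)
           for _ in range(3))

with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 4096, 320])
```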
Code for Palu: Compressing KV-Cache with Low-Rank Projection
Created 2024-07-02 · 42 commits to master branch, last one 3 days ago
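The mechanism named in this description, caching a low-rank latent instead of full keys/values, can be sketched with a truncated SVD of a projection matrix. A minimal NumPy illustration with made-up shapes (not Palu's code; real attention weights are closer to low rank than the random matrix used here, so the memory ratio, not the reconstruction error, is the representative part):

```python
import numpy as np

d_model, d_kv, rank, T = 512, 512, 64, 1000

def low_rank_factor(w: np.ndarray, rank: int):
    """Factor W ~= A @ B via truncated SVD so each cached token holds
    x @ A (rank floats) instead of x @ W (d_kv floats)."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    A = u[:, :rank] * s[:rank]   # (d_model, rank)
    B = vt[:rank]                # (rank, d_kv)
    return A, B

w_k = np.random.randn(d_model, d_kv) / np.sqrt(d_model)  # key projection
A, B = low_rank_factor(w_k, rank)

x = np.random.randn(T, d_model)
latent_cache = x @ A             # what gets stored: T x rank
k_approx = latent_cache @ B      # keys reconstructed on demand;
                                 # quality depends on W's spectrum
print(latent_cache.nbytes / (x @ w_k).nbytes)  # 0.125: an 8x smaller cache
```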