5 results found

📖 A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
Created 2023-08-27
451 commits to main branch, last one 4 days ago
License: GPL-3.0
Light-field imaging application for plenoptic cameras
Created 2019-03-30
1,555 commits to master branch, last one 11 months ago
License: MIT
TransMLA: Multi-Head Latent Attention Is All You Need
Created 2025-01-02
14 commits to main branch, last one 4 days ago
📚 FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for headdim > 256, 1.8x~3x↑ 🎉 vs SDPA EA.
Created 2024-11-29
242 commits to main branch, last one a day ago
Code for Palu: Compressing KV-Cache with Low-Rank Projection (the general idea is sketched after this list)
Created 2024-07-02
42 commits to master branch, last one 3 days ago