3 results found

📖A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
Created 2023-08-27
458 commits to main branch, last one 15 days ago
2.9k stars · GPL-3.0 license
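To make one of the listed techniques concrete, here is a minimal numpy sketch of weight-only INT8 (WINT8) quantization, one of the topics this list covers. The function names, per-row scaling scheme, and shapes are illustrative assumptions, not code from any of the curated repos; real inference kernels fuse the dequantization into the GEMM rather than materializing float weights.

```python
# Illustrative sketch of weight-only INT8 (WINT8) quantization with per-output-row
# symmetric scales. Not from any listed repo; real kernels fuse dequant into the GEMM.
import numpy as np

def quantize_wint8(w: np.ndarray):
    """Quantize a float weight matrix to int8 with one scale per output row."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequant_matmul(x: np.ndarray, q: np.ndarray, scale: np.ndarray):
    """y = x @ W^T, dequantizing the int8 weights on the fly."""
    return x @ (q.astype(np.float32) * scale).T

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 64)).astype(np.float32)
x = rng.standard_normal((4, 64)).astype(np.float32)
q, s = quantize_wint8(w)
print(np.max(np.abs(x @ w.T - dequant_matmul(x, q, s))))  # small quantization error
```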
📚200+ Tensor/CUDA Core kernels: ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (reaching 98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
Created 2022-12-17
506 commits to main branch, last one 10 hours ago
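The flash-attn-mma kernels in this repo implement the online-softmax tiling that defines flash attention. The sketch below is a numpy stand-in for that accumulation scheme (running row max, running denominator, rescaled partial outputs), not the repo's actual CUDA/MMA code; the block size and function name are illustrative.

```python
# Numpy sketch of the online-softmax accumulation behind flash-attention kernels:
# iterate over K/V blocks, keeping a running max, denominator, and partial output.
import numpy as np

def flash_attn_reference(q, k, v, block=32):
    """Single-head attention computed one K/V block at a time."""
    d = q.shape[-1]
    scale = 1.0 / np.sqrt(d)
    m = np.full(q.shape[0], -np.inf)   # running row max of the logits
    l = np.zeros(q.shape[0])           # running softmax denominator
    o = np.zeros_like(q)               # running (unnormalized) output
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = (q @ kb.T) * scale                 # logits for this K block
        m_new = np.maximum(m, s.max(axis=1))
        alpha = np.exp(m - m_new)              # rescale previous partial results
        p = np.exp(s - m_new[:, None])
        l = l * alpha + p.sum(axis=1)
        o = o * alpha[:, None] + p @ vb
        m = m_new
    return o / l[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 64)) for _ in range(3))
logits = (q @ k.T) / np.sqrt(64)
ref = np.exp(logits - logits.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ v
print(np.max(np.abs(flash_attn_reference(q, k, v) - ref)))  # agrees to ~1e-12
```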
📚FFPA (Split-D): yet another faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉 vs SDPA EA.
Created 2024-11-29
246 commits to main branch, last one 3 days ago
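The "Split-D" name refers to tiling over the head dimension so the on-chip working set stays fixed even for headdim > 256. The numpy sketch below illustrates only that headdim-tiling idea (partial QK^T accumulation, per-d-tile output); it is an assumption-laden simplification that omits the flash-attention style K/V tiling the real CUDA kernel combines it with, and the block size and function name are mine.

```python
# Sketch of the split-D idea: tile the head dimension so each step only touches
# fixed-size slices of Q, K, and V, regardless of how large headdim is.
import numpy as np

def attn_split_d(q, k, v, d_block=64):
    """Single-head attention accumulating scores and outputs over headdim tiles."""
    n, d = q.shape
    s = np.zeros((n, k.shape[0]))
    for start in range(0, d, d_block):          # partial QK^T, one d-tile at a time
        s += q[:, start:start + d_block] @ k[:, start:start + d_block].T
    s /= np.sqrt(d)
    p = np.exp(s - s.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    o = np.empty_like(q)
    for start in range(0, d, d_block):          # output, one d-tile of V at a time
        o[:, start:start + d_block] = p @ v[:, start:start + d_block]
    return o

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 512)) for _ in range(3))  # headdim 512 > 256
logits = (q @ k.T) / np.sqrt(512)
ref = np.exp(logits - logits.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ v
print(np.max(np.abs(attn_split_d(q, k, v) - ref)))            # agrees to ~1e-12
```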