3 results found

📖A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
Created 2023-08-27
458 commits to main branch, last one 15 days ago
2.9k stars · GPL-3.0 license
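To make one of the listed techniques concrete, here is a minimal numpy sketch of weight-only INT8 (WINT8) quantization, one of the topics this list covers. The function names, per-row scaling scheme, and shapes are illustrative assumptions, not code from any of the curated repos; real inference kernels fuse the dequantization into the GEMM rather than materializing float weights.

```python
# Illustrative sketch of weight-only INT8 (WINT8) quantization with per-output-row
# symmetric scales. Not from any listed repo; real kernels fuse dequant into the GEMM.
import numpy as np

def quantize_wint8(w: np.ndarray):
    """Quantize a float weight matrix to int8 with one scale per output row."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequant_matmul(x: np.ndarray, q: np.ndarray, scale: np.ndarray):
    """y = x @ W^T, dequantizing the int8 weights on the fly."""
    return x @ (q.astype(np.float32) * scale).T

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 64)).astype(np.float32)
x = rng.standard_normal((4, 64)).astype(np.float32)
q, s = quantize_wint8(w)
print(np.max(np.abs(x @ w.T - dequant_matmul(x, q, s))))  # small quantization error
```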
📚200+ Tensor/CUDA Core kernels: ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (reaching 98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
Created 2022-12-17
506 commits to main branch, last one 10 hours ago
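The flash-attn-mma kernels in this repo implement the online-softmax tiling that defines flash attention. The sketch below is a numpy stand-in for that accumulation scheme (running row max, running denominator, rescaled partial outputs), not the repo's actual CUDA/MMA code; the block size and function name are illustrative.

```python
# Numpy sketch of the online-softmax accumulation behind flash-attention kernels:
# iterate over K/V blocks, keeping a running max, denominator, and partial output.
import numpy as np

def flash_attn_reference(q, k, v, block=32):
    """Single-head attention computed one K/V block at a time."""
    d = q.shape[-1]
    scale = 1.0 / np.sqrt(d)
    m = np.full(q.shape[0], -np.inf)   # running row max of the logits
    l = np.zeros(q.shape[0])           # running softmax denominator
    o = np.zeros_like(q)               # running (unnormalized) output
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = (q @ kb.T) * scale                 # logits for this K block
        m_new = np.maximum(m, s.max(axis=1))
        alpha = np.exp(m - m_new)              # rescale previous partial results
        p = np.exp(s - m_new[:, None])
        l = l * alpha + p.sum(axis=1)
        o = o * alpha[:, None] + p @ vb
        m = m_new
    return o / l[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((128, 64)) for _ in range(3))
logits = (q @ k.T) / np.sqrt(64)
ref = np.exp(logits - logits.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ v
print(np.max(np.abs(flash_attn_reference(q, k, v) - ref)))  # agrees to ~1e-12
```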
📚FFPA (Split-D): yet another faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉 vs SDPA EA.
Created 2024-11-29
246 commits to main branch, last one 3 days ago
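The "Split-D" name refers to tiling over the head dimension so the on-chip working set stays fixed even for headdim > 256. The numpy sketch below illustrates only that headdim-tiling idea (partial QK^T accumulation, per-d-tile output); it is an assumption-laden simplification that omits the flash-attention style K/V tiling the real CUDA kernel combines it with, and the block size and function name are mine.

```python
# Sketch of the split-D idea: tile the head dimension so each step only touches
# fixed-size slices of Q, K, and V, regardless of how large headdim is.
import numpy as np

def attn_split_d(q, k, v, d_block=64):
    """Single-head attention accumulating scores and outputs over headdim tiles."""
    n, d = q.shape
    s = np.zeros((n, k.shape[0]))
    for start in range(0, d, d_block):          # partial QK^T, one d-tile at a time
        s += q[:, start:start + d_block] @ k[:, start:start + d_block].T
    s /= np.sqrt(d)
    p = np.exp(s - s.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    o = np.empty_like(q)
    for start in range(0, d, d_block):          # output, one d-tile of V at a time
        o[:, start:start + d_block] = p @ v[:, start:start + d_block]
    return o

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 512)) for _ in range(3))  # headdim 512 > 256
logits = (q @ k.T) / np.sqrt(512)
ref = np.exp(logits - logits.max(axis=1, keepdims=True))
ref = (ref / ref.sum(axis=1, keepdims=True)) @ v
print(np.max(np.abs(attn_split_d(q, k, v) - ref)))            # agrees to ~1e-12
```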