Decoding Attention is specifically optimized for multi-head attention (MHA) using CUDA cores, targeting the decoding stage of LLM inference.
Created: 2024-08-14
1 commit to master branch, last one 18 days ago
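For context, the decoding stage processes one new token at a time: a single query vector attends over all cached keys and values. The following is a minimal NumPy sketch of that computation for one attention head; it illustrates the general idea only and is not the repository's CUDA implementation (function and variable names here are illustrative).

```python
import numpy as np

def decode_attention(q, k_cache, v_cache):
    """Decoding-stage attention for one head: the query of a single
    new token attends over all cached key/value vectors."""
    d = q.shape[-1]
    scores = k_cache @ q / np.sqrt(d)   # (t,) one score per cached position
    scores -= scores.max()              # subtract max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()            # softmax over cached positions
    return weights @ v_cache            # (d,) weighted sum of cached values

rng = np.random.default_rng(0)
t, d = 8, 64                            # cached tokens, head dimension
k_cache = rng.standard_normal((t, d))
v_cache = rng.standard_normal((t, d))
q = rng.standard_normal(d)              # query for the newly generated token
out = decode_attention(q, k_cache, v_cache)
print(out.shape)                        # (64,)
```

Because only one query row is involved per step, decoding attention is memory-bound rather than compute-bound, which is why a CUDA-core implementation (rather than tensor cores) can be a good fit for this stage.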