
Decoding Attention is specifically optimized for multi-head attention (MHA) using CUDA Cores for the decoding stage of LLM inference.
Created 2024-08-14
1 commit to master branch, last one about a month ago
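For context, decoding-stage attention reduces to a single query row per head attending over the cached keys and values, which is why it is memory-bound and can run on CUDA Cores rather than Tensor Cores. Below is a minimal NumPy sketch of the computation this kernel accelerates; it is illustrative only, not the repository's CUDA implementation, and the shapes and function name are assumptions.

```python
import numpy as np

def decode_step_attention(q, k_cache, v_cache):
    """Single-token (decoding-stage) multi-head attention sketch.

    Hypothetical shapes, not the repo's API:
      q:       (num_heads, head_dim)          query for the one new token
      k_cache: (num_heads, seq_len, head_dim) cached keys
      v_cache: (num_heads, seq_len, head_dim) cached values
    Returns:   (num_heads, head_dim)
    """
    head_dim = q.shape[-1]
    # One row of attention scores per head: (num_heads, seq_len)
    scores = np.einsum("hd,hsd->hs", q, k_cache) / np.sqrt(head_dim)
    # Numerically stable softmax over the cached sequence positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of cached values: (num_heads, head_dim)
    return np.einsum("hs,hsd->hd", weights, v_cache)
```

Because the query is a single row, there is no large matrix-matrix product to feed Tensor Cores; the work is dominated by streaming the KV cache, which matches the repository's stated focus on CUDA-Core decoding kernels.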