2 results found
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
Created 2023-10-09 · 1 commit to master branch, last one 3 months ago
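To make the operation concrete: an HGEMV computes y = A·x where A and x are stored in FP16. Below is a minimal CPU reference sketch of that computation, not code from the repo; `float` stands in for `half` to keep it portable, and the FP32 accumulator mirrors the mixed-precision accumulation such kernels typically use. All names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Reference GEMV: y = A * x, where A is m x n in row-major order.
// (The repo's kernels perform this with FP16 inputs on CUDA cores;
// float stands in for half here so the sketch runs anywhere.)
std::vector<float> gemv(const std::vector<float>& A,
                        const std::vector<float>& x,
                        std::size_t m, std::size_t n) {
    std::vector<float> y(m, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        float acc = 0.0f;  // wider accumulator, as half-precision kernels commonly use
        for (std::size_t j = 0; j < n; ++j)
            acc += A[i * n + j] * x[j];
        y[i] = acc;
    }
    return y;
}
```

On a GPU, each row's dot product would typically be split across the threads of a warp and combined with a warp-level reduction, which is where the optimization methods differ.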
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores during the decoding stage of LLM inference.
Created 2024-08-14 · 1 commit to master branch, last one about a month ago
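At decode time each step produces a single new query token per head, which attends over the cached keys and values: out = softmax(q·Kᵀ/√d)·V. A minimal CPU sketch of that per-head computation follows; it is an illustration of the math, not code from the repo, with `float` standing in for FP16 and all names chosen here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Single-query attention for one head at the decoding stage:
// the new query q attends over t cached key/value rows.
// out = softmax(q . K^T / sqrt(d)) * V
std::vector<float> decode_attention(const std::vector<float>& q,  // length d
                                    const std::vector<float>& K,  // t x d, row-major
                                    const std::vector<float>& V,  // t x d, row-major
                                    std::size_t t, std::size_t d) {
    std::vector<float> s(t);
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    float mx = -INFINITY;
    for (std::size_t i = 0; i < t; ++i) {          // scaled scores q . K_i
        float dot = 0.0f;
        for (std::size_t j = 0; j < d; ++j)
            dot += q[j] * K[i * d + j];
        s[i] = dot * scale;
        if (s[i] > mx) mx = s[i];
    }
    float sum = 0.0f;
    for (std::size_t i = 0; i < t; ++i) {          // numerically stable softmax
        s[i] = std::exp(s[i] - mx);
        sum += s[i];
    }
    std::vector<float> out(d, 0.0f);
    for (std::size_t i = 0; i < t; ++i) {          // weighted sum of V rows
        const float w = s[i] / sum;
        for (std::size_t j = 0; j < d; ++j)
            out[j] += w * V[i * d + j];
    }
    return out;
}
```

Because there is only one query row per head at decode time, the workload is memory-bound over the KV cache, which is why a CUDA-core implementation tuned for this shape can beat kernels designed for the prefill stage.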