2 results found

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
Created 2023-10-09
1 commit to master branch, last one 3 months ago
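The repository above covers HGEMV on CUDA cores. As a hedged illustration of the kind of kernel involved (not the repository's actual code), here is a minimal warp-per-row HGEMV sketch: each warp computes one row of y = A·x, reading half-precision operands but accumulating in float; the kernel name and launch shape are assumptions for this example.

```cuda
#include <cuda_fp16.h>

// Hypothetical warp-per-row HGEMV sketch: y = A * x.
// A is M x N row-major, all operands in half precision,
// accumulation in float on CUDA cores (no tensor cores).
__global__ void hgemv_warp_per_row(const half *A, const half *x, half *y,
                                   int M, int N) {
    const int warp_id = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    const int lane    = threadIdx.x % 32;
    if (warp_id >= M) return;  // one warp per matrix row

    // Each lane strides across the row; partial sums stay in float
    // to limit half-precision accumulation error.
    float acc = 0.0f;
    for (int col = lane; col < N; col += 32)
        acc += __half2float(A[warp_id * N + col]) * __half2float(x[col]);

    // Warp-level tree reduction via shuffles.
    for (int offset = 16; offset > 0; offset >>= 1)
        acc += __shfl_down_sync(0xffffffffu, acc, offset);

    if (lane == 0) y[warp_id] = __float2half(acc);
}
```

A launch such as `hgemv_warp_per_row<<<(M * 32 + 255) / 256, 256>>>(A, x, y, M, N)` would assign one warp per row; further optimizations (vectorized `half2` loads, multiple rows per block) build on this baseline.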
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference.
Created 2024-08-14
1 commit to master branch, last one about a month ago
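In the decoding stage the query has sequence length 1, so attention per head reduces to softmax(q·Kᵀ/√d)·V against the cached keys and values. The sketch below illustrates that computation for a single head on CUDA cores; it is an assumed simplification (one block per head, serial softmax, no paging of the KV cache), not the repository's implementation.

```cuda
#include <cuda_fp16.h>
#include <math.h>

// Hypothetical single-head decoding-attention sketch.
// One block per head; launch with blockDim.x == head_dim and
// seq_len * sizeof(float) bytes of dynamic shared memory.
__global__ void decode_attention_one_head(const half *q,  // [head_dim]
                                          const half *K,  // [seq_len, head_dim]
                                          const half *V,  // [seq_len, head_dim]
                                          half *out,      // [head_dim]
                                          int seq_len, int head_dim) {
    extern __shared__ float scores[];  // one score per cached token
    const int tid = threadIdx.x;
    const float scale = rsqrtf((float)head_dim);

    // Each thread computes dot(q, K[t]) for a strided subset of tokens.
    for (int t = tid; t < seq_len; t += blockDim.x) {
        float dot = 0.0f;
        for (int d = 0; d < head_dim; ++d)
            dot += __half2float(q[d]) * __half2float(K[t * head_dim + d]);
        scores[t] = dot * scale;
    }
    __syncthreads();

    // Numerically stable softmax over the scores. A real kernel would
    // reduce in parallel; thread 0 does it serially to keep the sketch short.
    if (tid == 0) {
        float m = scores[0];
        for (int t = 1; t < seq_len; ++t) m = fmaxf(m, scores[t]);
        float sum = 0.0f;
        for (int t = 0; t < seq_len; ++t) {
            scores[t] = expf(scores[t] - m);
            sum += scores[t];
        }
        for (int t = 0; t < seq_len; ++t) scores[t] /= sum;
    }
    __syncthreads();

    // Each thread owns one output dimension: weighted sum over V.
    if (tid < head_dim) {
        float acc = 0.0f;
        for (int t = 0; t < seq_len; ++t)
            acc += scores[t] * __half2float(V[t * head_dim + tid]);
        out[tid] = __float2half(acc);
    }
}
```

Because the query is a single vector, this workload is memory-bound on the KV cache, which is why CUDA-core kernels (rather than tensor-core GEMMs) can be the better fit for decoding.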