3 results found
Several optimization methods for half-precision general matrix multiplication (HGEMM) on Tensor Cores, using the WMMA API and MMA PTX instructions.
Created: 2023-06-22
1 commit to master branch, last one 3 months ago
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) on CUDA cores.
Created: 2023-10-09
1 commit to master branch, last one 3 months ago
Performance of the C++ interfaces of FlashAttention and FlashAttention v2 in large language model (LLM) inference scenarios.
Created: 2023-08-16
1 commit to master branch, last one 3 months ago