2 results found

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
Created 2023-10-09
1 commit to master branch, last one 3 months ago
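The repository above covers HGEMV on CUDA cores. As a hedged illustration of the kind of kernel involved (not the repository's actual code), here is a minimal warp-per-row HGEMV sketch: each warp computes one row of y = A·x, reading half-precision operands but accumulating in float; the kernel name and launch shape are assumptions for this example.

```cuda
#include <cuda_fp16.h>

// Hypothetical warp-per-row HGEMV sketch: y = A * x.
// A is M x N row-major, all operands in half precision,
// accumulation in float on CUDA cores (no tensor cores).
__global__ void hgemv_warp_per_row(const half *A, const half *x, half *y,
                                   int M, int N) {
    const int warp_id = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    const int lane    = threadIdx.x % 32;
    if (warp_id >= M) return;  // one warp per matrix row

    // Each lane strides across the row; partial sums stay in float
    // to limit half-precision accumulation error.
    float acc = 0.0f;
    for (int col = lane; col < N; col += 32)
        acc += __half2float(A[warp_id * N + col]) * __half2float(x[col]);

    // Warp-level tree reduction via shuffles.
    for (int offset = 16; offset > 0; offset >>= 1)
        acc += __shfl_down_sync(0xffffffffu, acc, offset);

    if (lane == 0) y[warp_id] = __float2half(acc);
}
```

A launch such as `hgemv_warp_per_row<<<(M * 32 + 255) / 256, 256>>>(A, x, y, M, N)` would assign one warp per row; further optimizations (vectorized `half2` loads, multiple rows per block) build on this baseline.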
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference.
Created 2024-08-14
1 commit to master branch, last one about a month ago
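In the decoding stage the query has sequence length 1, so attention per head reduces to softmax(q·Kᵀ/√d)·V against the cached keys and values. The sketch below illustrates that computation for a single head on CUDA cores; it is an assumed simplification (one block per head, serial softmax, no paging of the KV cache), not the repository's implementation.

```cuda
#include <cuda_fp16.h>
#include <math.h>

// Hypothetical single-head decoding-attention sketch.
// One block per head; launch with blockDim.x == head_dim and
// seq_len * sizeof(float) bytes of dynamic shared memory.
__global__ void decode_attention_one_head(const half *q,  // [head_dim]
                                          const half *K,  // [seq_len, head_dim]
                                          const half *V,  // [seq_len, head_dim]
                                          half *out,      // [head_dim]
                                          int seq_len, int head_dim) {
    extern __shared__ float scores[];  // one score per cached token
    const int tid = threadIdx.x;
    const float scale = rsqrtf((float)head_dim);

    // Each thread computes dot(q, K[t]) for a strided subset of tokens.
    for (int t = tid; t < seq_len; t += blockDim.x) {
        float dot = 0.0f;
        for (int d = 0; d < head_dim; ++d)
            dot += __half2float(q[d]) * __half2float(K[t * head_dim + d]);
        scores[t] = dot * scale;
    }
    __syncthreads();

    // Numerically stable softmax over the scores. A real kernel would
    // reduce in parallel; thread 0 does it serially to keep the sketch short.
    if (tid == 0) {
        float m = scores[0];
        for (int t = 1; t < seq_len; ++t) m = fmaxf(m, scores[t]);
        float sum = 0.0f;
        for (int t = 0; t < seq_len; ++t) {
            scores[t] = expf(scores[t] - m);
            sum += scores[t];
        }
        for (int t = 0; t < seq_len; ++t) scores[t] /= sum;
    }
    __syncthreads();

    // Each thread owns one output dimension: weighted sum over V.
    if (tid < head_dim) {
        float acc = 0.0f;
        for (int t = 0; t < seq_len; ++t)
            acc += scores[t] * __half2float(V[t * head_dim + tid]);
        out[tid] = __float2half(acc);
    }
}
```

Because the query is a single vector, this workload is memory-bound on the KV cache, which is why CUDA-core kernels (rather than tensor-core GEMMs) can be the better fit for decoding.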