Search Results - RepositoryStats

246

2.3k

gpl-3.0

17

📚200+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️HGEMM with WMMA, MMA and CuTe (98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).

cuda gemm gemv hgemm

Created 2022-12-17

499 commits to main branch, last one 6 days ago

70

342

mit

5

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

gpu cuda gemm hgemm cublas nvidia tensor-core matrix-multiply

Created 2023-06-22

1 commit to master branch, last one 5 months ago

4

55

mit

5

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

gpu cuda gemm gemv hgemm hgemv cublas nvidia cuda-core tensor-core matrix-multiply

Created 2023-10-09

1 commit to master branch, last one 5 months ago

2

50

gpl-3.0

1

⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA and CuTe APIs, and achieve peak⚡️ performance.

cuda hgemm tensor-cores

Created 2024-11-30

43 commits to main branch, last one 7 days ago