Search Results - RepositoryStats

CUDA-Learn-Notes DefTruth

300

2.9k

gpl-3.0

22

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

cuda gemm gemv cudnn hgemm cutlass flash-mla cuda-kernels cuda-toolkit flash-attention cuda-programming

Created 2022-12-17

506 commits to main branch, last one 7 hours ago

flux bytedance

47

776

apache-2.0

13

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

gpu cuda cutlass pytorch

Created 2024-03-01

31 commits to main branch, last one a day ago

awesome-cuda-triton-hpc coderonion

27

221

unknown

5

🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR and High Performance Computing (HPC) projects.

Created 2023-02-23

31 commits to main branch, last one 3 days ago

Cute-Learning DD-DuDa

16

145

mit

1

Examples of CUDA implementations by Cutlass CuTe

gpu cuda cutlass

Created 2024-04-28

29 commits to main branch, last one about a month ago

CUTLASS-Examples leimao

4

42

bsd-3-clause

1

CUTLASS and CuTe Examples

cuda docker cutlass

Created 2024-07-29

166 commits to main branch, last one 2 months ago

flash_attention_inference Bruce-Lee-LY

3

35

bsd-3-clause

1

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

gpu llm mha cuda nvidia cutlass inference tensor-core flash-attention flash-attention-2 large-language-model multi-head-attention

Created 2023-08-16

1 commit to master branch, last one 20 days ago