5 results found
- Filter by Primary Language:
- C++ (2)
- Cuda (1)
- Makefile (1)
A fast communication-overlapping library for tensor parallelism on GPUs.
Created: 2024-03-01
24 commits to main branch, last one 2 months ago
🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR and High Performance Computing (HPC) projects.
Created: 2023-02-23
20 commits to main branch, last one 7 days ago
Examples of CUDA implementations using CUTLASS CuTe
Created: 2024-04-28
26 commits to main branch, last one about a month ago
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
Created: 2023-08-16
1 commit to master branch, last one 4 months ago
CUTLASS and CuTe Examples
Created: 2024-07-29
166 commits to main branch, last one 2 days ago