25 results found

164
4.0k
apache-2.0
55
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Created 2021-10-17
259 commits to main branch, last one 4 hours ago
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.
Created 2022-12-17
529 commits to main branch, last one 12 hours ago
Sample codes for my CUDA programming book
Created 2019-05-03
922 commits to master branch, last one about a month ago
TinyChatEngine: On-Device LLM Inference Library
Created 2023-05-24
55 commits to main branch, last one 9 months ago
83
831
bsd-3-clause
28
Thin, unified, C++-flavored wrappers for the CUDA APIs
Created 2016-11-11
1,079 commits to master branch, last one 21 days ago
94
792
apache-2.0
10
Safe Rust wrapper around the CUDA toolkit
Created 2022-09-16
364 commits to main branch, last one a day ago
67
785
apache-2.0
7
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Created 2022-09-01
29 commits to main branch, last one 9 months ago
69
697
unknown
7
LLM notes, including model inference, transformer model structure, and LLM framework code analysis.
Created 2024-09-18
301 commits to main branch, last one a day ago
57
552
apache-2.0
7
A self-learning tutorial for CUDA High Performance Programming.
Created 2022-10-11
107 commits to develop branch, last one 3 days ago
A simple GPU hash table implemented in CUDA using lock free techniques
Created 2020-03-01
31 commits to master branch, last one 2 years ago
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
Created 2015-03-14
112 commits to master branch, last one 2 years ago
From zero to hero CUDA for accelerating maths and machine learning on GPU.
Created 2024-05-20
15 commits to main branch, last one 15 days ago
8
173
apache-2.0
5
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, and automatic CUDA graph generation and updating.
Created 2022-12-18
398 commits to mini20 branch, last one 7 days ago
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.
Created 2015-06-14
28 commits to master branch, last one 2 years ago
18
115
mit
18
An implementation of HIP that works on CPUs, across OSes.
Created 2020-08-28
177 commits to master branch, last one about a year ago
Accelerated General (FP32) Matrix Multiplication from scratch in CUDA
Created 2024-08-11
97 commits to master branch, last one 3 months ago
8
111
bsd-3-clause
7
CUDA kernel author's tools
Created 2019-02-18
201 commits to master branch, last one 4 years ago
Speed up image preprocessing with CUDA for image handling or TensorRT inference
Created 2023-05-29
52 commits to main branch, last one 13 days ago
Getting started with learning CUDA programming
Created 2022-02-02
73 commits to main branch, last one 8 months ago
YOLOv9 TensorRT deployment acceleration, with two implementations: C++ and Python🔥🔥🔥
Created 2024-02-23
17 commits to master branch, last one about a year ago
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"
Created 2024-01-21
9 commits to main branch, last one 8 months ago