20 results found

Sample codes for my CUDA programming book
Created 2019-05-03
919 commits to master branch, last one 10 months ago
Thin, unified, C++-flavored wrappers for the CUDA APIs
Created 2016-11-11
986 commits to master branch, last one about a month ago
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Created 2022-09-01
28 commits to main branch, last one about a month ago
🎉CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
Created 2022-12-17
93 commits to main branch, last one 13 days ago
TinyChatEngine: On-Device LLM Inference Library
Created 2023-05-24
52 commits to main branch, last one 4 days ago
Safe Rust wrapper around the CUDA toolkit
Created 2022-09-16
235 commits to main branch, last one 20 hours ago
A simple GPU hash table implemented in CUDA using lock free techniques
Created 2020-03-01
31 commits to master branch, last one about a year ago
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
Created 2015-03-14
112 commits to master branch, last one about a year ago
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, and automatic CUDA graph generation and updating.
Created 2022-12-18
336 commits to mini branch, last one 2 days ago
From zero to hero: CUDA for accelerating maths and machine learning on GPUs.
Created 2024-05-20
11 commits to main branch, last one 6 days ago
An implementation of HIP that works on CPUs, across OSes.
Created 2020-08-28
177 commits to master branch, last one 2 months ago
CUDA kernel author's tools
Created 2019-02-18
201 commits to master branch, last one 4 years ago
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.
Created 2015-06-14
28 commits to master branch, last one about a year ago
A self-learning tutorial for high-performance CUDA programming.
Created 2022-10-11
81 commits to develop branch, last one 2 months ago
Speeds up image preprocessing with CUDA for image handling and TensorRT inference.
Created 2023-05-29
43 commits to main branch, last one 3 months ago
YOLOv9 TensorRT deployment acceleration, providing two implementations: C++ and Python 🔥🔥🔥
Created 2024-02-23
17 commits to master branch, last one 3 months ago