23 results found

Sample codes for my CUDA programming book
Created 2019-05-03
919 commits to master branch, last one about a year ago
🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast i...
Created 2024-01-28
197 commits to main branch, last one a day ago
80
804
bsd-3-clause
30
Thin, unified, C++-flavored wrappers for the CUDA APIs
Created 2016-11-11
1,004 commits to master branch, last one 4 months ago
TinyChatEngine: On-Device LLM Inference Library
Created 2023-05-24
55 commits to main branch, last one 5 months ago
64
771
apache-2.0
7
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Created 2022-09-01
29 commits to main branch, last one 5 months ago
82
663
apache-2.0
13
Safe Rust wrapper around the CUDA toolkit
Created 2022-09-16
272 commits to main branch, last one 4 days ago
A simple GPU hash table implemented in CUDA using lock-free techniques
Created 2020-03-01
31 commits to master branch, last one about a year ago
33
290
apache-2.0
5
A self-learning tutorial for CUDA High Performance Programming.
Created 2022-10-11
102 commits to develop branch, last one 4 days ago
25
289
unknown
5
LLM notes, including model inference, transformer model structure, and LLM framework code analysis notes
Created 2024-09-18
174 commits to main branch, last one 23 hours ago
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
Created 2015-03-14
112 commits to master branch, last one 2 years ago
From zero to hero: CUDA for accelerating maths and machine learning on GPU.
Created 2024-05-20
14 commits to main branch, last one 5 months ago
8
159
apache-2.0
5
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, automatic CUDA graph generation and updating.
Created 2022-12-18
384 commits to mini20 branch, last one 26 days ago
19
114
mit
19
An implementation of HIP that works on CPUs, across OSes.
Created 2020-08-28
177 commits to master branch, last one 9 months ago
8
109
bsd-3-clause
8
CUDA kernel author's tools
Created 2019-02-18
201 commits to master branch, last one 4 years ago
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.
Created 2015-06-14
28 commits to master branch, last one about a year ago
Speed up image preprocessing with CUDA when handling images or running TensorRT inference
Created 2023-05-29
49 commits to main branch, last one 3 days ago
YOLOv9 TensorRT deployment acceleration, providing two implementations: C++ and Python 🔥🔥🔥
Created 2024-02-23
17 commits to master branch, last one 9 months ago
Getting started with learning CUDA programming
Created 2022-02-02
73 commits to main branch, last one 5 months ago
Companion code for the bilibili video "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"
Created 2024-01-21
9 commits to main branch, last one 4 months ago