23 results found
- Filter by Primary Language:
- C++ (9)
- Cuda (9)
- Jupyter Notebook (1)
- Python (1)
- Rust (1)
- JavaScript (1)
- C# (1)
A General-purpose Task-parallel Programming System using Modern C++
Created
2018-04-18
2,319 commits to master branch, last one 27 days ago
Sample codes for my CUDA programming book
Created
2019-05-03
919 commits to master branch, last one about a year ago
📚Modern CUDA Learn Notes with PyTorch: Tensor/CUDA Cores, 📖150+ CUDA Kernels with PyTorch bindings, 📖HGEMM/SGEMM (95%~99% cuBLAS performance), 📖100+ LLM/CUDA Blogs.
Created
2022-12-17
360 commits to main branch, last one a day ago
CUDA Core Compute Libraries
Created
2020-09-17
10,053 commits to main branch, last one a day ago
Thin, unified, C++-flavored wrappers for the CUDA APIs
Created
2016-11-11
1,004 commits to master branch, last one 3 months ago
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Created
2022-09-01
29 commits to main branch, last one 4 months ago
TinyChatEngine: On-Device LLM Inference Library
Created
2023-05-24
55 commits to main branch, last one 4 months ago
Safe rust wrapper around CUDA toolkit
Created
2022-09-16
267 commits to main branch, last one 2 months ago
🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast i...
Created
2024-01-28
169 commits to main branch, last one 22 hours ago
A simple GPU hash table implemented in CUDA using lock free techniques
Created
2020-03-01
31 commits to master branch, last one about a year ago
A self-learning tutorial for CUDA high-performance programming.
Created
2022-10-11
98 commits to develop branch, last one 9 days ago
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
Created
2015-03-14
112 commits to master branch, last one 2 years ago
From zero to hero CUDA for accelerating maths and machine learning on GPU.
Created
2024-05-20
14 commits to main branch, last one 4 months ago
μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.
Created
2022-12-18
383 commits to mini20 branch, last one 11 days ago
An implementation of HIP that works on CPUs, across OSes.
Created
2020-08-28
177 commits to master branch, last one 8 months ago
CUDA kernel author's tools
Created
2019-02-18
201 commits to master branch, last one 4 years ago
CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.
Created
2015-06-14
28 commits to master branch, last one about a year ago
CUDA Guide
Created
2020-09-25
18 commits to master branch, last one 10 months ago
Install CUDA on Windows 11 using WSL2
Created
2023-05-23
52 commits to main branch, last one about a year ago
Speed up image preprocessing with CUDA for image handling or TensorRT inference
Created
2023-05-29
48 commits to main branch, last one 8 days ago
YOLOv9 TensorRT deployment acceleration, with two implementations: C++ and Python 🔥🔥🔥
Created
2024-02-23
17 commits to master branch, last one 8 months ago
An introduction to learning CUDA programming
Created
2022-02-02
73 commits to main branch, last one 4 months ago
Companion code for the bilibili video "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"
Created
2024-01-21
9 commits to main branch, last one 3 months ago