22 results found
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Created
2018-03-27
78 commits to master branch, last one 4 months ago
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Created
2023-06-15
1,055 commits to main branch, last one a day ago
Deep learning in Rust, with shape checked tensors and neural networks
Created
2021-10-12
890 commits to main branch, last one 11 months ago
CUDA Core Compute Libraries
Created
2020-09-17
10,252 commits to main branch, last one 10 hours ago
🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast i...
Created
2024-01-28
197 commits to main branch, last one a day ago
Safe Rust wrapper around the CUDA toolkit
Created
2022-09-16
272 commits to main branch, last one 4 days ago
CUDA Kernel Benchmarking Library
Created
2021-03-03
458 commits to main branch, last one about a month ago
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
Created
2012-10-03
155 commits to master branch, last one 8 years ago
Kernel Tuner
Created
2016-03-28
2,101 commits to master branch, last one 8 days ago
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
Created
2015-03-14
112 commits to master branch, last one 2 years ago
Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, AMD without writing any additional C kernel code. Write your funct...
Created
2019-06-05
80 commits to master branch, last one 2 years ago
From zero to hero CUDA for accelerating maths and machine learning on GPU.
Created
2024-05-20
14 commits to main branch, last one 5 months ago
Some CUDA design patterns and a bit of template magic for CUDA
Created
2018-11-16
41 commits to master branch, last one about a year ago
CUDA kernel author's tools
Created
2019-02-18
201 commits to master branch, last one 4 years ago
Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research
Created
2021-09-24
120 commits to main branch, last one about a year ago
High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.
Created
2023-05-26
52 commits to main branch, last one about a year ago
Triton implementation of FlashAttention2 that adds Custom Masks.
Created
2024-07-20
18 commits to main branch, last one 4 months ago
A tool for examining GPU scheduling behavior.
Created
2017-03-29
247 commits to master branch, last one 4 months ago
CUDA Guide
Created
2020-09-25
18 commits to master branch, last one 11 months ago
(REOS) Radar and Electro-Optical Simulation Framework written in C++.
Created
2019-09-27
5,028 commits to master branch, last one a day ago
Speed up image preprocessing with CUDA when handling images or running TensorRT inference
Created
2023-05-29
49 commits to main branch, last one 3 days ago
An introduction to learning CUDA programming
Created
2022-02-02
73 commits to main branch, last one 5 months ago