21 results found

1.7k · 5.7k · license: other · 116
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Created 2018-03-27
77 commits to master branch, last one 2 months ago

277 · 3.1k · license: apache-2.0 · 29
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Created 2023-06-15
730 commits to main branch, last one 20 hours ago

98 · 1.7k · license: other · 34
Deep learning in Rust, with shape-checked tensors and neural networks
Created 2021-10-12
890 commits to main branch, last one 5 months ago

🎉 CUDA notes / hand-written CUDA for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
Created 2022-12-17
108 commits to main branch, last one 4 days ago

65 · 468 · license: apache-2.0 · 10
Safe Rust wrapper around the CUDA toolkit
Created 2022-09-16
243 commits to main branch, last one a day ago

60 · 437 · license: apache-2.0 · 18
CUDA Kernel Benchmarking Library
Created 2021-03-03
454 commits to main branch, last one 26 days ago

🚀 TensorRT-YOLO: Supports YOLOv3, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, and PP-YOLOE using TensorRT acceleration with EfficientNMS, CUDA Kernels and CUDA Graphs!
Created 2024-01-28
102 commits to main branch, last one 5 days ago

53 · 339 · license: bsd-3-clause · 29
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
Created 2012-10-03
155 commits to master branch, last one 8 years ago

This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
Created 2015-03-14
112 commits to master branch, last one 2 years ago

Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, AMD without writing any additional C kernel code. Write your funct...
Created 2019-06-05
80 commits to master branch, last one about a year ago

From zero to hero: CUDA for accelerating maths and machine learning on GPUs.
Created 2024-05-20
12 commits to main branch, last one 24 days ago

Some CUDA design patterns and a bit of template magic for CUDA
Created 2018-11-16
41 commits to master branch, last one about a year ago

8 · 104 · license: bsd-3-clause · 8
CUDA kernel author's tools
Created 2019-02-18
201 commits to master branch, last one 4 years ago

Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research
Created 2021-09-24
120 commits to main branch, last one about a year ago

High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.
Created 2023-05-26
52 commits to main branch, last one about a year ago

A tool for examining GPU scheduling behavior.
Created 2017-03-29
246 commits to master branch, last one 23 days ago

Speed up image preprocessing with CUDA when handling images or running TensorRT inference
Created 2023-05-29
43 commits to main branch, last one 3 months ago