21 results found
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Created
2018-03-27
77 commits to master branch, last one 2 months ago
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Created
2023-06-15
730 commits to main branch, last one 20 hours ago
Deep learning in Rust, with shape-checked tensors and neural networks
Created
2021-10-12
890 commits to main branch, last one 5 months ago
CUDA C++ Core Libraries
Created
2020-09-17
9,572 commits to main branch, last one 10 hours ago
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
Created
2022-12-17
108 commits to main branch, last one 4 days ago
Safe Rust wrapper around the CUDA Toolkit
Created
2022-09-16
243 commits to main branch, last one a day ago
CUDA Kernel Benchmarking Library
Created
2021-03-03
454 commits to main branch, last one 26 days ago
🚀 TensorRT-YOLO: supports YOLOv3, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, and PP-YOLOE with TensorRT acceleration using EfficientNMS, CUDA kernels, and CUDA Graphs!
Created
2024-01-28
102 commits to main branch, last one 5 days ago
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
Created
2012-10-03
155 commits to master branch, last one 8 years ago
Kernel Tuner
Created
2016-03-28
2,041 commits to master branch, last one 15 days ago
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
Created
2015-03-14
112 commits to master branch, last one 2 years ago
Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPUs/GPUs, NVIDIA, and AMD hardware without writing any additional C kernel code. Write your funct...
Created
2019-06-05
80 commits to master branch, last one about a year ago
From zero to hero: CUDA for accelerating maths and machine learning on GPUs.
Created
2024-05-20
12 commits to main branch, last one 24 days ago
Some CUDA design patterns and a bit of template magic
Created
2018-11-16
41 commits to master branch, last one about a year ago
CUDA kernel author's tools
Created
2019-02-18
201 commits to master branch, last one 4 years ago
Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research
Created
2021-09-24
120 commits to main branch, last one about a year ago
High-speed GEMV kernels, with up to a 2.7x speedup over the PyTorch baseline.
Created
2023-05-26
52 commits to main branch, last one about a year ago
A tool for examining GPU scheduling behavior.
Created
2017-03-29
246 commits to master branch, last one 23 days ago
CUDA Guide
Created
2020-09-25
18 commits to master branch, last one 5 months ago
(REOS) Radar and Electro-Optical Simulation Framework written in C++.
Created
2019-09-27
4,636 commits to master branch, last one a day ago
Speed up image preprocessing with CUDA for image handling and TensorRT inference
Created
2023-05-29
43 commits to main branch, last one 3 months ago