27 results found
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Created
2018-03-27
320 commits to master branch, last one about a month ago
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Created
2023-06-15
1,245 commits to main branch, last one 21 hours ago
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Created
2021-10-17
266 commits to main branch, last one 4 days ago
📚 Modern CUDA Learn Notes: 200+ Tensor/CUDA Core kernels 🎉, HGEMM, FA2 via MMA and CuTe, 98-100% of cuBLAS/FA2 TFLOPS.
Created
2022-12-17
540 commits to main branch, last one 22 hours ago
Deep learning in Rust, with shape-checked tensors and neural networks
Created
2021-10-12
890 commits to main branch, last one about a year ago
CUDA Core Compute Libraries
Created
2020-09-17
10,989 commits to main branch, last one 11 hours ago
Safe Rust wrapper around the CUDA toolkit
Created
2022-09-16
417 commits to main branch, last one 9 hours ago
CUDA Kernel Benchmarking Library
Created
2021-03-03
515 commits to main branch, last one 3 days ago
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
Created
2012-10-03
155 commits to master branch, last one 9 years ago
Kernel Tuner
Created
2016-03-28
2,131 commits to master branch, last one 13 days ago
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
Created
2015-03-14
112 commits to master branch, last one 2 years ago
Zero-to-hero GPU and CUDA tutorials for maths and ML, with examples.
Created
2024-05-20
16 commits to main branch, last one 7 days ago
Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPUs/GPUs, NVIDIA, and AMD hardware without writing any additional C kernel code. Write your funct...
Created
2019-06-05
82 commits to master branch, last one 13 days ago
Some CUDA design patterns and a bit of template magic for CUDA
Created
2018-11-16
41 commits to master branch, last one about a year ago
CUDA kernel author's tools
Created
2019-02-18
201 commits to master branch, last one 4 years ago
Open-source, cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research
Created
2021-09-24
120 commits to main branch, last one 2 years ago
Triton implementation of FlashAttention2 that adds custom masks.
Created
2024-07-20
18 commits to main branch, last one 8 months ago
High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.
Created
2023-05-26
52 commits to main branch, last one about a year ago
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
Created
2024-10-10
82 commits to master branch, last one a day ago
A tool for examining GPU scheduling behavior.
Created
2017-03-29
247 commits to master branch, last one 8 months ago
Speed up image preprocessing with CUDA when handling images or running TensorRT inference
Created
2023-05-29
52 commits to main branch, last one 26 days ago
CUDA Guide
Created
2020-09-25
18 commits to master branch, last one about a year ago
(REOS) Radar and Electro-Optical Simulation Framework written in C++.
Created
2019-09-27
5,391 commits to master branch, last one 21 hours ago
Astrophysics program simulating the evolution of star systems, based on the fast multipole method on adaptive octrees
Created
2016-11-12
3,824 commits to master branch, last one 13 days ago
Implementation of the conjugate gradient method using C and NVIDIA CUDA
Created
2017-10-07
24 commits to master branch, last one 7 years ago
Getting started with learning CUDA programming
Created
2022-02-02
73 commits to main branch, last one 9 months ago
Faster PyTorch bitsandbytes 4-bit FP4 nn.Linear ops
Created
2024-02-20
24 commits to main branch, last one about a year ago