27 results found

2.0k
7.3k
other
119
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Created 2018-03-27
320 commits to master branch, last one about a month ago
535
6.2k
apache-2.0
50
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Created 2023-06-15
1,245 commits to main branch, last one 21 hours ago
179
4.3k
apache-2.0
57
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Created 2021-10-17
266 commits to main branch, last one 4 days ago
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.
Created 2022-12-17
540 commits to main branch, last one 22 hours ago
106
1.8k
other
32
Deep learning in Rust, with shape checked tensors and neural networks
Created 2021-10-12
890 commits to main branch, last one about a year ago
97
821
apache-2.0
10
Safe Rust wrapper around the CUDA toolkit
Created 2022-09-16
417 commits to main branch, last one 9 hours ago
74
620
apache-2.0
17
CUDA Kernel Benchmarking Library
Created 2021-03-03
515 commits to main branch, last one 3 days ago
54
347
bsd-3-clause
28
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
Created 2012-10-03
155 commits to master branch, last one 9 years ago
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
Created 2015-03-14
112 commits to master branch, last one 2 years ago
Zero to Hero GPU and CUDA for Maths & ML tutorials with examples.
Created 2024-05-20
16 commits to main branch, last one 7 days ago
Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPU/GPU, NVIDIA, AMD without writing any additional C kernel code. Write your funct...
Created 2019-06-05
82 commits to master branch, last one 13 days ago
Some CUDA design patterns and a bit of template magic for CUDA
Created 2018-11-16
41 commits to master branch, last one about a year ago
8
111
bsd-3-clause
7
CUDA kernel author's tools
Created 2019-02-18
201 commits to master branch, last one 4 years ago
19
110
mit
12
Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research
Created 2021-09-24
120 commits to main branch, last one 2 years ago
Triton implementation of FlashAttention2 that adds Custom Masks.
Created 2024-07-20
18 commits to main branch, last one 8 months ago
High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.
Created 2023-05-26
52 commits to main branch, last one about a year ago
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
Created 2024-10-10
82 commits to master branch, last one a day ago
A tool for examining GPU scheduling behavior.
Created 2017-03-29
247 commits to master branch, last one 8 months ago
Speed up image preprocessing with CUDA when handling images or running TensorRT inference
Created 2023-05-29
52 commits to main branch, last one 26 days ago
Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees
Created 2016-11-12
3,824 commits to master branch, last one 13 days ago
Implementation of the conjugate gradient method using C and NVIDIA CUDA
Created 2017-10-07
24 commits to master branch, last one 7 years ago
Introduction to learning CUDA programming
Created 2022-02-02
73 commits to main branch, last one 9 months ago
Faster PyTorch bitsandbytes 4-bit FP4 nn.Linear ops
Created 2024-02-20
24 commits to main branch, last one about a year ago