Statistics for topic cuda
RepositoryStats tracks 584,797 GitHub repositories; of these, 635 are tagged with the cuda topic. The most common primary language for repositories using this topic is C++ (229). Other languages include Python (147), Cuda (73), C (29), Jupyter Notebook (26), Rust (19), Dockerfile (13), and Shell (13).
Stargazers over time for topic cuda
Most starred repositories for topic cuda
Trending repositories for topic cuda
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
📚Modern CUDA Learn Notes with PyTorch: Tensor/CUDA Cores, 📖150+ CUDA Kernels with PyTorch bindings, 📖HGEMM/SGEMM (95%~99% cuBLAS performance), 📖100+ LLM/CUDA Blogs.
Unbiased & physically-based GPU HIPRT (C++/HIP) interactive path tracing renderer
A nearly complete collection of prefix sum algorithms implemented in CUDA, D3D12, Unity and WGPU. Theoretically portable to all wave/warp/subgroup sizes.
Efficient CUDA kernels for training convolutional neural networks with PyTorch.
Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.
Best practices & guides on how to write distributed pytorch training code
Multi-platform high-performance compute language extension for Rust.
🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast i...
PyTorch native quantization and sparsity for training and inference