Statistics for topic cuda
RepositoryStats tracks 638,211 GitHub repositories; 690 of these are tagged with the cuda topic. The most common primary language for repositories using this topic is C++ (252). Other languages include: Python (154), Cuda (82), C (35), Jupyter Notebook (26), Rust (21), Dockerfile (14), and Shell (13).
Stargazers over time for topic cuda
Most starred repositories for topic cuda
A high-throughput and memory-efficient inference and serving engine for LLMs (a brief usage sketch follows this list)
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
SGLang is a fast serving framework for large language models and vision language models.
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Core kernels 🎉, HGEMM, FA2 via MMA and CuTe, reaching 98~100% of cuBLAS/FA2 TFLOPS.
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
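The serving engines listed above (vLLM in particular) are driven from short Python programs. Below is a minimal offline-inference sketch based on vLLM's published quickstart API; the model name, prompt, and sampling parameters are illustrative assumptions, not taken from this page.

from vllm import LLM, SamplingParams

# Illustrative prompt and sampling settings (assumptions, not from this page).
prompts = ["CUDA kernels are fast because"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Load a small model and generate completions for the batch of prompts.
# Requires a CUDA-capable GPU and `pip install vllm`.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, output.outputs[0].text)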
Trending repositories for topic cuda
A GPU-accelerated library for Tree-based Genetic Programming, leveraging PyTorch and custom CUDA kernels for high-performance evolutionary computation. It supports symbolic regression, classification,...
SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving
YOLOv12 inference using C++, TensorRT, and CUDA
A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models.
SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Multi-platform high-performance compute language extension for Rust.
A highly optimized LLM inference acceleration engine for Llama and its variants.
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
Model deployment whitepaper (CUDA | ONNX | TensorRT | C++) 🚀🚀🚀
Learning how to write "Less Slow" code in C++20, C99, CUDA, PTX, and Assembly, from numerics and SIMD to coroutines, ranges, exception handling, networking, and user-space I/O