Statistics for topic cuda
RepositoryStats tracks 595,858 GitHub repositories; 652 of these are tagged with the cuda topic. The most common primary language for repositories using this topic is C++ (236). Other languages include: Python (147), Cuda (77), C (29), Jupyter Notebook (26), Rust (19), Dockerfile (14), Shell (13)
Stargazers over time for topic cuda
Most starred and trending repositories for topic cuda
A high-throughput and memory-efficient inference and serving engine for LLMs
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attention-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving
High-Performance Cross-Platform Monte Carlo Renderer Based on LuisaCompute
SGLang is a fast serving framework for large language models and vision language models.
A highly optimized LLM inference acceleration engine for Llama and its variants.
Unbiased & physically-based GPU HIPRT (C++/HIP) interactive path tracing renderer
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API (Write for Fun 👀~)
State-of-the-art sorting and segmented sorting, including OneSweep. Implemented in CUDA, D3D12, and Unity-style compute shaders. Theoretically portable to all wave/warp/subgroup sizes.
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference.
Multi-platform high-performance compute language extension for Rust.