Statistics for topic cuda
RepositoryStats tracks 579,129 GitHub repositories, of which 629 are tagged with the cuda topic. The most common primary language for repositories using this topic is C++ (227). Other languages include Python (144), Cuda (73), C (27), Jupyter Notebook (26), Rust (19), Dockerfile (13), and Shell (13).
Stargazers over time for topic cuda
Most starred repositories for topic cuda
Trending repositories for topic cuda
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.
PyTorch native quantization and sparsity for training and inference
Efficient CUDA kernels for training convolutional neural networks with PyTorch.
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
Unbiased & physically-based GPU HIPRT (C++/HIP) interactive path tracing renderer
PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu
Cross-architecture parallel algorithms for Julia's GPU backends, from a unified KernelAbstractions.jl codebase. Targets Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.
Best practices & guides on how to write distributed pytorch training code
YoloDotNet - A C# .NET 8.0 project for Classification, Object Detection, OBB Detection, Segmentation and Pose Estimation in both images and videos.
Multi-platform high-performance compute language extension for Rust.
🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast i...
Run serverless workloads with fast cold starts on bare-metal servers, anywhere in the world
🎉 Modern CUDA Learn Notes with PyTorch: CUDA Cores, Tensor Cores, fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, hgemm, sgemv, warp/block reduce, elementwise, softmax, layernorm, rmsnorm.