Statistics for topic cuda
RepositoryStats tracks 518,325 GitHub repositories; 561 of these are tagged with the cuda topic. The most common primary language for repositories using this topic is C++ (211). Other languages include: Python (132), Cuda (56), C (26), Jupyter Notebook (21), Rust (15), Dockerfile (12), and Shell (11).
[Chart: stargazers over time for topic cuda]
Trending repositories for topic cuda
A high-throughput and memory-efficient inference and serving engine for LLMs
Official implementation of "Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting" (https://arxiv.org/abs/2405.06419)
🚀 TensorRT-YOLO: Support YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, PP-YOLOE using TensorRT acceleration with EfficientNMS!
OneSweep, implemented in CUDA, D3D12, and Unity style compute shaders. Theoretically portable to all wave/warp/subgroup sizes. (See the warp-scan sketch after this list.)
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
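The OneSweep entry above refers to a single-pass GPU radix sort built out of wave/warp/subgroup primitives. As a flavor of what such a primitive looks like, here is a minimal CUDA sketch of a warp-wide inclusive prefix scan using shuffle intrinsics. It is not taken from that repository; the kernel name and host setup are illustrative only.

    #include <cstdio>

    // Illustrative kernel: inclusive prefix sum across one 32-thread warp,
    // built from the shuffle intrinsics that warp/subgroup algorithms
    // such as OneSweep are composed of.
    __global__ void warpInclusiveScan(const int *in, int *out) {
        int lane = threadIdx.x & 31;   // lane index within the warp
        int v = in[threadIdx.x];
        // Kogge-Stone style scan: each step pulls a value from d lanes back.
        for (int d = 1; d < 32; d <<= 1) {
            int n = __shfl_up_sync(0xffffffffu, v, d);
            if (lane >= d) v += n;
        }
        out[threadIdx.x] = v;
    }

    int main() {
        int h_in[32], h_out[32];
        for (int i = 0; i < 32; ++i) h_in[i] = 1;  // all ones -> out[i] = i + 1
        int *d_in, *d_out;
        cudaMalloc(&d_in, sizeof(h_in));
        cudaMalloc(&d_out, sizeof(h_out));
        cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
        warpInclusiveScan<<<1, 32>>>(d_in, d_out);
        cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
        printf("last prefix sum = %d\n", h_out[31]);  // expect 32
        cudaFree(d_in); cudaFree(d_out);
        return 0;
    }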
Most starred repositories for topic cuda
Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
3D Gaussian Splatting, reimagined: Unleashing unmatched speed with C++ and CUDA from the ground up!
Cross-platform, customizable multimedia/video processing framework. With strong GPU acceleration, heterogeneous design, multi-language support, easy to use, multi-framework compatible and high perfor...
A high-throughput and memory-efficient inference and serving engine for LLMs
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc. (A warp/block-reduce sketch follows this list.)
Open-source, local, self-hosted, and highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WebSocket
ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
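The CUDA notes entry above lists the classic kernel exercises (warp reduce, block reduce, softmax, and so on). As a hedged illustration of the first two, and not code from that repository, here is a minimal warp-reduce/block-reduce sum in CUDA; all names are illustrative.

    #include <cstdio>

    // Illustrative sketch: the classic warp-reduce / block-reduce sum
    // pattern using __shfl_down_sync.
    __inline__ __device__ float warpReduceSum(float v) {
        // Halve the shuffle distance each step; lane 0 ends with the warp total.
        for (int offset = 16; offset > 0; offset >>= 1)
            v += __shfl_down_sync(0xffffffffu, v, offset);
        return v;
    }

    __global__ void blockReduceSum(const float *in, float *out, int n) {
        __shared__ float partial[32];                 // one slot per warp
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        float v = (tid < n) ? in[tid] : 0.0f;
        v = warpReduceSum(v);                         // reduce within each warp
        if ((threadIdx.x & 31) == 0) partial[threadIdx.x >> 5] = v;
        __syncthreads();
        if (threadIdx.x < 32) {                       // first warp reduces the partials
            int nWarps = (blockDim.x + 31) >> 5;
            v = (threadIdx.x < nWarps) ? partial[threadIdx.x] : 0.0f;
            v = warpReduceSum(v);
            if (threadIdx.x == 0) atomicAdd(out, v);  // accumulate across blocks
        }
    }

    int main() {
        const int n = 1 << 20;
        float *d_in, *d_out, h_out = 0.0f;
        cudaMalloc(&d_in, n * sizeof(float));
        cudaMalloc(&d_out, sizeof(float));
        // Fill the input with ones so the expected sum is n.
        float *h_in = new float[n];
        for (int i = 0; i < n; ++i) h_in[i] = 1.0f;
        cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(d_out, &h_out, sizeof(float), cudaMemcpyHostToDevice);
        blockReduceSum<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
        cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        printf("sum = %.0f (expect %d)\n", h_out, n);
        delete[] h_in;
        cudaFree(d_in); cudaFree(d_out);
        return 0;
    }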