Trending repositories for topic gpu
Tensors and Dynamic neural networks in Python with strong GPU acceleration
dstack is an open-source alternative to Kubernetes, designed to simplify development, training, and deployment of AI across any cloud or on-prem. It supports NVIDIA, AMD, and TPU.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Lightweight Armoury Crate alternative for Asus laptops and ROG Ally. Control tool for ROG Zephyrus G14, G15, G16, M16, Flow X13, Flow X16, TUF, Strix, Scar and other models
This is the release repository for Fan Control, a highly customizable fan controlling software for Windows.
Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level
Mesh optimization library that makes meshes smaller and faster to render
NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
Information and links about Epic's Unreal Engine including Verse programming language for UEFN, Unreal, Fortnite and the Metaverse along with UE5 and the UE6 convergence
LightGlue-OnnxRunner is a repository that hosts the C++ inference code of LightGlue in ONNX format, supporting end-to-end/decoupled model inference of SuperPoint/DISK + LightGlue
A Silent (Hidden) Free Crypto Miner Builder - Supports ETH, ETC, XMR and many more.
Cross-architecture parallel algorithms for Julia's GPU backends, from a unified KernelAbstractions.jl codebase. Targets Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.
A distributed storage benchmark for file systems, object stores & block devices with support for GPUs
MF-LBM: A Portable, Scalable and High-performance Lattice Boltzmann Code for DNS of Flow in Porous Media
Go library for embedded vector search and semantic embeddings using llama.cpp
GLake: optimizing GPU memory management and IO transmission.
🦀⚙️ Sudoless performance monitoring for Apple Silicon processors. CPU / GPU / RAM usage, power consumption & temperature 🌡️
TypeScript library that enhances the WebGPU API, allowing resource management in a type-safe, declarative way.
Rapids_singlecell: A GPU-accelerated tool for scRNA analysis. Offers seamless scverse compatibility for efficient single-cell data processing and analysis.
🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.
Stable Diffusion WebUI Forge docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.
Best practices & guides on how to write distributed pytorch training code
John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
These are my experiments with BVH build algorithms on the GPU.
BioNeMo Framework: For building and adapting AI models in drug discovery at scale
A 3D FPGA GPU for real-time rasterization with a tile-based deferred rendering (TBDR) architecture, featuring transform & lighting (T&L), back-face culling, MSAA anti-aliasing, ordered dithering, etc.
A collection of GTSAM factors and optimizers for point cloud SLAM
A super simple approach to GPU-instanced grass in Unity (+ occlusion/frustum culling)
An advanced guide to run Mac OS / OS X / macOS on QEMU/KVM with libvirtd/Virt-Manager. Includes various write-ups for deep customization.
A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more!
Multi-platform high-performance compute language extension for Rust.
Run serverless workloads with fast cold starts on bare-metal servers, anywhere in the world
An innovative library for efficient LLM inference via low-bit quantization
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
A fast communication-overlapping library for tensor parallelism on GPUs.
This project collects GPU benchmarks from various cloud providers and compares them to fixed per token costs. Use our tool for efficient LLM GPU selections and cost-effective AI models. LLM provider p...
☁️ VRAM for SDXL, AnimateDiff, and upscalers. Run your workflows on the cloud, from your local ComfyUI
NviWatch: A blazingly fast Rust-based TUI for managing and monitoring NVIDIA GPU processes
A CUDA reimplementation of the line/plane odometry of LIO-SAM. A point cloud hash map (inspired by iVox of Faster-LIO) on GPU is used to accelerate 5-neighbour KNN search.
Transforms your CasADi functions into batchable JAX-compatible functions. By combining the power of CasADi with the flexibility of JAX, JAXADi enables the creation of efficient code that runs smoothly...
GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such ...
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
Real-time image and video processing library similar to GPUImage, with built-in beauty filters, achieving commercial-grade beauty effects. Written in C++11 and based on OpenGL/ES.
A high-performance inference system for large language models, designed for production environments.
A lightweight 2D graphics library for rendering texts, geometries, and images with high-performance APIs that work across various platforms.
Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS.
GLIM: versatile and extensible range-based 3D localization and mapping framework
RAG (Retrieval-augmented generation) ChatBot that provides answers based on contextual information extracted from a collection of Markdown files.
AUTOMATIC1111 (A1111) Stable Diffusion Web UI docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.