Trending repositories for topic gpu
Tensors and Dynamic neural networks in Python with strong GPU acceleration
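For orientation, the core workflow this enables is creating tensors on an accelerator and differentiating through them; a minimal sketch, assuming a CUDA-capable device is present:

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Create two random matrices directly on the chosen device and multiply them.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b

# Gradients flow through GPU ops the same way as on the CPU.
x = torch.randn(16, requires_grad=True, device=device)
loss = (x ** 2).sum()
loss.backward()
print(c.shape, x.grad.shape)
```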
BioNeMo Framework: For building and adapting AI models in drug discovery at scale
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
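As a rough sketch of how such a library wraps an existing model: the config below is a placeholder, and the script would normally be launched with DeepSpeed's own launcher rather than plain python.

```python
import torch
import deepspeed

model = torch.nn.Linear(512, 10)

# Minimal placeholder config; real configs also tune ZeRO stage, precision, etc.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize returns an engine that owns the optimizer and the
# distributed state; training then goes through the engine.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

inputs = torch.randn(8, 512).to(engine.device)
loss = engine(inputs).sum()
engine.backward(loss)   # the engine handles gradient scaling/partitioning
engine.step()
```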
Lightweight Armoury Crate alternative for Asus laptops and ROG Ally. Control tool for ROG Zephyrus G14, G15, G16, M16, Flow X13, Flow X16, TUF, Strix, Scar and other models
dstack is a lightweight, open-source alternative to Kubernetes & Slurm, simplifying AI container orchestration with multi-cloud & on-prem support. It natively supports NVIDIA, AMD, & TPU.
This is the release repository for Fan Control, highly customizable fan control software for Windows.
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
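Clients typically talk to the server over its HTTP/gRPC APIs; a hedged sketch using the tritonclient Python package, where the model name and tensor names ("my_model", "INPUT0", "OUTPUT0") are hypothetical and must match the deployed model's configuration:

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server assumed to be listening on the default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: names, shapes and dtypes must match the model's config.
data = np.random.rand(1, 16).astype(np.float32)
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0").shape)
```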
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.
Open deep learning compiler stack for CPU, GPU and specialized accelerators
Mesh optimization library that makes meshes smaller and faster to render
🦀⚙️ Sudoless performance monitoring for Apple Silicon processors. CPU / GPU / RAM usage, power consumption & temperature 🌡️
📒 A small curated list of awesome diffusion inference papers, with code.
fdtd3d is an open source 1D, 2D, 3D FDTD electromagnetics solver with MPI, OpenMP and CUDA support for x64, ARM, ARM64, RISC-V, PowerPC, Wasm architectures
NVIDIA-accelerated packages for arm motion planning and control
RAG (retrieval-augmented generation) chatbot that provides answers based on contextual information extracted from a collection of Markdown files.
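The retrieval half of such a pipeline is small enough to sketch; the chunking heuristic is an illustrative stand-in, and the embedding model that would produce the vectors is assumed rather than shown:

```python
import numpy as np

def chunk_markdown(text: str, max_chars: int = 500) -> list[str]:
    # Naive chunking on blank lines, merged up to a size budget.
    chunks, current = [], ""
    for block in text.split("\n\n"):
        if len(current) + len(block) > max_chars and current:
            chunks.append(current.strip())
            current = ""
        current += block + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def retrieve(query_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    # Cosine similarity between the query embedding and every chunk embedding.
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(scores)[::-1][:k]   # indices of the top-k chunks

# The top-k chunks are then pasted into the LLM prompt as context.
```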
Run serverless GPU workloads with fast cold starts on bare-metal servers, anywhere in the world
CUDA implementation of Hierarchical Navigable Small World Graph algorithm
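HNSW is an approximate nearest-neighbour index; the exact-search baseline it trades accuracy against fits in a few lines on the GPU (a PyTorch sketch for context, not the repository's CUDA kernels):

```python
import torch

def brute_force_knn(queries: torch.Tensor, points: torch.Tensor, k: int):
    # Exact k-nearest-neighbour search via all pairwise distances.
    # HNSW-style indexes avoid this O(N) scan per query by greedily walking
    # a layered proximity graph instead.
    dists = torch.cdist(queries, points)          # (Q, N) Euclidean distances
    return torch.topk(dists, k, largest=False)    # values and indices of the k closest

device = "cuda" if torch.cuda.is_available() else "cpu"
points = torch.randn(100_000, 128, device=device)
queries = torch.randn(32, 128, device=device)
vals, idx = brute_force_knn(queries, points, k=10)
```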
Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Server models.
☁️ VRAM for SDXL, AnimateDiff, and upscalers. Run your workflows on the cloud, from your local ComfyUI
Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.
GPU & Accelerator process monitoring for AMD, Apple, Huawei, Intel, NVIDIA and Qualcomm
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
A GUI tool to manage NVIDIA GPU overclocking, fans and power limits. Supports both Wayland and X11.
Extensible generative AI platform on Kubernetes with OpenAI-compatible APIs.
PalmHill.BlazorChat is a chat application and API built with Blazor WebAssembly, SignalR, and WebAPI, featuring real-time LLM conversations, markdown support, customizable settings, and a responsive d...
📡 Deploy AI models and apps to Kubernetes without developing a hernia
TypeScript library that enhances the WebGPU API, allowing resource management in a type-safe, declarative way.
A Silent (Hidden) Free Crypto Miner Builder - Supports ETH, ETC, XMR and many more.
This repository contains a Vulkan Framework designed to enable developers to get up and running quickly for creating sample content and rapid prototyping. It is designed to be easy to build and have t...
An advanced guide to running Mac OS / OS X / macOS on QEMU/KVM with libvirtd/Virt-Manager. Includes various write-ups for deep customization.
Go library for embedded vector search and semantic embeddings using llama.cpp
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
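Scalene is run from the command line against an unmodified script; a minimal example of the kind of program it is pointed at (the file name is arbitrary):

```python
# example.py -- run with:  scalene example.py
# Scalene then attributes CPU, GPU, and memory usage to individual lines.
import numpy as np

def slow_python_sum(n: int) -> float:
    total = 0.0
    for i in range(n):          # line-level profiling highlights this loop
        total += i * 0.5
    return total

def fast_numpy_sum(n: int) -> float:
    return float((np.arange(n) * 0.5).sum())   # vectorised alternative

if __name__ == "__main__":
    slow_python_sum(5_000_000)
    fast_numpy_sum(5_000_000)
```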
John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
Best practices & guides on how to write distributed PyTorch training code
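The baseline those guides build on is DistributedDataParallel launched with torchrun; a condensed sketch with a placeholder model and synthetic data:

```python
# train_ddp.py -- launch with:  torchrun --nproc_per_node=4 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                 # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])      # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])     # gradients are all-reduced
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for _ in range(10):
        x = torch.randn(32, 128, device=f"cuda:{local_rank}")
        loss = model(x).sum()
        opt.zero_grad()
        loss.backward()                             # overlaps communication with backward
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```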
Stable Diffusion WebUI Forge docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.
Performance of the C++ interface of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
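For context on what is being measured: flash attention computes ordinary scaled dot-product attention, just in a tiled, memory-efficient way. A reference sketch of that computation in PyTorch (not the C++ interface under test):

```python
import math
import torch
import torch.nn.functional as F

# Shapes typical of LLM inference: (batch, heads, sequence length, head dim).
q = torch.randn(1, 32, 1024, 128, device="cuda", dtype=torch.float16)
k = torch.randn(1, 32, 1024, 128, device="cuda", dtype=torch.float16)
v = torch.randn(1, 32, 1024, 128, device="cuda", dtype=torch.float16)

# Naive attention: materialises the full (seq, seq) score matrix.
scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
naive = torch.softmax(scores, dim=-1) @ v

# Fused implementation; PyTorch dispatches to a flash-attention-style kernel
# when one is available for this dtype/shape.
fused = F.scaled_dot_product_attention(q, k, v)

# The two should agree up to fp16 rounding.
print(torch.allclose(naive, fused, atol=1e-2))
```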
Information and links about Epic's Unreal Engine, including the Verse programming language for UEFN, Unreal, Fortnite and the Metaverse, along with UE5 and the UE6 convergence
Multi-platform high-performance compute language extension for Rust.
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
A fast communication-overlapping library for tensor parallelism on GPUs.
This project collects GPU benchmarks from various cloud providers and compares them to fixed per-token costs. Use our tool for efficient LLM GPU selection and cost-effective AI models. LLM provider p...
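The underlying comparison reduces to simple arithmetic; a worked example with made-up numbers (not figures from the project):

```python
def cost_per_million_tokens(gpu_usd_per_hour: float, tokens_per_second: float) -> float:
    # Convert an hourly GPU price and a measured throughput into $/1M tokens.
    tokens_per_hour = tokens_per_second * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000

# Hypothetical numbers for illustration only.
print(cost_per_million_tokens(gpu_usd_per_hour=2.50, tokens_per_second=1500))
# -> ~0.46 USD per million generated tokens, to compare against a provider's
#    fixed per-token price.
```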
NviWatch: a blazingly fast, Rust-based TUI for managing and monitoring NVIDIA GPU processes
A CUDA reimplementation of the line/plane odometry of LIO-SAM. A point cloud hash map (inspired by iVox of Faster-LIO) on GPU is used to accelerate 5-neighbour KNN search.
Transforms your CasADi functions into batchable JAX-compatible functions. By combining the power of CasADi with the flexibility of JAX, JAXADi enables the creation of efficient code that runs smoothly...
Raylib 100% GPU particles example in 3D. Uses compute shaders and is fully documented. Millions of particles at 60 fps on a laptop.
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such ...
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
Real-time image and video processing library similar to GPUImage, with built-in beauty filters, achieving commercial-grade beauty effects. Written in C++11 and based on OpenGL/ES.
A 3D FPGA GPU for real-time rasterization with a tile-based deferred rendering (TBDR) architecture, featuring transform & lighting (T&L), back-face culling, MSAA anti-aliasing, ordered dithering, etc.
An innovative library for efficient LLM inference via low-bit quantization
Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS.
GLIM: versatile and extensible range-based 3D localization and mapping framework
My experiments with BVH build algorithms on the GPU.
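One widely used starting point for GPU BVH builds is the linear BVH: give every primitive a Morton code so that sorting by code groups spatially nearby primitives, which maps well onto a parallel sort. A small sketch of that first step (NumPy standing in for the GPU sort):

```python
import numpy as np

def expand_bits(v: np.ndarray) -> np.ndarray:
    # Spread the lower 10 bits of each integer so they can be interleaved.
    v = (v * 0x00010001) & 0xFF0000FF
    v = (v * 0x00000101) & 0x0F00F00F
    v = (v * 0x00000011) & 0xC30C30C3
    v = (v * 0x00000005) & 0x49249249
    return v

def morton_codes(centroids: np.ndarray) -> np.ndarray:
    # Quantise centroids (assumed normalised to [0, 1]) to a 10-bit grid
    # and interleave x/y/z bits into a 30-bit Morton code.
    q = np.clip(centroids * 1024.0, 0, 1023).astype(np.uint32)
    return (expand_bits(q[:, 0]) << 2) | (expand_bits(q[:, 1]) << 1) | expand_bits(q[:, 2])

centroids = np.random.rand(8, 3)
order = np.argsort(morton_codes(centroids))   # sorted order = leaf order of the LBVH
print(order)
```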
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
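For reference, the operation being optimised is a matrix-vector product carried out in half precision; a PyTorch sketch of the reference computation such kernels are usually validated against (not the repository's CUDA code):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# HGEMV: y = A @ x with A (m x n) and x (n,) stored in fp16.
m, n = 4096, 4096
A = torch.randn(m, n, device=device, dtype=torch.float16)
x = torch.randn(n, device=device, dtype=torch.float16)

y_half = A @ x                                  # fp16 result from the library/kernel
y_ref = (A.float() @ x.float()).half()          # fp32 accumulation as the reference

# Half-precision accumulation loses bits, so comparisons use a loose tolerance.
print(torch.allclose(y_half, y_ref, atol=1e-1, rtol=1e-2))
```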
A collection of GTSAM factors and optimizers for point cloud SLAM