Trending repositories for topic hpc
Making large AI models cheaper, faster and more accessible
Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.
A curated list of awesome high performance computing resources
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sg...
Environment Modules: provides dynamic modification of a user's environment
Cloud-first free no-fee no-X uniX-like finite-element(ish) computational engineering tool
Xiao's CUDA Optimization Guide [Actively Adding New Content]
An efficient C++17 GPU numerical computing library with Python-like syntax
Lightweight, general, scalable C++ library for finite element methods
Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications ada...
Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.
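nndeploy is an end-to-end model deployment framework. Built on multi-platform inference and model deployment based on directed acyclic graphs (DAGs), it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.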
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
Lightweight fast function pipeline (DAG) creation in pure Python for scientific workflows 🕸️🧪
Pegasus Workflow Management System - Automate, recover, and debug scientific computations.
chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.
A scientific software for the numerical simulation of seismic wave phenomena and earthquake dynamics
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
A Prometheus exporter and a REST API server to export metrics of compute units of resource managers like SLURM, OpenStack, k8s, etc.
A template for starting reproducible Python machine-learning projects with hardware acceleration. Find an example at https://github.com/CLAIRE-Labo/no-representation-no-trust
Sample-based Quantum Diagonalization: Classically postprocess noisy quantum samples to yield more accurate eigenvalue estimations.
SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Energy...
An open collaborative repository for reproducible specifications of HPC benchmarks and cross site benchmarking environments
🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, TensorRT, TensorRT-LLM, Triton and High Performance Computing (HPC) projects.
A toolkit combining artificial intelligence and ab initio methods for computational chemistry research.
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
Parallel algorithms and data structures for tree-based adaptive mesh refinement (AMR) with arbitrary element shapes.
Scheduler for sub-node tasks for HPC systems with batch scheduling
Welcome to Peridynamic Laboratory (PeriLab), a powerful software solution designed for tackling Peridynamic problems.
A basic user tool to execute simple docker containers in batch or interactive systems without root privileges.
This repository contains materials for End-to-End AI for Science
N-Ways to GPU Programming Bootcamp
A Cross-Platform, Multi-Cloud High-Performance Computing Platform
Slides, exercises and resources for the 2023-2024 course "High Performance Computing" under the "Scientific and Data-Intensive Computing" Master Program at University of Trieste
A small OpenCL benchmark program to measure peak GPU/CPU performance.
CUDA and C++ port of BELLHOP / BELLHOP3D underwater acoustics simulator