Trending repositories for topic hpc
Making large AI models cheaper, faster and more accessible
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL.
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
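nndeploy is an end-to-end model deployment framework. Built on multi-platform inference and directed-acyclic-graph-based model deployment, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.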
A curated list of awesome high performance computing resources
Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)
A small OpenCL benchmark program to measure peak GPU/CPU performance.
quacc is a flexible platform for computational materials science and quantum chemistry that is built for the big data era.
Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.
SingularityCE is the Community Edition of Singularity, an open source container platform designed to be simple, fast, and secure.
Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)
A small OpenCL benchmark program to measure peak GPU/CPU performance.
quacc is a flexible platform for computational materials science and quantum chemistry that is built for the big data era.
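nndeploy is an end-to-end model deployment framework. Built on multi-platform inference and directed-acyclic-graph-based model deployment, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.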
A curated list of awesome high performance computing resources
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL.
Caliper is an instrumentation and performance profiling library
Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.
SingularityCE is the Community Edition of Singularity, an open source container platform designed to be simple, fast, and secure.
Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sg...
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
Making large AI models cheaper, faster and more accessible
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL.
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
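nndeploy is an end-to-end model deployment framework. Built on multi-platform inference and directed-acyclic-graph-based model deployment, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.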
Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications ada...
A basic user tool to execute simple docker containers in batch or interactive systems without root privileges.
A curated list of awesome high performance computing resources
Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sg...
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
N-Ways to GPU Programming Bootcamp
This repository contains materials for End-to-End AI for Science
Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)
This is a mirror of https://gitlab.inria.fr/starpu/starpu where our development happens, but contributions are welcome here too!
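nndeploy is an end-to-end model deployment framework. Built on multi-platform inference and directed-acyclic-graph-based model deployment, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.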
quacc is a flexible platform for computational materials science and quantum chemistry that is built for the big data era.
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL.
A curated list of awesome high performance computing resources
OCI-compatible engine to deploy Linux containers on HPC environments.
BIDScoin converts your source-level neuroimaging data to BIDS
A small OpenCL benchmark program to measure peak GPU/CPU performance.
Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications ada...
Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sg...
Making large AI models cheaper, faster and more accessible
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL.
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
A curated list of awesome high performance computing resources
Lightweight, general, scalable C++ library for finite element methods
💥💻💥 A data-parallel functional programming language
Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications ada...
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sg...
A basic user tool to execute simple docker containers in batch or interactive systems without root privileges.
A small OpenCL benchmark program to measure peak GPU/CPU performance.
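nndeploy is an end-to-end model deployment framework. Built on multi-platform inference and directed-acyclic-graph-based model deployment, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.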
N-Ways to GPU Programming Bootcamp
This repository contains materials for End-to-End AI for Science
🧮 An Open Source, Parallel and Heterogeneous Finite Element Analysis Framework
A curated list of awesome high performance computing resources
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
Notes and tutorials on Density Functional Theory calculation using Quantum Espresso.
Parallel algorithms and data structures for tree-based AMR with arbitrary element shapes.
This is a mirror of https://gitlab.inria.fr/starpu/starpu where our development happens, but contributions are welcome here too!
Cloud-first free no-fee no-X uniX-like finite-element(ish) computational engineering tool
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL.
The PennyLane-Lightning plugin provides a fast state-vector simulator written in C++ for use with PennyLane
quacc is a flexible platform for computational materials science and quantum chemistry that is built for the big data era.
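nndeploy is an end-to-end model deployment framework. Built on multi-platform inference and directed-acyclic-graph-based model deployment, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.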
🪐 The Sebulba architecture to scale reinforcement learning on Cloud TPUs in JAX
Making large AI models cheaper, faster and more accessible
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL.
A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
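nndeploy is an end-to-end model deployment framework. Built on multi-platform inference and directed-acyclic-graph-based model deployment, it aims to give users a cross-platform, easy-to-use, high-performance model deployment experience.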
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sg...
An efficient C++17 GPU numerical computing library with Python-like syntax
Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications ada...
A curated list of awesome high performance computing resources
Lightweight, general, scalable C++ library for finite element methods
Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.
💥💻💥 A data-parallel functional programming language
SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Energy...
This repository contains materials for End-to-End AI for Science
CUDA and C++ port of BELLHOP / BELLHOP3D underwater acoustics simulator
quacc is a flexible platform for computational materials science and quantum chemistry that is built for the big data era.
A curated list of awesome high performance computing resources
A small OpenCL benchmark program to measure peak GPU/CPU performance.
N-Ways to GPU Programming Bootcamp
Run GPU inference and training jobs on serverless infrastructure that scales with you.
Parallelo Parallel Library (PPL) is a small parallel framework that brings Structured Parallel Programming to Rust.
A toolkit featuring artificial intelligence × ab initio methods for computational chemistry research.
Supercomputing @ GT has compiled a list of organizations that offer internships and experiences in HPC and applications of HPC.
Xiao's CUDA Optimization Guide [Actively Adding New Content]
chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sg...
You should offer both Podman and Apptainer with namespaces on your HPC systems