Trending repositories for topic high-performance-computing
The fastest and most memory-efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.
Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications ada...
High-performance TensorFlow library for quantitative finance.
A series of GPU optimization topics explaining in detail how to optimize CUDA kernels. It introduces several basic kernel optimizations, including elementwise, reduce, sg...
Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
BS::thread_pool: a fast, lightweight, modern, and easy-to-use C++17 / C++20 / C++23 thread pool library
A list of awesome compiler projects and papers for tensor computation and deep learning.
A General-purpose Task-parallel Programming System using Modern C++
Geant4 toolkit for the simulation of the passage of particles through matter - NIM A 506 (2003) 250-303
A parallel, extensible finite element code to simulate convection in both 2D and 3D models.
A curated list of awesome projects and papers for distributed training or inference
SPHinXsys provides C++ APIs for engineering simulation and optimization. It aims at complex systems driven by fluid, structure, multi-body dynamics and beyond. The multi-physics library is based on a ...
Lightweight, general, scalable C++ library for finite element methods
Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.
Geant4 toolkit for the simulation of the passage of particles through matter - NIM A 506 (2003) 250-303
A code for fast, massively parallel simulations of two-phase flows with heat transfer
Open-source digital rocks software platform for micro-CT, CT, thin sections and borehole image analysis. Includes tools for annotation, AI, HPC, porous media flow simulation, porosity analysis, perme...
SaunaFS is a free and open-source distributed POSIX file system inspired by the Google File System.
Parallel algorithms and data structures for tree-based adaptive mesh refinement (AMR) with arbitrary element shapes.
Mt-KaHyPar (Multi-Threaded Karlsruhe Hypergraph Partitioner) is a shared-memory multilevel graph and hypergraph partitioner equipped with parallel implementations of techniques used in the best sequen...
A multi-block solver for massively parallel direct numerical simulations (DNS) of fluid flows
C++ library and command-line software for processing and analysis of terabyte-scale volume images locally or on a computing cluster.
A small OpenCL benchmark program to measure peak GPU/CPU performance.
This is the official GitHub mirror of FrontISTR, an open-source large-scale parallel FEM program for nonlinear structural analysis. Active development of FrontISTR is hosted on https://gitl...
Main repository for QMCPACK, an open-source, production-level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance p...
A Taichi-powered high-performance numerical simulator for multiscale and multifield geophysical problems
Training and serving large-scale neural networks with auto parallelization.
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, CUDA and OpenCL backends
A modern, fast, lightweight thread pool library based on C++20
GAL-DAWN: A novel high-performance computing library of graph algorithms based on DAWN, written in CUDA/C++
crew launcher plugins for traditional high-performance computing clusters
An ongoing, curated collection of awesome software best practices and techniques, libraries and frameworks, e-books and videos, websites, blog posts, links to GitHub repositories, technical guideline...
Supercomputing @ GT has compiled a list of organizations that offer internships and experiences in HPC and applications of HPC.
High-performance and differentiation-enabled nonlinear solvers (Newton methods), bracketed rootfinding (bisection, Falsi), with sparsity and Newton-Krylov support.
CP3d is a comprehensive Euler-Lagrange solver for the direct numerical simulations of particle-laden flows.
RabbitTClust: enabling fast clustering analysis of millions of bacterial genomes with MinHash sketches