Trending repositories for topic parallel-computing
Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
Must read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation
Symbolic programming for the next generation of numerical software
Lightweight fast function pipeline (DAG) creation in pure Python for scientific workflows 🕸️🧪
Xiao's CUDA Optimization Guide [Actively Adding New Content]
Kratos Multiphysics (A.K.A Kratos) is a framework for building parallel multi-disciplinary simulation software. Modularity, extensibility and HPC are the main objectives. Kratos has BSD license and is...
📈 Adaptive: parallel active learning of mathematical functions
Lightweight, general, scalable C++ library for finite element methods
Evolutionary algorithm toolbox and framework with high performance for Python
Build applications, scripts, and automations powered by high-performance multicore computing using Node.js
A General-purpose Task-parallel Programming System using Modern C++
🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, TensorRT, TensorRT-LLM, Triton and High Performance Computing (HPC) projects.
PyTorch implementation of Soft-Actor-Critic and Prioritized Experience Replay (PER) + Emphasizing Recent Experience (ERE) + Munchausen RL + D2RL and parallel Environments.
Create and control multiple Julia processes remotely for distributed computing. Ships as a Julia stdlib.
Parallel, highly efficient code (CPU and GPU) for DEM and CFD-DEM simulations.
This is the official github mirror repository of FrontISTR, Open-Source Large-Scale Parallel FEM Program for Nonlinear Structural Analysis. Active developments of FrontISTR are hosted on https://gitl...
Location for the LSF Python wrapper for controlling all things LSF
Mt-KaHyPar (Multi-Threaded Karlsruhe Hypergraph Partitioner) is a shared-memory multilevel graph and hypergraph partitioner equipped with parallel implementations of techniques used in the best sequen...
C and Python examples from my book on using PETSc and Firedrake to solve PDEs
Digital Image Correlation & Digital Volume Correlation Library
miniRT is the final C project of the 42 Common Core: our very first ray-tracer. Our miniRT focused on optimising CPU-rendered graphics, to achieve a real-time renderer with movement controls and extra...
Official development repository for SUNDIALS - a SUite of Nonlinear and DIfferential/ALgebraic equation Solvers. Pull requests are welcome for bug fixes and minor changes.
A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
Fierro is a C++ code designed to aid the research and development of numerical methods, testing of user-specified models, and creating multi-scale models related to quasi-static solid mechanics and co...
Propulate is an asynchronous population-based optimization algorithm and software package for global optimization and hyperparameter search on high-performance computers.
Parallel algorithms and data structures for tree-based adaptive mesh refinement (AMR) with arbitrary element shapes.
A visual Deep Learning Framework for the Web - Built with WebGPU, Next.js and ReactFlow.
University of Toronto / ECE1782 - Programming Massively Parallel Multiprocessors and Heterogeneous Systems / Project: an optimized CUDA Implementation of AES 128-bit Encryption, support any file types...
A bleeding-edge, lock-free, wait-free, continuation-stealing tasking library built on C++20's coroutines
An easy-to-use and fast library for task-based parallelism, utilizing coroutines.
A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
Runs multiple inputs through a script/function in parallel using bash coprocs
Light and self-contained implementation of C++17 parallel algorithms.
Slides, exercises and resources for the 2023-2024 course "High Performance Computing" under the "Scientific and Data-Intensive Computing" Master Program at University of Trieste
Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023)