23 results found

510
4.9k
apache-2.0
182
The Forge Cross-Platform Rendering Framework PC Windows, Steamdeck (native), Ray Tracing, macOS / iOS, Android, XBOX, PS4, PS5, Switch, Quest 2
Created 2017-10-03
516 commits to master branch, last one 2 months ago
370
1.6k
apache-2.0
93
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
This repository has been archived
Created 2017-09-08
1,695 commits to master branch, last one 4 years ago
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
Created 2020-06-01
98 commits to master branch, last one about a month ago
Multi-threaded GUI manager for mass creation of AI-generated art with support for multiple GPUs.
Created 2022-09-13
378 commits to main branch, last one 4 months ago
GPU-ready Dockerfile to run Stability.AI stable-diffusion model v2 with a simple web interface. Includes multi-GPU support.
Created 2022-08-25
89 commits to master branch, last one 6 months ago
38
325
bsd-3-clause
9
Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
Created 2020-12-22
1,147 commits to main branch, last one 10 days ago
101
296
other
58
QUDA is a library for performing calculations in lattice QCD on GPUs.
Created 2011-01-27
14,580 commits to develop branch, last one a day ago
A PyTorch implementation of the 'FaceNet' paper for training a facial recognition model with Triplet Loss using the glint360k dataset. A pre-trained model using Triplet Loss is available for download.
Created 2019-03-21
267 commits to master branch, last one 3 years ago
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
Created 2018-05-17
7,255 commits to main branch, last one 3 days ago
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
Created 2021-09-23
283 commits to main branch, last one about a month ago
Chains stable-diffusion-webui instances together to facilitate faster image generation.
Created 2023-03-06
247 commits to master branch, last one about a month ago
54
173
apache-2.0
9
Multi-GPU pre-training on one machine for BERT from scratch without Horovod (data parallelism)
Created 2018-12-25
99 commits to master branch, last one 2 months ago
66
169
unknown
36
The world's first CUDA implementation of Weakly-Compressible Smoothed Particle Hydrodynamics
Created 2014-05-28
5,350 commits to master branch, last one 3 years ago
20
165
bsd-3-clause
14
Almost trivial distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid
Created 2019-12-06
375 commits to master branch, last one about a month ago
High-level C++ for Accelerator Clusters
Created 2019-07-16
996 commits to master branch, last one 3 days ago
Efficient and Scalable Physics-Informed Deep Learning and Scientific Machine Learning on top of TensorFlow for multi-worker distributed computing
Created 2020-10-03
309 commits to main branch, last one 2 years ago
Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated quickly while using the same kernel on all devices (for simplicity).
Created 2017-03-29
403 commits to master branch, last one 2 years ago
A dual-GPU DEM solver with complex grain geometry support
Created 2021-04-22
655 commits to main branch, last one 4 months ago
23
45
apache-2.0
10
POT3D: High Performance Potential Field Solver
Created 2021-01-21
75 commits to main branch, last one 3 days ago
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms.
Created 2023-03-26
284 commits to GCP branch, last one 9 months ago