23 results found
The Forge Cross-Platform Rendering Framework: PC Windows, Steamdeck (native), Ray Tracing, macOS / iOS, Android, XBOX, PS4, PS5, Switch, Quest 2
Created
2017-10-03
516 commits to master branch, last one about a month ago
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
This repository has been archived
Created
2017-09-08
1,695 commits to master branch, last one 4 years ago
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
Created
2020-06-01
98 commits to master branch, last one 26 days ago
Multi-threaded GUI manager for mass creation of AI-generated art with support for multiple GPUs.
Created
2022-09-13
378 commits to main branch, last one 3 months ago
GPU-ready Dockerfile to run the Stability.AI stable-diffusion model v2 with a simple web interface. Includes multi-GPU support.
Created
2022-08-25
89 commits to master branch, last one 5 months ago
Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
Created
2020-12-22
1,074 commits to main branch, last one 20 days ago
QUDA is a library for performing calculations in lattice QCD on GPUs.
Created
2011-01-27
14,443 commits to develop branch, last one 2 days ago
A PyTorch implementation of the 'FaceNet' paper for training a facial recognition model with Triplet Loss using the glint360k dataset. A pre-trained model using Triplet Loss is available for download.
Created
2019-03-21
267 commits to master branch, last one 3 years ago
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
Created
2018-05-17
7,160 commits to main branch, last one 8 days ago
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
Created
2021-09-23
283 commits to main branch, last one 3 days ago
Chains stable-diffusion-webui instances together to facilitate faster image generation.
Created
2023-03-06
247 commits to master branch, last one 25 days ago
Multi-GPU pre-training on one machine for BERT from scratch without Horovod (data parallelism)
Created
2018-12-25
99 commits to master branch, last one about a month ago
The world's first CUDA implementation of Weakly-Compressible Smoothed Particle Hydrodynamics
Created
2014-05-28
5,350 commits to master branch, last one 3 years ago
Almost trivial distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid
Created
2019-12-06
375 commits to master branch, last one 8 days ago
High-level C++ for Accelerator Clusters
Created
2019-07-16
966 commits to master branch, last one a day ago
Efficient and Scalable Physics-Informed Deep Learning and Scientific Machine Learning on top of TensorFlow for multi-worker distributed computing
Created
2020-10-03
309 commits to main branch, last one 2 years ago
Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated fast while using the same kernel on all devices (for simplicity).
Created
2017-03-29
403 commits to master branch, last one 2 years ago
A dual-GPU DEM solver with complex grain geometry support
Created
2021-04-22
655 commits to main branch, last one 3 months ago
Code repository for the book <케라스 창시자에게 배우는 딥러닝 2판> ("Deep Learning from the Creator of Keras, 2nd Edition")
Created
2022-04-11
106 commits to main branch, last one 8 months ago
Neutron: A PyTorch-based implementation of Transformer and its variants.
Topics: python3, pytorch, seq2seq, ensemble, multi-gpu, optimizers, beam-search, transformer, average-models, multilingual-nmt, context-aware-nmt, relative-position, dynamic-batch-size, sentential-context, attention-is-all-you-need, average-attention-network, dynamic-sentence-sampling, neural-machine-translation, natural-language-processing, robust-neural-machine-translation
This repository has been archived
Created
2019-01-11
70 commits to master branch, last one about a year ago
Accumulated Gradients for TensorFlow 2
Created
2022-05-31
698 commits to main branch, last one 10 months ago
POT3D: High Performance Potential Field Solver
Created
2021-01-21
67 commits to main branch, last one 2 months ago
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms.
Created
2023-03-26
284 commits to GCP branch, last one 8 months ago