9 results found

License: MIT
NumPy & SciPy for GPU
Created 2016-11-01
29,404 commits to main branch, last one 21 hours ago
License: Apache-2.0
Safe Rust wrapper around the CUDA toolkit
Created 2022-09-16
272 commits to main branch, last one 7 days ago
An open collection of methodologies to help with successful training of large language models.
Created 2023-03-08
18 commits to main branch, last one about a year ago
An open collection of implementation tips, tricks and resources for training large language models
Created 2023-03-06
26 commits to main branch, last one about a year ago
Best practices & guides on how to write distributed PyTorch training code
Created 2024-07-31
238 commits to main branch, last one 8 days ago
License: Apache-2.0
Distributed and decentralized training framework for PyTorch over graphs
Created 2019-12-03
1,094 commits to master branch, last one about a year ago
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
Created 2021-09-23
283 commits to main branch, last one about a month ago
Federated Learning Utilities and Tools for Experimentation
Created 2021-11-17
54 commits to main branch, last one about a year ago
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
Created 2021-07-19
20 commits to master branch, last one about a year ago