Search Results - RepositoryStats

cupy cupy

892

10.0k

mit

127

NumPy & SciPy for GPU

gpu cuda cupy nccl nvtx rocm cudnn numpy nvrtc scipy cublas curand python tensor cusolver cusparse cutensor cusparselt

Created 2016-11-01

29,659 commits to main branch, last one 5 days ago

cudarc coreylowman

93

760

apache-2.0

10

Safe Rust wrapper around the CUDA toolkit

gpu cuda nccl rust cudnn nvrtc cublas curand cuda-kernels cuda-toolkit cuda-programming gpu-acceleration

Created 2022-09-16

346 commits to main branch, last one 3 days ago

llm_training_handbook huggingface

40

479

cc-by-sa-4.0

51

An open collection of methodologies to help with successful training of large language models.

llm nlp cuda nccl python pytorch performance scalability troubleshooting large-language-models

Created 2023-03-08

18 commits to main branch, last one about a year ago

large_language_model_training_playbook huggingface

23

471

apache-2.0

66

An open collection of implementation tips, tricks and resources for training large language models

llm nlp cuda nccl python pytorch performance scalability troubleshooting large-language-models

Created 2023-03-06

26 commits to main branch, last one 2 years ago

distributed-training-guide LambdaLabsML

27

373

mit

6

Best practices & guides on how to write distributed PyTorch training code

gpu mpi cuda fsdp nccl slurm cluster pytorch sharding deepspeed kuberentes lambdalabs gpu-cluster distributed-training

Created 2024-07-31

271 commits to main branch, last one 27 days ago

bluefog Bluefog-Lib

51

256

apache-2.0

21

Distributed and decentralized training framework for PyTorch over graph

mpi nccl pytorch one-sided asynchronous deeplearning decentralized machine-learning distributed-computing

Created 2019-12-03

1,094 commits to master branch, last one about a year ago

tutorial-multi-gpu FZJ-JSC

55

247

mit

10

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

gpu hpc mpi cuda nccl sc21 sc22 sc23 isc22 isc23 isc24 nvshmem multi-gpu supercomputing exascale-computing

Created 2021-09-23

284 commits to main branch, last one 3 days ago

msrflute microsoft

23

188

mit

9

Federated Learning Utilities and Tools for Experimentation

gloo nccl pytorch simulation privacy-tools personalization machine-learning federated-learning transformers-models distributed-learning

Created 2021-11-17

54 commits to main branch, last one about a year ago

nccl-fastsocket google

13

116

other

4

NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.

nccl training machine-learning

This repository has been archived

Created 2021-07-19

20 commits to master branch, last one about a year ago