65 results found

6.1k
38.4k
mit
1.2k
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Created 2018-11-05
18 commits to main branch, last one about a year ago
4.9k
33.9k
apache-2.0
316
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT)...
Created 2019-02-02
2,711 commits to main branch, last one 4 days ago
5.7k
22.7k
apache-2.0
715
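PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)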
Created 2016-08-15
53,687 commits to develop branch, last one 10 hours ago
3.0k
12.5k
apache-2.0
101
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
Created 2021-02-05
5,815 commits to develop branch, last one a day ago
621
7.7k
apache-2.0
70
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 16+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
Created 2021-08-11
2,633 commits to master branch, last one 14 hours ago
380
4.1k
apache-2.0
58
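Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Center for Cognitive Computing and Natural Language at the IDEA Research Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.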
Created 2021-10-28
711 commits to main branch, last one about a year ago
745
3.8k
apache-2.0
94
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on a...
Created 2020-07-21
12,120 commits to master branch, last one 11 months ago
491
3.7k
other
82
A high performance and generic framework for distributed DNN training
Created 2019-06-25
432 commits to master branch, last one 3 years ago
529
3.5k
apache-2.0
171
Fast and flexible AutoML with learning guarantees.
Created 2018-06-28
440 commits to master branch, last one 3 years ago
364
3.1k
apache-2.0
82
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
Created 2020-04-07
8,394 commits to main branch, last one about a month ago
360
3.1k
apache-2.0
46
Training and serving large-scale neural networks with auto parallelization.
This repository has been archived
Created 2021-02-22
668 commits to main branch, last one about a year ago
Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
Created 2020-02-27
594 commits to master branch, last one 7 days ago
DLRover: An Automatic Distributed Deep Learning System
Created 2022-06-24
2,976 commits to master branch, last one 3 days ago
324
1.3k
other
59
Collective communications library with various primitives for multi-machine training.
Created 2017-02-03
502 commits to main branch, last one 22 hours ago
274
1.3k
unknown
57
Library for Fast and Flexible Human Pose Estimation
Created 2018-08-25
538 commits to master branch, last one 3 years ago
361
1.1k
apache-2.0
35
DeepRec is a high-performance recommendation deep learning framework based on TensorFlow. It is hosted as an incubation project at the LF AI & Data Foundation.
Created 2021-12-24
65,623 commits to main branch, last one 3 months ago
Efficient Deep Learning Systems course materials (HSE, YSDA)
Created 2021-12-06
193 commits to main branch, last one 2 days ago
38
452
apache-2.0
9
Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
Created 2023-04-27
217 commits to main branch, last one about a year ago
79
438
apache-2.0
10
Resource-adaptive cluster scheduler for deep learning training.
Created 2020-08-23
123 commits to master branch, last one 2 years ago
Best practices & guides on how to write distributed PyTorch training code
Created 2024-07-31
271 commits to main branch, last one 2 months ago
56
402
apache-2.0
41
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
Created 2021-10-25
358 commits to main branch, last one 6 months ago
129
360
other
20
TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and support for E2E production ML pipelines when you're ready.
Created 2021-05-04
802 commits to main branch, last one a day ago
22
331
apache-2.0
6
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
Created 2024-10-15
36 commits to main branch, last one 8 days ago
59
294
apache-2.0
22
Fast and Adaptive Distributed Machine Learning for TensorFlow, PyTorch and MindSpore.
Created 2018-12-29
384 commits to main branch, last one about a year ago
43
289
other
12
HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
Created 2020-06-03
813 commits to master branch, last one 2 months ago
10
286
mit
8
A JAX-based library for designing and training small transformers.
Created 2023-08-22
158 commits to main branch, last one 8 months ago
Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS.
Created 2023-09-30
1,184 commits to main branch, last one 2 days ago
49
267
apache-2.0
12
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
Created 2022-02-23
21 commits to main branch, last one 2 years ago
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash attention v2.
Created 2024-02-05
362 commits to main branch, last one 2 months ago