63 results found

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
4.3k forks · 37.7k stars · apache-2.0 · 347 watchers
Created 2020-01-23
2,741 commits to master branch, last one a day ago
Run Mixtral-8x7B models in Colab or on consumer desktops
Created 2023-12-15
86 commits to master branch, last one about a year ago
Decentralized deep learning in PyTorch. Built to train models on thousands of volunteer machines across the world.
Created 2020-02-27
591 commits to master branch, last one 16 days ago
Mixture-of-Experts for Large Vision-Language Models
134 forks · 2.1k stars · apache-2.0 · 23 watchers
Created 2023-12-14
228 commits to main branch, last one 4 months ago
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
Created 2019-07-19
30 commits to master branch, last one 11 months ago
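The entry above re-implements Shazeer et al.'s sparsely-gated MoE layer, whose core move is routing each token to only the top-k experts. A minimal sketch of that top-k gating idea (not this repo's actual API; the class name, dimensions, and expert shape are illustrative assumptions):

```python
# Sketch of top-k sparsely-gated MoE routing; all sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # router producing expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                           # x: (tokens, d_model)
        logits = self.gate(x)                       # (tokens, n_experts)
        topv, topi = logits.topk(self.k, dim=-1)    # keep only the k best experts
        weights = F.softmax(topv, dim=-1)           # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e           # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

y = TopKMoE()(torch.randn(10, 64))
print(y.shape)  # torch.Size([10, 64])
```

The paper adds noisy gating and a load-balancing loss on top of this routing; the sketch shows only the dispatch/combine skeleton.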
Codebase for Aria - an Open Multimodal Native MoE
86 forks · 1.0k stars · apache-2.0 · 20 watchers
Created 2024-09-29
207 commits to main branch, last one 2 months ago
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
55 forks · 945 stars · apache-2.0 · 8 watchers
Created 2023-07-24
212 commits to main branch, last one 9 months ago
Tutel MoE: An Optimized Mixture-of-Experts Implementation
95 forks · 791 stars · mit · 16 watchers
Created 2021-08-06
202 commits to main branch, last one 8 hours ago
Surrogate Modeling Toolbox
215 forks · 747 stars · bsd-3-clause · 27 watchers
Created 2016-11-08
1,585 commits to master branch, last one 3 days ago
A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
Created 2020-07-13
33 commits to master branch, last one about a year ago
A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)
Created 2018-09-10
22 commits to master branch, last one 3 years ago
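The MMoE entry above follows a simple pattern: a pool of shared experts, one softmax gate per task, and a small tower per task. A rough sketch of that pattern in PyTorch (the repo itself is TF Keras; all names and sizes here are illustrative assumptions):

```python
# Sketch of Multi-gate Mixture-of-Experts (MMoE) for multi-task learning.
import torch
import torch.nn as nn

class MMoE(nn.Module):
    def __init__(self, d_in=32, d_expert=16, n_experts=4, n_tasks=2):
        super().__init__()
        self.experts = nn.ModuleList(                 # shared across all tasks
            nn.Sequential(nn.Linear(d_in, d_expert), nn.ReLU())
            for _ in range(n_experts))
        self.gates = nn.ModuleList(                   # one gate per task
            nn.Linear(d_in, n_experts) for _ in range(n_tasks))
        self.towers = nn.ModuleList(                  # one output head per task
            nn.Linear(d_expert, 1) for _ in range(n_tasks))

    def forward(self, x):                             # x: (batch, d_in)
        e = torch.stack([ex(x) for ex in self.experts], dim=1)  # (batch, E, d_expert)
        outs = []
        for gate, tower in zip(self.gates, self.towers):
            w = gate(x).softmax(dim=-1).unsqueeze(-1)            # (batch, E, 1)
            outs.append(tower((w * e).sum(dim=1)))               # task prediction
        return outs

preds = MMoE()(torch.randn(8, 32))
print([p.shape for p in preds])  # [torch.Size([8, 1]), torch.Size([8, 1])]
```

The per-task gates let each task weight the shared experts differently, which is the paper's mechanism for modeling task relationships.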
From scratch implementation of a sparse mixture of experts language model inspired by Andrej Karpathy's makemore :)
Created 2024-01-22
116 commits to main branch, last one 5 months ago
Chinese Mixtral mixture-of-experts large models (Chinese Mixtral MoE LLMs)
44 forks · 602 stars · apache-2.0 · 15 watchers
Created 2024-01-11
31 commits to main branch, last one 11 months ago
A library for easily merging multiple LLM experts and efficiently training the merged LLM.
30 forks · 460 stars · lgpl-3.0 · 5 watchers
Created 2024-04-08
43 commits to main branch, last one 7 months ago
Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch
Created 2023-03-26
68 commits to main branch, last one 9 months ago
Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch
Created 2023-08-04
29 commits to main branch, last one 23 hours ago
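Soft MoE, implemented by the repo above, replaces hard token routing with learned soft dispatch and combine weights: each expert slot is a weighted average of all tokens, and each token's output is a weighted average of all slot outputs. A rough sketch of that idea, assuming one slot per expert (names and shapes are illustrative, not the repo's API):

```python
# Sketch of Soft MoE dispatch/combine; one slot per expert is an assumption.
import torch
import torch.nn as nn

class SoftMoE(nn.Module):
    def __init__(self, d=64, n_experts=4):
        super().__init__()
        self.slot_emb = nn.Parameter(torch.randn(n_experts, d) * d ** -0.5)
        self.experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))

    def forward(self, x):                       # x: (batch, tokens, d)
        logits = x @ self.slot_emb.t()          # (batch, tokens, slots)
        dispatch = logits.softmax(dim=1)        # normalize over tokens per slot
        combine = logits.softmax(dim=2)         # normalize over slots per token
        slots = dispatch.transpose(1, 2) @ x    # (batch, slots, d): soft token mix
        y = torch.stack([ex(s) for ex, s in
                         zip(self.experts, slots.unbind(dim=1))], dim=1)
        return combine @ y                      # (batch, tokens, d)

out = SoftMoE()(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```

Because every token touches every slot with a continuous weight, the layer is fully differentiable and needs no load-balancing loss, unlike top-k routing.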
GMoE could be the next backbone model for many kinds of generalization tasks.
Created 2022-05-28
28 commits to main branch, last one 2 years ago
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
Created 2023-12-26
116 commits to main branch, last one about a year ago
MoH: Multi-Head Attention as Mixture-of-Head Attention
9 forks · 232 stars · apache-2.0 · 3 watchers
Created 2024-10-08
19 commits to main branch, last one 5 months ago
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
18 forks · 203 stars · apache-2.0 · 8 watchers
Created 2024-02-05
49 commits to main branch, last one 11 months ago
[ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
6 forks · 198 stars · apache-2.0 · 2 watchers
Created 2024-10-08
11 commits to main branch, last one 5 months ago
PyTorch library for cost-effective, fast and easy serving of MoE models.
12 forks · 160 stars · apache-2.0 · 4 watchers
Created 2024-01-22
31 commits to main branch, last one 10 days ago
[SIGIR'24] The official implementation of MOELoRA.
Created 2023-10-19
21 commits to master branch, last one 8 months ago
A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE).
Created 2023-08-17
139 commits to main branch, last one 3 months ago
[ICML 2024] See More Details: Efficient Image Super-Resolution by Experts Mining
Created 2024-02-05
31 commits to main branch, last one 8 months ago
PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind
Created 2024-07-09
26 commits to main branch, last one 7 months ago
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
7 forks · 122 stars · apache-2.0 · 7 watchers
Created 2024-08-26
17 commits to main branch, last one 3 days ago
Some personal experiments around routing tokens to different autoregressive attention blocks, akin to mixture-of-experts
Created 2023-04-21
42 commits to main branch, last one 5 months ago
[NeurIPS 2024] RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models
Created 2024-02-20
17 commits to main branch, last one 4 months ago