Mixture of Experts (MoE)
Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models () []
THU & ByteDance
Accelerating Distributed MoE Training and Inference with Lina () []
CityU & ByteDance & CUHK
SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization () [] []
THU
Optimizing Dynamic Neural Networks with Brainstorm () []
SJTU & MSRA & USTC
Mixtral-8x7B [] []
Mistral AI
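For context on what the systems above optimize: Mixtral-8x7B uses 8 experts per layer with top-2 routing, so each token activates only a small fraction of the parameters. Below is a minimal, hedged sketch of that sparse top-k routing pattern; the class name `TopKMoELayer`, the dimensions, and the expert MLP shape are illustrative assumptions, not code from any of the listed papers.

```python
# Minimal sketch of Mixtral-style sparse top-k expert routing.
# All names and sizes here are illustrative, not taken from the listed systems.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoELayer(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 2048,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.gate(x)                               # (tokens, experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # choose k experts per token
        weights = F.softmax(weights, dim=-1)                # renormalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue                                    # this expert got no tokens
            out[token_idx] += weights[token_idx, slot, None] * expert(x[token_idx])
        return out


if __name__ == "__main__":
    layer = TopKMoELayer()
    tokens = torch.randn(16, 512)
    print(layer(tokens).shape)  # torch.Size([16, 512])
```

The per-expert token gather in the loop is exactly the dynamic, data-dependent workload that distributed MoE systems such as Janus, Lina, SmartMoE, and Brainstorm schedule and balance across devices.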