Mixture of Experts (MoE)
MoE Training
Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models (SIGCOMM 2023) [Paper]
THU & ByteDance
MoE Inference
Models
Mixtral-8x7B [Hugging Face] [Blog]
Mistral AI
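A minimal sketch (not part of the list itself) of trying the Mixtral-8x7B sparse MoE model via the Hugging Face transformers library; the model id "mistralai/Mixtral-8x7B-v0.1" is assumed from the Hugging Face link above, and the full weights need substantial GPU memory.

```python
# Sketch: load Mixtral-8x7B (sparse MoE decoder) with Hugging Face transformers.
# Assumes the model id "mistralai/Mixtral-8x7B-v0.1" and enough GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory
    device_map="auto",           # shard layers across available GPUs
)

prompt = "Mixture-of-Experts layers route each token to"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```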