Mixture of Experts (MoE)
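
For context on the systems listed below, here is a minimal, illustrative sketch of a sparsely-activated MoE layer with top-k gating in PyTorch. It is not taken from any of the papers in this list; the class and variable names (e.g. SimpleMoE) are assumptions made for the example.

```python
# Minimal sketch of a top-k gated Mixture-of-Experts layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router: scores each token against every expert.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # Experts: independent feed-forward networks; only top_k are used per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.gate(x)                              # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)               # renormalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue                                    # this expert received no tokens
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out


if __name__ == "__main__":
    layer = SimpleMoE(d_model=64, d_hidden=256, num_experts=8, top_k=2)
    tokens = torch.randn(32, 64)                            # 32 tokens
    print(layer(tokens).shape)                              # torch.Size([32, 64])
```

In distributed training, each expert is typically placed on a different device, so the per-token routing above becomes an all-to-all exchange of tokens between devices; the training and inference systems below target exactly that communication and scheduling cost.
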

MoE Training

  • Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models (SIGCOMM 2023) [Paper]

    • THU & ByteDance

  • Accelerating Distributed MoE Training and Inference with Lina (ATC 2023) [Paper]

    • CityU & ByteDance & CUHK

  • SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization (ATC 2023) [Paper] [Code]

    • THU

MoE Inference

  • Accelerating Distributed MoE Training and Inference with Lina (ATC 2023) [Paper]

    • CityU & ByteDance & CUHK

  • Optimizing Dynamic Neural Networks with Brainstorm (OSDI 2023) [Paper]

    • SJTU & MSRA & USTC

Models
