Homepage:
Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models
THU & ByteDance