Aryl: An elastic cluster scheduler for deep learning
Metadata
Presented in arXiv:2202.07896.
Authors: Jiamin Li, Hong Xu, Yibo Zhu, Zherui Liu, Chuanxiong Guo, Cong Wang
Affiliations: ByteDance, City University of Hong Kong, The Chinese University of Hong Kong
Understanding the paper
TL;DRs
This paper presents Aryl, a cluster scheduler that introduces capacity loaning: idle inference GPU servers are temporarily lent to training jobs.
It also exploits elastic scaling, which adjusts a training job's GPU allocation so that the loaned resources are better utilized.
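To make the two ideas concrete, here is a minimal sketch of how loaned capacity and elastic scaling could interact. It is not Aryl's actual algorithm: the greedy policy, the `ElasticJob` fields, and the function names `loanable_gpus`, `scale_out`, and `scale_in` are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ElasticJob:
    """Hypothetical elastic training job with a flexible GPU allocation."""
    name: str
    min_gpus: int  # smallest allocation the job can run with
    max_gpus: int  # largest allocation the job can make use of
    gpus: int = 0  # current allocation


def loanable_gpus(inference_busy: dict[str, bool], gpus_per_server: int) -> int:
    """Count GPUs on inference servers that are currently idle."""
    idle_servers = sum(1 for busy in inference_busy.values() if not busy)
    return idle_servers * gpus_per_server


def scale_out(jobs: list[ElasticJob], free_gpus: int) -> int:
    """Greedily grow jobs toward max_gpus (assumed policy); return leftover GPUs."""
    for job in jobs:
        grant = min(job.max_gpus - job.gpus, free_gpus)
        job.gpus += grant
        free_gpus -= grant
    return free_gpus


def scale_in(jobs: list[ElasticJob], reclaim: int) -> int:
    """Shrink jobs toward min_gpus to give GPUs back; return GPUs still owed."""
    for job in jobs:
        give = min(job.gpus - job.min_gpus, reclaim)
        job.gpus -= give
        reclaim -= give
    return reclaim


if __name__ == "__main__":
    # Two of three inference servers are idle, so their GPUs can be loaned.
    servers = {"inf-0": False, "inf-1": True, "inf-2": False}
    jobs = [ElasticJob("a", 2, 8, gpus=2), ElasticJob("b", 4, 16, gpus=4)]
    leftover = scale_out(jobs, loanable_gpus(servers, gpus_per_server=8))
    print(leftover, [(j.name, j.gpus) for j in jobs])
    # Inference traffic returns and one server (8 GPUs) must be handed back.
    owed = scale_in(jobs, reclaim=8)
    print(owed, [(j.name, j.gpus) for j in jobs])
```

In this sketch, a nonzero return value from `scale_in` would mean that shrinking alone cannot free enough GPUs, so some job would have to be preempted. How Aryl actually chooses which jobs to scale or preempt is described in the paper and is not reproduced here.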