Aryl: An elastic cluster scheduler for deep learning
Presented in .
Authors: Jiamin Li, Hong Xu, Yibo Zhu, Zherui Liu, Chuanxiong Guo, Cong Wang
Affiliations: ByteDance, City University of Hong Kong, The Chinese University of Hong Kong
This paper presents Aryl, a cluster scheduler built around two ideas. The first is capacity loaning: GPU servers in the inference cluster that are currently idle are temporarily lent to training jobs.
The second is elastic scaling: Aryl grows or shrinks a training job's GPU allocation so that the loaned resources are put to better use.