Aryl: An elastic cluster scheduler for deep learning


Presented in arXiv:2202.07896.

Authors: Jiamin Li, Hong Xu, Yibo Zhu, Zherui Liu, Chuanxiong Guo, Cong Wang

Affiliations: ByteDance, City University of Hong Kong, The Chinese University of Hong Kong

Understanding the paper


This paper presents Aryl, a cluster scheduler that introduces capacity loaning: idle inference GPU servers are lent to training jobs.
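
To make the idea of capacity loaning concrete, here is a minimal Python sketch. It is not Aryl's implementation: the `Server` type, the `loan_idle_servers` function, and the `keep_in_reserve` headroom parameter are all hypothetical names chosen for illustration, and the policy (lend every idle inference server beyond a fixed reserve) is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    gpus: int
    busy: bool  # True while the server is handling inference traffic


def loan_idle_servers(inference_pool, keep_in_reserve=0):
    """Return the idle inference servers that could be lent to training.

    The first `keep_in_reserve` idle servers are held back as headroom
    for inference traffic spikes (an assumption of this sketch).
    """
    idle = [s for s in inference_pool if not s.busy]
    if keep_in_reserve >= len(idle):
        return []
    return idle[keep_in_reserve:]


pool = [
    Server("inf-0", 8, busy=True),
    Server("inf-1", 8, busy=False),
    Server("inf-2", 8, busy=False),
]
loaned = loan_idle_servers(pool, keep_in_reserve=1)
print([s.name for s in loaned])  # ['inf-2']
```

In this toy example, one of the two idle servers stays with the inference cluster and the other becomes available to training jobs.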

It also exploits elastic scaling, which adjusts a training job’s GPU allocation so that loaned resources are better utilized.
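
The effect of elastic scaling can be illustrated with a small Python sketch. This is a simple greedy policy chosen for illustration, not the paper's scheduling algorithm; the function name `elastic_allocate` and the per-job `(min_gpus, max_gpus)` ranges are assumptions of this sketch.

```python
def elastic_allocate(jobs, total_gpus):
    """Assign GPUs to elastic jobs.

    jobs: dict mapping job name -> (min_gpus, max_gpus).
    Returns a dict mapping each admitted job name -> allocated GPUs.
    """
    alloc = {}
    remaining = total_gpus

    # First pass: admit jobs at their minimum size, in the given order.
    for name, (lo, _hi) in jobs.items():
        if lo <= remaining:
            alloc[name] = lo
            remaining -= lo

    # Second pass: hand out leftover GPUs one at a time, round-robin,
    # until every admitted job is at its maximum or nothing is left.
    grew = True
    while remaining > 0 and grew:
        grew = False
        for name in alloc:
            if remaining > 0 and alloc[name] < jobs[name][1]:
                alloc[name] += 1
                remaining -= 1
                grew = True
    return alloc


jobs = {"job-a": (2, 8), "job-b": (4, 4)}
before = elastic_allocate(jobs, 8)   # training cluster's own GPUs only
after = elastic_allocate(jobs, 16)   # after 8 loaned GPUs become available
print(before)  # {'job-a': 4, 'job-b': 4}
print(after)   # {'job-a': 8, 'job-b': 4}
```

Here `job-b` has a fixed size, while the elastic `job-a` grows from 4 to 8 GPUs once the loaned capacity arrives, so the extra GPUs do not sit idle.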
