Aryl: An elastic cluster scheduler for deep learning

Metadata

Presented in arXiv:2202.07896.

Authors: Jiamin Li, Hong Xu, Yibo Zhu, Zherui Liu, Chuanxiong Guo, Cong Wang

Affiliations: ByteDance, City University of Hong Kong, The Chinese University of Hong Kong

Understanding the paper

TL;DRs

This paper presents Aryl, a cluster scheduler that introduces capacity loaning: idle GPU servers in the inference cluster are temporarily lent to training jobs.

It also exploits elastic scaling, growing or shrinking a training job's GPU allocation, to make better use of the loaned resources.
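To make the two ideas concrete, here is a minimal toy sketch of how loaned capacity and elastic scaling could interact. It is not Aryl's scheduling algorithm; the `Job` fields, the `allocate` function, and the greedy round-robin policy are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    min_gpus: int  # hypothetical base demand: GPUs needed to run at all
    max_gpus: int  # hypothetical upper bound for elastic scale-out
    gpus: int = 0  # current allocation


def allocate(jobs, training_gpus, loaned_gpus):
    """Toy greedy allocator (illustrative only, not the paper's method).

    Returns the number of GPUs left unallocated.
    """
    # Loaned inference GPUs simply enlarge the pool available to training.
    free = training_gpus + loaned_gpus
    for job in jobs:
        job.gpus = 0

    # Pass 1: admit jobs in list order while their base demand fits.
    for job in jobs:
        if job.min_gpus <= free:
            job.gpus = job.min_gpus
            free -= job.min_gpus

    # Pass 2: elastic scale-out, one GPU at a time, round-robin over
    # admitted jobs that are still below their upper bound.
    grew = True
    while free > 0 and grew:
        grew = False
        for job in jobs:
            if free > 0 and 0 < job.gpus < job.max_gpus:
                job.gpus += 1
                free -= 1
                grew = True
    return free


jobs = [Job("a", 2, 8), Job("b", 4, 4), Job("c", 2, 6)]

# Inference cluster lends 6 idle GPUs: elastic jobs scale out.
allocate(jobs, training_gpus=8, loaned_gpus=6)
with_loan = [j.gpus for j in jobs]

# Inference traffic returns and the loan is reclaimed: jobs scale back in.
allocate(jobs, training_gpus=8, loaned_gpus=0)
after_reclaim = [j.gpus for j in jobs]
```

In this toy run the elastic jobs `a` and `c` absorb the loaned GPUs and drop back to their base demand once the loan is reclaimed, while the fixed-size job `b` is unaffected either way.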
