# Lucid: A non-intrusive, scalable and interpretable scheduler for deep learning training jobs

## Meta Info

Presented in [ASPLOS 2023](https://doi.org/10.1145/3575693.3575705).

Authors: Qinghao Hu (*NTU & Shanghai AI Lab*), Meng Zhang (*NTU*), Peng Sun (*SenseTime*), Yonggang Wen, Tianwei Zhang (*NTU*).

Code: <https://github.com/S-Lab-System-Group/Lucid>

## Understanding the paper

### TL;DRs

This paper presents **Lucid**, a non-intrusive *DL scheduler* based on *interpretable models*.

It introduces *a two-dimensional optimized profiler* for efficient job metric collection and timely feedback on debugging jobs; uses *a packing strategy* that circumvents interference; and allocates resources based on *estimated job priority values and sharing scores*.
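
To make the last point concrete, here is a minimal sketch of priority-ordered allocation with score-gated packing. It is not Lucid's actual algorithm: the `Job` fields, the `schedule` function, the "lower priority value runs first" convention, and the `sharing_capacity` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    gpus: int            # GPUs requested
    priority: float      # hypothetical: lower value is scheduled earlier
    sharing_score: int   # hypothetical: lower value is friendlier to GPU sharing


def schedule(pending, free_gpus, sharing_capacity=2):
    """Greedy sketch: exclusive GPUs first, then pack onto a compatible job."""
    placements = {}   # job name -> "exclusive" or "packed with <partner>"
    exclusive = []    # jobs running alone, still candidates for packing
    for job in sorted(pending, key=lambda j: j.priority):
        if job.gpus <= free_gpus:
            free_gpus -= job.gpus
            placements[job.name] = "exclusive"
            exclusive.append(job)
            continue
        # No free GPUs left: look for a partner with the same GPU count whose
        # combined sharing score stays within the (assumed) capacity.
        partner = next(
            (r for r in exclusive
             if r.gpus == job.gpus
             and r.sharing_score + job.sharing_score <= sharing_capacity),
            None,
        )
        if partner is not None:
            exclusive.remove(partner)  # each job shares with at most one other
            placements[job.name] = f"packed with {partner.name}"
        # Otherwise the job stays queued and is absent from the result.
    return placements


jobs = [
    Job("a", gpus=2, priority=1.0, sharing_score=0),
    Job("b", gpus=2, priority=2.0, sharing_score=2),
    Job("c", gpus=2, priority=3.0, sharing_score=1),
    Job("d", gpus=2, priority=4.0, sharing_score=2),
]
result = schedule(jobs, free_gpus=4)
print(result)
```

In this toy run, `a` and `b` take the four free GPUs, `c` is packed with `a` because their combined score is within the threshold, and `d` stays queued because pairing it with `b` would exceed it.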

### Interpretable Models

* Decision Tree (DT) for the Packing Analyze Model
* GA$$^2$$M, an additive model algorithm, for the Throughput Predict Model and the Workload Estimate Model
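
For reference, GA$$^2$$M (a generalized additive model with pairwise interactions) is usually written in the following general form. The symbols here follow the common convention in the GA$$^2$$M literature and are not taken from these notes: $$g$$ is a link function, $$\beta_0$$ an intercept, $$f_i$$ a learned per-feature shape function, and $$f_{ij}$$ a learned pairwise interaction term.

$$
g\big(\mathbb{E}[y]\big) = \beta_0 + \sum_{i} f_i(x_i) + \sum_{i < j} f_{ij}(x_i, x_j)
$$

Because each term depends on at most two features, its contribution to a prediction can be plotted and inspected on its own, which is what makes the model interpretable.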
