Interference-aware multiplexing for deep learning in GPU clusters: A middleware approach
The proposed middleware must:

- Tune training configurations (e.g., batch size) across all co-located tasks
- Choose appropriate tasks to multiplex on a GPU device
- Trade off mitigating interference against accelerating training progress, so as to achieve the best overall training time (a toy sketch of this joint decision follows the list)
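
How these three pieces fit together can be made concrete with a toy model. The sketch below is illustrative only: `Task`, `slowdown`, `completion_time`, `tune_pair`, and `pair_tasks` are hypothetical names, and the interference model and all constants are made up rather than taken from the paper. It greedily chooses which tasks to co-locate and grid-searches their batch sizes so that neither interference nor slow per-task progress dominates.

```python
import itertools
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    samples_left: int        # training samples still to process
    base_step_time: float    # seconds per step at batch size 32, run alone

def slowdown(batches):
    """Toy interference model: slowdown grows with the total batch size
    sharing one GPU (the 0.004 constant is made up)."""
    return 1.0 + 0.004 * sum(batches)

def completion_time(task, batch, co_batches):
    """Estimated finish time for `task` at batch size `batch` while
    co-located with tasks running at `co_batches`."""
    steps = task.samples_left / batch
    # Per-step time grows sublinearly with batch size, so a larger batch
    # processes samples faster in isolation but interferes more.
    step_time = task.base_step_time * (batch / 32) ** 0.7
    return steps * step_time * slowdown([batch, *co_batches])

def tune_pair(a, b, batch_choices=(16, 32, 64, 128)):
    """Grid-search batch sizes for two co-located tasks, minimizing the
    later of the two finish times (the device's makespan)."""
    return min(
        (max(completion_time(a, ba, [bb]),
             completion_time(b, bb, [ba])), ba, bb)
        for ba, bb in itertools.product(batch_choices, repeat=2)
    )

def pair_tasks(tasks):
    """Greedily decide which tasks to multiplex together: repeatedly
    co-locate the pair whose tuned makespan is smallest."""
    tasks, plan = list(tasks), []
    while len(tasks) >= 2:
        (span, ba, bb), a, b = min(
            ((tune_pair(a, b), a, b)
             for a, b in itertools.combinations(tasks, 2)),
            key=lambda choice: choice[0][0],
        )
        plan.append((a.name, ba, b.name, bb, span))
        tasks.remove(a)
        tasks.remove(b)
    return plan

jobs = [Task("resnet", 500_000, 0.20), Task("bert", 400_000, 0.35),
        Task("gnn", 300_000, 0.10), Task("vae", 600_000, 0.15)]
for a, ba, b, bb, span in pair_tasks(jobs):
    print(f"{a}(bs={ba}) + {b}(bs={bb}): ~{span / 3600:.1f} h")
```

Even in this toy form, the tuner only works per device while the pairing loop decides placement, which previews why the two decisions cannot be made independently.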
Two challenges make this hard:

- The search space of task configurations is vast
- Adjusting task configurations is tightly coupled with designing the task placement policy (a back-of-envelope illustration follows)
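
A back-of-envelope calculation shows the scale of the joint search. The constants below (8 tasks, 4 GPUs, 5 candidate batch sizes) are arbitrary, and the alternating-optimization note in the comments describes a generic workaround, not necessarily the paper's algorithm.

```python
# Why the search space is vast: with T tasks, G GPUs, and C candidate
# configurations per task, a brute-force search covers G**T placements
# times C**T configuration vectors (the numbers below are arbitrary).
T, G, C = 8, 4, 5
print(f"{G**T * C**T:,} joint choices")   # 25,600,000,000

# Why the decisions are coupled: the best configuration for a task
# depends on which tasks share its GPU, and the best placement depends
# on the chosen configurations. A generic workaround (not necessarily
# the paper's algorithm) is to alternate: fix the placement and tune
# configurations per device, then fix configurations and re-place
# tasks, repeating until neither step improves the objective.
```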