Resource Fragmentation
GPU Fragmentation
HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees (OSDI 2020) [Personal Notes] [Paper] [Code]
PKU & HKU & MSRA
Consider GPU affinity; resource reservation.
General Fragmentation
Large-scale cluster management at Google with Borg (EuroSys 2015) [Paper]
Google
Reduce stranded resources that cannot be used because another resource on the machine is fully allocated.
Multi-Resource Packing for Cluster Schedulers (SIGCOMM 2014) [Paper]
Microsoft
Tetris: Pack jobs to avoid resource fragmentation and over-allocation.
Evaluating job packing in warehouse-scale computing (CLUSTER 2014) [Personal Notes] [Paper]
Google
Four metrics for evaluating the packing efficiency of schedulers: aggregate utilization, hole filling, workload inflation, and cluster compaction.
Last updated