Resource Fragmentation

GPU Fragmentation

Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent (ATC 2023) [Paper] [Code]
- HKUST & Alibaba
- Quantify GPU fragmentation in GPU-sharing clusters.
- Guided scheduling with fragmentation gradient descent.
HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees (OSDI 2020) [Personal Notes] [Paper] [Code]
- PKU & HKU & MSRA
- Consider GPU affinity; resource reservation.

Large-scale cluster management at Google with Borg (EuroSys 2015) [Paper]
- Google
- Reduce stranded resources that cannot be used because another resource on the machine is fully allocated.
Multi-Resource Packing for Cluster Schedulers (SIGCOMM 2014) [Paper]
- Microsoft
- Tetris: Pack jobs to avoid resource fragmentation and over-allocation.
Evaluating job packing in warehouse-scale computing (CLUSTER 2014) [Personal Notes] [Paper]
- Google
- Four metrics for evaluating the packing efficiency of schedulers: aggregate utilization, hole filling, workload inflation, and cluster compaction.

Last updated 1 year ago

Was this helpful?