Resource Fragmentation

GPU Fragmentation

  • Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent (ATC 2023) [Paper] [Code]

    • HKUST & Alibaba

    • Quantify GPU fragmentation in GPU-sharing clusters.

    • Guided scheduling with fragmentation gradient descent.

  • HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees (OSDI 2020) [Personal Notes] [Paper] [Code]

    • PKU & HKU & MSRA

    • Consider GPU affinity; resource reservation.

General Fragmentation

  • Large-scale cluster management at Google with Borg (EuroSys 2015) [Paper]

    • Google

    • Reduce stranded resources that cannot be used because another resource on the machine is fully allocated.

  • Multi-Resource Packing for Cluster Schedulers (SIGCOMM 2014) [Paper]

    • Microsoft

    • Tetris: Pack jobs to avoid resource fragmentation and over-allocation.

  • Evaluating job packing in warehouse-scale computing (CLUSTER 2014) [Personal Notes] [Paper]

    • Google

    • Four metrics for evaluating the packing efficiency of schedulers: aggregate utilization, hole filling, workload inflation, and cluster compaction.

Last updated