GPU Sharing

I am actively maintaining this list.

GPU Temporal Sharing

GPU Spatial Sharing

  • Paella: Low-latency Model Serving with Software-defined GPU Scheduling (SOSP 2023) [Paper] [Code]

    • UPenn & DBOS, Inc.

    • Instrument kernels to expose, at runtime, detailed information about the occupancy and utilization of the GPU's SMs.

    • Compiler-library-scheduler co-design

  • MuxFlow: Efficient and Safe GPU Sharing in Large-Scale Production Deep Learning Clusters (arXiv 2303.13803) [Paper]

    • PKU & ByteDance

    • Utilize NVIDIA MPS

  • NVIDIA Multi-Instance GPU (MIG) [Homepage]

    • Partition the GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores.

    • Available for NVIDIA H100, A100, and A30 GPUs.

  • NVIDIA Multi-Process Service (MPS) [Docs]

    • Transparently enable co-operative multi-process CUDA applications.

    • Terminating an MPS client without synchronizing with all outstanding GPU work (via Ctrl-C / program exception such as segfault / signals, etc.) can leave the MPS server and other MPS clients in an undefined state, which may result in hangs, unexpected failures, or corruptions.

  • NVIDIA CUDA Multi-Stream [Docs]

    • Stream: a sequence of operations that execute in issue-order on the GPU.

    • Operations issued to different streams may execute concurrently, allowing independent kernels and memory copies to overlap.
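A minimal sketch of stream-based overlap, assuming an NVIDIA GPU and the CUDA runtime API; the kernel, sizes, and variable names are illustrative, not taken from any entry above:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Toy kernel: doubles every element of x.
__global__ void scale(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int N = 1 << 20;
    const size_t BYTES = N * sizeof(float);
    float *h[2], *d[2];
    cudaStream_t s[2];
    for (int k = 0; k < 2; ++k) {
        cudaMallocHost(&h[k], BYTES);  // pinned host memory is required for copies to be truly async
        cudaMalloc(&d[k], BYTES);
        cudaStreamCreate(&s[k]);
        for (int i = 0; i < N; ++i) h[k][i] = 1.0f;
    }
    // Within a stream, operations run in issue order; work issued to
    // different streams may execute concurrently, so stream 1's copy can
    // overlap stream 0's kernel.
    for (int k = 0; k < 2; ++k) {
        cudaMemcpyAsync(d[k], h[k], BYTES, cudaMemcpyHostToDevice, s[k]);
        scale<<<(N + 255) / 256, 256, 0, s[k]>>>(d[k], N);
        cudaMemcpyAsync(h[k], d[k], BYTES, cudaMemcpyDeviceToHost, s[k]);
    }
    for (int k = 0; k < 2; ++k) cudaStreamSynchronize(s[k]);
    printf("h[0][0] = %f\n", h[0][0]);  // 2.0 after the round trip
    for (int k = 0; k < 2; ++k) {
        cudaStreamDestroy(s[k]);
        cudaFree(d[k]);
        cudaFreeHost(h[k]);
    }
    return 0;
}
```

Whether the two streams actually overlap depends on the device's copy engines and available SMs; tools such as Nsight Systems can confirm the overlap on a timeline.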

Survey