Deep Learning Training

Elastic Training

Parallelism

  • Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency (SC 2023) [Paper] [Code]

    • NUS

  • Supporting Very Large Models using Automatic Dataflow Graph Partitioning (EuroSys 2019) [Paper]

    • NYU

    • Tofu: automatically partitions a dataflow graph of fine-grained tensor operations.

  • One weird trick for parallelizing convolutional neural networks (arXiv 1404.5997) [Paper]

    • Google

    • Data parallelism for convolutional layers; model parallelism for fully-connected layers (a toy sketch of this split follows below).
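
To make the split concrete, here is a toy, single-process PyTorch sketch (hypothetical layer sizes; real device placement, all-gathers, and gradient all-reduce are omitted): the small, compute-heavy convolutional trunk is replicated over batch slices, while the parameter-heavy fully connected layer is sharded across its output features.

```python
import torch
import torch.nn as nn

n_workers = 2

# Data-parallel stage: the conv trunk is (conceptually) replicated on every
# worker and each replica processes a slice of the batch.
conv = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                     nn.AdaptiveAvgPool2d(4), nn.Flatten())

# Model-parallel stage: the FC layer is sharded by output features.
fc_shards = [nn.Linear(8 * 4 * 4, 10 // n_workers) for _ in range(n_workers)]

x = torch.randn(16, 3, 32, 32)                     # global batch

feats = [conv(xb) for xb in x.chunk(n_workers)]    # each worker: its batch slice
all_feats = torch.cat(feats)                       # an all-gather in a real setup
logits = torch.cat([shard(all_feats) for shard in fc_shards], dim=1)
print(logits.shape)                                # torch.Size([16, 10])
```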

Optimizing Network Communication

Reduce GPU Memory Footprints

GPU Sharing

Tensor Swapping / Recomputation

  • SwapAdvisor: Pushing Deep Learning Beyond the GPU Memory Limit via Smart Swapping (ASPLOS 2020) [Paper]

    • NYU

    • Tensor swapping (a minimal offloading sketch follows this list)

    • Jointly considers GPU memory allocation and operator scheduling

  • Capuchin: Tensor-based GPU Memory Management for Deep Learning (ASPLOS 2020) [Paper]

    • HUST & MSRA & USC

    • Combination of tensor swapping and recomputation.

  • Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization (MLSys 2020) [Paper] [Code]

    • UC Berkeley

    • Formulates tensor rematerialization as a constrained optimization problem and solves it with an integer linear program.

  • SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks (PPoPP 2018) [Paper]

    • Brown & UESTC & Los Alamos National Laboratory & Pacific Northwest National Laboratory & MIT

    • Cost-aware recomputation

    • Frees and later recomputes the tensors of cheap-to-compute layers, bounding peak memory by the convolutional layers

  • vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design (MICRO 2016) [Paper]

    • NVIDIA

    • Proactively swaps tensors so that CPU-GPU transfers overlap with GPU computation.

  • Training Deep Nets with Sublinear Memory Cost (arXiv 1604.06174) [Personal Notes] [Paper] [Code]

    • UW & Dato Inc. & MIT

    • Memory Monger

    • Sublinear memory cost; trades computation for memory (see the checkpointing sketch right after this list).
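
Below is a minimal sketch of this recomputation idea (shared by Memory Monger and Checkmate) using PyTorch's stock checkpoint_sequential; it assumes a recent 2.x PyTorch, and the layer sizes and segment count are arbitrary. Only segment boundaries keep their activations; everything inside a segment is recomputed during the backward pass.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Eight identical blocks; without checkpointing, all eight activations would
# stay alive until the backward pass.
blocks = nn.Sequential(*[nn.Sequential(nn.Linear(512, 512), nn.ReLU())
                         for _ in range(8)])
x = torch.randn(16, 512, requires_grad=True)

# Two segments over eight blocks: activations are stored only at segment
# boundaries and recomputed inside each segment (the O(sqrt(N)) scheme).
out = checkpoint_sequential(blocks, 2, x, use_reentrant=False)
out.sum().backward()
```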

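A matching sketch of the swapping mechanism behind vDNN/SwapAdvisor-style systems, built on PyTorch's saved_tensors_hooks: every tensor autograd saves for the backward pass is parked in host memory and copied back on demand. The scheduling and prefetching that the papers above contribute are omitted; on a CPU-only machine the copies are no-ops, but the control flow is the same.

```python
import torch
import torch.nn as nn

def pack_to_host(t):
    # Called when autograd stashes a tensor for backward: offload to host memory.
    return t.device, t.to("cpu")

def unpack_from_host(packed):
    # Called when backward needs the tensor: copy it back to its original device.
    device, t = packed
    return t.to(device)

model = nn.Sequential(nn.Linear(256, 1024), nn.ReLU(), nn.Linear(1024, 10))
x = torch.randn(32, 256)

with torch.autograd.graph.saved_tensors_hooks(pack_to_host, unpack_from_host):
    loss = model(x).sum()
loss.backward()   # saved activations are restored from host memory here
```
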
Compression

  • Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training (ISCA 2020) [Paper]

    • UofT

    • LSTM RNN training

  • Gist: Efficient Data Encoding for Deep Neural Network Training (ISCA 2018) [Paper]

    • MSR & UMich & UofT

    • Layer-specific encoding of the feature maps stashed between the forward and backward passes (a toy sketch follows this list)
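
To give the flavor of such encodings, here is a toy custom ReLU (illustrative only, not Gist's implementation) that stashes a boolean mask instead of the full fp32 activation for the backward pass; Gist goes much further, e.g., packing to one bit per element and adding lossy encodings.

```python
import torch

class MaskedReLU(torch.autograd.Function):
    # ReLU's gradient only needs the sign pattern of its input, so a bool mask
    # (1 byte/element) can replace the saved fp32 activation (4 bytes/element).
    @staticmethod
    def forward(ctx, x):
        mask = x > 0
        ctx.save_for_backward(mask)   # stash the compact mask, not x itself
        return x * mask

    @staticmethod
    def backward(ctx, grad_out):
        (mask,) = ctx.saved_tensors
        return grad_out * mask

x = torch.randn(4, 8, requires_grad=True)
MaskedReLU.apply(x).sum().backward()
print(torch.equal(x.grad, (x > 0).float()))   # True
```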
