Deep Learning Training
EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs () [] []
BUAA & Alibaba
Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency () [] []
NUS
Supporting Very Large Models using Automatic Dataflow Graph Partitioning () []
NYU
Tofu: automatically partitions a dataflow graph of fine-grained tensor operations.
One weird trick for parallelizing convolutional neural networks (arXiv 1404.5997) []
Data parallelism for convolutional layers; model parallelism for fully-connected layers.
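A rough PyTorch sketch of the hybrid scheme described above, assuming two logical devices; the `ColumnParallelLinear` class and the device list are illustrative names, not from the paper. Convolutional layers would be replicated per worker (data parallel), while the fully-connected classifier is split column-wise across devices (model parallel).

```python
import torch
import torch.nn as nn

class ColumnParallelLinear(nn.Module):
    """Illustrative: splits a Linear layer's output features across two devices."""
    def __init__(self, in_features, out_features, devices=("cpu", "cpu")):
        super().__init__()
        half = out_features // 2
        self.devices = devices
        self.shards = nn.ModuleList([
            nn.Linear(in_features, half).to(devices[0]),
            nn.Linear(in_features, out_features - half).to(devices[1]),
        ])

    def forward(self, x):
        # Each shard computes its slice of the output; slices are gathered on device 0.
        outs = [shard(x.to(dev)) for shard, dev in zip(self.shards, self.devices)]
        return torch.cat([o.to(self.devices[0]) for o in outs], dim=-1)

# Conv feature extractor stays replicated per worker (data parallelism);
# only the classifier is model-parallel.
features = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1))
classifier = ColumnParallelLinear(16, 10)

x = torch.randn(4, 3, 32, 32)
logits = classifier(features(x).flatten(1))
print(logits.shape)  # torch.Size([4, 10])
```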
A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters () [] [] []
THU & ByteDance
BytePS: Communication framework
Leverage spare CPU and bandwidth resources
Consider network topology
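A toy, single-process simulation of the hierarchical aggregation idea above: gradients are first summed across the GPUs of one machine, then the per-machine partial sums are pushed to CPU-side servers that each own a slice of the gradient. The topology and function names are assumptions for illustration only, not the BytePS API.

```python
import torch

def local_reduce(gpu_grads):
    """Intra-machine step: sum gradients from the GPUs of a single machine."""
    return torch.stack(gpu_grads).sum(dim=0)

def cpu_server_aggregate(machine_sums, num_servers=2):
    """Inter-machine step: CPU servers jointly sum the per-machine pushes,
    each owning one slice of the gradient vector."""
    full = torch.stack(machine_sums).sum(dim=0)
    return torch.chunk(full, num_servers)

# Two machines with two "GPUs" each, training an 8-element parameter vector.
machine_a = [torch.randn(8), torch.randn(8)]
machine_b = [torch.randn(8), torch.randn(8)]

partial_sums = [local_reduce(machine_a), local_reduce(machine_b)]  # cheap intra-node bandwidth
server_slices = cpu_server_aggregate(partial_sums)                 # crosses the network once per machine
aggregated = torch.cat(server_slices)
print(aggregated.shape)  # torch.Size([8])
```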
Zico: Efficient GPU Memory Sharing for Concurrent DNN Training () [] []
UNIST & Ajou & Alibaba & KAIST
Reduce the overall GPU consumption for co-located DNN training jobs
Utilize NVIDIA MPS
Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications () [] []
UMich SymbioticLab
Fine-grained GPU sharing; customized TensorFlow.
Gandiva: Introspective Cluster Scheduling for Deep Learning () []
MSRA
Time slicing; suspend and resume; mini-batch granularity.
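A toy illustration of time-slicing jobs at mini-batch granularity in the spirit of Gandiva's suspend/resume: each job yields control after every mini-batch, and a round-robin scheduler decides which one runs next. The jobs and scheduler here are illustrative, not Gandiva's implementation.

```python
def training_job(name, num_minibatches):
    """A fake training loop that suspends itself after every mini-batch."""
    for step in range(num_minibatches):
        # ... forward/backward/update for one mini-batch would go here ...
        yield f"{name}: finished mini-batch {step}"

def round_robin(jobs):
    """Run the jobs one mini-batch at a time until all are finished."""
    queue = list(jobs)
    while queue:
        job = queue.pop(0)
        try:
            print(next(job))      # resume the job for exactly one mini-batch
            queue.append(job)     # suspend it and put it back in the queue
        except StopIteration:
            pass                  # job completed, drop it

round_robin([training_job("job-A", 3), training_job("job-B", 2)])
```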
SwapAdvisor: Pushing Deep Learning Beyond the GPU Memory Limit via Smart Swapping () []
NYU
Tensor swapping
Consider both GPU memory allocation and operator scheduling
Capuchin: Tensor-based GPU Memory Management for Deep Learning () []
HUST & MSRA & USC
Combination of tensor swapping and recomputation.
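A minimal PyTorch sketch of the tensor-swapping mechanism these systems build on: activations saved for the backward pass are offloaded to host memory at pack time and copied back at unpack time, via `torch.autograd.graph.saved_tensors_hooks`. This only shows the mechanism; SwapAdvisor and Capuchin additionally plan which tensors to swap (or recompute) and when.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

def pack_to_cpu(tensor):
    # Offload the saved activation; pinned memory would enable async copies.
    return tensor.device, tensor.to("cpu")

def unpack_from_cpu(packed):
    original_device, cpu_tensor = packed
    return cpu_tensor.to(original_device)

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
x = torch.randn(32, 512, device=device)

with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    loss = model(x).sum()   # activations saved for backward now live on the CPU
loss.backward()             # they are swapped back in as backward needs them
```

PyTorch ships a built-in variant of these hooks, `torch.autograd.graph.save_on_cpu()`, which does the same offload with optional pinned memory.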
Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization () [] []
UC Berkeley
Define tensor recomputation as an optimization problem.
SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks (PPoPP 2018) []
Brown & UESTC & Los Alamos National Laboratory & Pacific Northwest National Laboratory & MIT
Cost-aware recomputation
Remove the convolutional layer tensor with low computational overhead
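A toy, greedy version of cost-aware recomputation: given each saved tensor's memory footprint and recomputation cost, evict the cheapest-to-recompute tensors until the activations fit in a memory budget. SuperNeurons uses a cost-aware heuristic of this flavor, while Checkmate instead solves the choice exactly as an optimization problem (an ILP over the schedule). The numbers and names below are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class SavedTensor:
    name: str
    nbytes: int          # memory held until the backward pass
    recompute_ms: float  # time to regenerate it from its inputs

def plan_recompute(tensors, budget_bytes):
    """Return (kept, recomputed) so that the kept tensors fit in budget_bytes."""
    kept = sorted(tensors, key=lambda t: t.recompute_ms)  # cheapest to recompute first
    recomputed = []
    while kept and sum(t.nbytes for t in kept) > budget_bytes:
        recomputed.append(kept.pop(0))   # drop the cheapest-to-recompute tensor
    return kept, recomputed

activations = [
    SavedTensor("conv1_out", 512 << 20, recompute_ms=9.0),
    SavedTensor("relu1_out", 512 << 20, recompute_ms=0.3),
    SavedTensor("conv2_out", 256 << 20, recompute_ms=7.5),
    SavedTensor("relu2_out", 256 << 20, recompute_ms=0.2),
]
kept, recomputed = plan_recompute(activations, budget_bytes=900 << 20)
print([t.name for t in recomputed])  # the cheap ReLU outputs get recomputed
```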
vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design (MICRO 2016) []
NVIDIA
Predictively swap tensors to overlap the CPU-GPU communication time.
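A small sketch of the overlap mechanism behind this kind of predictive prefetching: a side CUDA stream copies an offloaded tensor back from pinned host memory while the default stream keeps computing. Assumes a CUDA device; deciding what to move and when is the system's job and is not shown.

```python
import torch

assert torch.cuda.is_available(), "this sketch needs a GPU"
copy_stream = torch.cuda.Stream()

offloaded = torch.randn(4096, 4096).pin_memory()       # activation parked on the host
weight = torch.randn(4096, 4096, device="cuda")
activation = torch.randn(4096, 4096, device="cuda")

# Kick off the prefetch on the side stream; it runs concurrently with the matmul.
with torch.cuda.stream(copy_stream):
    prefetched = offloaded.to("cuda", non_blocking=True)

compute_result = activation @ weight                    # overlaps with the copy above

# Before using the prefetched tensor, make the default stream wait for the copy.
torch.cuda.current_stream().wait_stream(copy_stream)
backward_input = prefetched * 2.0
```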
Training Deep Nets with Sublinear Memory Cost (arXiv 1604.06174) [] [] []
UW & Dato Inc. & MIT
Memory Monger
Sublinear memory cost; trade computation for memory.
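Gradient checkpointing is the standard realization of this trade-off: only segment-boundary activations are kept during the forward pass and the rest are recomputed during backward, giving roughly O(sqrt(N)) activation memory for N layers with a good segmentation. A minimal PyTorch sketch using `torch.utils.checkpoint.checkpoint_sequential`:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

layers = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
                         for _ in range(16)])
x = torch.randn(8, 1024, requires_grad=True)

# Split the 16 blocks into 4 segments; only the 4 boundary activations are stored,
# and the activations inside each segment are recomputed during backward.
out = checkpoint_sequential(layers, 4, x)
out.sum().backward()
```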
Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training (ISCA 2020) []
UofT
LSTM RNN training
Gist: Efficient Data Encoding for Deep Neural Network Training (ISCA 2018) []
MSR & UMich & UofT
Data encoding
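To give a flavor of the encoding idea: some stashed activations can be stored far more compactly than in fp32. For example, ReLU's backward pass only needs to know which inputs were positive, so a 1-bit mask suffices. The sketch below stores a boolean mask (PyTorch's smallest dtype, one byte per element; true bit-packing would go further) and is only an illustration of the principle, not Gist's compiler-integrated implementation.

```python
import torch

class MaskedReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        mask = x > 0                  # encoded stash: a bool mask instead of fp32 x
        ctx.save_for_backward(mask)
        return x * mask

    @staticmethod
    def backward(ctx, grad_out):
        (mask,) = ctx.saved_tensors   # decode: gradient passes where x > 0
        return grad_out * mask

x = torch.randn(4, 8, requires_grad=True)
y = MaskedReLU.apply(x)
y.sum().backward()
print(x.grad.shape)  # torch.Size([4, 8])
```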