SoCC 2024
Homepage:
Paper list:
Acceptance rate: 30.1% (= 63 / 209)
LLM inference
Queue Management for SLO-Oriented Large Language Model Serving
UIUC & IBM Research
LLM training
Distributed Training of Large Language Models on AWS Trainium
AWS
MoE inference
MoEsaic: Shared Mixture of Experts
IBM Research
KACE: Kernel-Aware Colocation for Efficient GPU Spatial Sharing
Stony Brook University
On-demand and Parallel Checkpoint/Restore for GPU Applications
SJTU IPADS & Shanghai Artificial Intelligence Research Institute
gCROP: GPU Checkpoint/Restore made On-demand and Parallel
Scheduler for deep learning training workloads
Hops: Fine-grained heterogeneous sensing, efficient and fair Deep Learning cluster scheduling system
Anhui University & Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Generative Adversarial Networks (GANs)
ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks
NUS