SoCC 2024
Meta Info
Homepage: https://acmsocc.org/2024/index.html
Paper list: https://acmsocc.org/2024/schedule.html
Papers
Large Language Models (LLMs)
LLM inference
Queue Management for SLO-Oriented Large Language Model Serving [Paper]
UIUC & IBM Research
LLM training
Distributed Training of Large Language Models on AWS Trainium [Paper]
AWS
Mixture of Experts (MoEs)
MoE inference
MoEsaic: Shared Mixture of Experts [Paper]
IBM Research
GPU Sharing
KACE: Kernel-Aware Colocation for Efficient GPU Spatial Sharing [Paper]
Stony Brook University
Serverless Computing
On-demand and Parallel Checkpoint/Restore for GPU Applications [Paper]
SJTU IPADS & Shanghai Artificial Intelligence Research Institute
gCROP: GPU Checkpoint/Restore made On-demand and Parallel
Resource Scheduler
Scheduler for deep learning training workloads
Hops: Fine-grained heterogeneous sensing, efficient and fair Deep Learning cluster scheduling system [Paper]
Anhui University & Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Distributed Training
Generative Adversarial Networks (GANs)
ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks [Paper]
NUS