SoCC 2024

Meta Info

Homepage: https://acmsocc.org/2024/index.html

Paper list: https://acmsocc.org/2024/schedule.html

Papers

Large Language Models (LLMs)

  • LLM inference

    • Queue Management for SLO-Oriented Large Language Model Serving [Paper]

      • UIUC & IBM Research

  • LLM training

    • Distributed Training of Large Language Models on AWS Trainium [Paper]

      • AWS

Mixture of Experts (MoEs)

  • MoE inference

    • MoEsaic: Shared Mixture of Experts [Paper]

      • IBM Research

GPU Sharing

  • KACE: Kernel-Aware Colocation for Efficient GPU Spatial Sharing [Paper]

    • Stony Brook University

Serverless Computing

  • On-demand and Parallel Checkpoint/Restore for GPU Applications [Paper]

    • SJTU IPADS & Shanghai Artificial Intelligence Research Institute

    • gCROP: GPU Checkpoint/Restore made On-demand and Parallel

Resource Scheduler

  • Scheduler for deep learning training workloads

    • Hops: Fine-grained heterogeneous sensing, efficient and fair Deep Learning cluster scheduling system [Paper]

      • Anhui University & Institute of Artificial Intelligence, Hefei Comprehensive National Science Center

Distributed Training

  • Generative Adversarial Networks (GANs)

    • ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks [Paper]

      • NUS
