SOSP 2024
Meta Info
Homepage: https://sigops.org/s/conferences/sosp/2024/
Papers
Large Language Models (LLMs)
LLM Training
Enabling Parallelism Hot Switching for Efficient Training of Large Language Models
PKU
Perseus: Removing Energy Bloat from Large Model Training [arXiv]
UMich
Use a graph cut-based algorithm to obtain the "iteration time-energy" Pareto frontier; schedule the energy consumption across time.
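To make the term "iteration time-energy Pareto frontier" concrete, the sketch below filters a set of hypothetical (iteration time, energy) measurements down to the non-dominated points. It is not Perseus's graph cut-based algorithm, only an illustration of what the resulting frontier represents. The function name and the sample numbers are assumptions made for this example.

```python
from typing import List, Tuple


def pareto_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep the (time, energy) points that no other point beats on both axes."""
    frontier = []
    best_energy = float("inf")
    # Sorting by time (then energy) means a point is on the frontier only if
    # it uses strictly less energy than every faster point seen so far.
    for time, energy in sorted(points):
        if energy < best_energy:
            frontier.append((time, energy))
            best_energy = energy
    return frontier


# Hypothetical measurements: (iteration time in seconds, energy in joules).
candidates = [(1.0, 100.0), (1.2, 80.0), (1.1, 95.0), (1.3, 85.0), (1.5, 70.0)]
frontier = pareto_frontier(candidates)
print(frontier)
```

Here (1.3, 85.0) is dropped because (1.2, 80.0) is both faster and cheaper. Every remaining point trades a longer iteration for lower energy.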
LLM Inference
LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism [arXiv]
PKU
ESP: Elastic Sequence Parallelism
Elastically adjust the degree of parallelism in real time; reduce key-value cache migration overhead and overlap partial decoding communication with computation; reduce key-value cache fragmentation across instances.
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU [arXiv]
SJTU IPADS
ML Serving
Improving DNN Inference Throughput using Practical, Per-Input Compute Adaptation
GaTech & Princeton
Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving [arXiv]
Princeton & GaTech
Automatically apply and manage early exits (certain inputs can exit with results at intermediate layers) in ML models.
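The early-exit mechanism itself can be sketched in a few lines. The example below is a toy, not Apparate's implementation: the layers, the exit heads, and the fixed confidence threshold are all assumptions chosen for illustration, whereas Apparate places and tunes exits automatically.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def predict_with_early_exit(
    x: Vector,
    layers: List[Callable[[Vector], Vector]],
    heads: List[Callable[[Vector], Vector]],
    threshold: float,
) -> Tuple[int, int]:
    """Return (predicted class, depth of the layer where the input exited)."""
    hidden = x
    last = len(layers) - 1
    for depth, (layer, head) in enumerate(zip(layers, heads)):
        hidden = layer(hidden)
        probs = softmax(head(hidden))
        confidence = max(probs)
        # Exit as soon as an intermediate head is confident enough;
        # the final layer always produces the answer.
        if confidence >= threshold or depth == last:
            return probs.index(confidence), depth
    raise ValueError("model has no layers")


# Toy model (hypothetical): each layer doubles the activations,
# and each exit head reads the activations directly as logits.
layers = [lambda h: [2 * v for v in h]] * 3
heads = [lambda h: h] * 3

easy = predict_with_early_exit([0.0, 2.0], layers, heads, threshold=0.9)
hard = predict_with_early_exit([0.1, 0.3], layers, heads, threshold=0.9)
print(easy, hard)
```

The "easy" input is already confident after the first layer and skips the remaining two, while the "hard" input runs the full model.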
Distributed Training
SlipStream: Adapting Pipelines for Distributed Training of Large DNNs Amid Failures [arXiv]
Stanford
Dynamically re-route the work of a failed server to data-parallel peers; execute within bubbles of the original pipeline schedule.
Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections [arXiv]
ICL
Tenplex: a state management library.
Enable jobs to change the parallelism dynamically.
PTC: Parallelizable Tensor Collection
Dataset state
Model state
Execute PTC transformations in parallel with minimum data movement between workers.
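As a rough illustration of re-partitioning state when the degree of parallelism changes, the sketch below computes which index ranges of a one-dimensional tensor must be copied between workers when it is re-split from one shard count to another. It is not Tenplex's API or its PTC abstraction; `shard_bounds`, `reshard_plan`, and the even-split policy are assumptions made for this example.

```python
from typing import List, Tuple


def shard_bounds(length: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, length) into `parts` near-equal half-open ranges."""
    base, extra = divmod(length, parts)
    bounds = []
    start = 0
    for index in range(parts):
        end = start + base + (1 if index < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def reshard_plan(length: int, old_parts: int, new_parts: int) -> List[Tuple[int, int, int, int]]:
    """Return (src_worker, dst_worker, start, end) for every overlapping range."""
    plan = []
    old = shard_bounds(length, old_parts)
    new = shard_bounds(length, new_parts)
    for dst, (new_start, new_end) in enumerate(new):
        for src, (old_start, old_end) in enumerate(old):
            start, end = max(new_start, old_start), min(new_end, old_end)
            if start < end:
                plan.append((src, dst, start, end))
    return plan


# Re-split a 12-element tensor from 2 workers to 3 workers.
plan = reshard_plan(12, old_parts=2, new_parts=3)
# Ranges whose source and destination worker match can stay in place.
moved = sum(end - start for src, dst, start, end in plan if src != dst)
print(plan, moved)
```

Only ranges with different source and destination workers need a transfer, and each destination can fetch its ranges independently of the others. That independence is what makes a parallel execution of the plan possible.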
ML Compilation
Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor [arXiv]
UIUC & MSRA
T10: the first DL compiler to exploit the inter-core communication bandwidth and distributed on-chip memory on AI chips (e.g., Graphcore IPU).
SilvanForge: A Schedule-Guided Retargetable Compiler for Decision Tree Inference
IISc
Serverless Computing
Dirigent: Lightweight Serverless Orchestration [arXiv]
ETH
Simplify state management of the existing orchestration system (Kubernetes); eliminate persistent state updates; run monolithic control and data planes to minimize internal communication overheads.