ASPLOS 2024
Homepage:
SpotServe: Serving Generative Large Language Models on Preemptible Instances
CMU & PKU & CUHK
Distributed LLM serving system on preemptible/spot instances
Techniques
Dynamically adapt the LLM parallelization configuration
Minimize the cost of migrating instances for dynamic re-parallelization
Stateful inference recovery
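To make the first technique concrete, here is a minimal sketch of re-selecting a parallelization configuration when the number of available instances changes. It is not SpotServe's actual algorithm: the function names, the `(data, pipeline, tensor)` tuple layout, the `min_model_shards` constraint, and the `toy_score` cost model are all assumptions made for illustration.

```python
def candidate_configs(num_gpus, tensor_degrees=(1, 2, 4, 8)):
    """Enumerate (data, pipeline, tensor) parallel degrees that fit in num_gpus."""
    configs = []
    for d in range(1, num_gpus + 1):
        for p in range(1, num_gpus // d + 1):
            for t in tensor_degrees:
                if d * p * t <= num_gpus:
                    configs.append((d, p, t))
    return configs


def pick_config(num_gpus, min_model_shards, score):
    """Return the highest-scoring feasible config, or None if the model does not fit.

    min_model_shards (hypothetical) is the minimum pipeline * tensor degree
    needed for one model replica to fit in GPU memory.
    """
    feasible = [
        (d, p, t)
        for (d, p, t) in candidate_configs(num_gpus)
        if p * t >= min_model_shards
    ]
    return max(feasible, key=score, default=None)


def toy_score(cfg):
    """Hypothetical cost model: more replicas help, deeper or wider sharding hurts."""
    d, p, t = cfg
    return d / (1 + 0.1 * (p - 1) + 0.03 * (t - 1))


if __name__ == "__main__":
    # Re-run the selection each time the pool of spot instances shrinks or grows.
    for available in (8, 4, 3):
        print(available, pick_config(available, min_model_shards=4, score=toy_score))
```

A real system would replace `toy_score` with measured throughput and latency, and would also weigh the cost of migrating model state between the old and new configurations.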
Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
UMass-Amherst & Nokia Bell Labs
Heet: Accelerating Elastic Training in Heterogeneous Deep Learning Clusters
UMacau