Model Serving
Large language models (LLMs) are a fast-moving area and differ considerably from conventional models, so I track LLM-related work in a separate paper list.
I am actively maintaining this list.
Model Serving Systems
Clipper: A Low-Latency Online Prediction Serving System (NSDI 2017) [Personal Notes] [Paper] [Code]
UC Berkeley
Caching, batching, adaptive model selection.
TensorFlow-Serving: Flexible, High-Performance ML Serving (NIPS 2017 Workshop on ML Systems) [Paper]
Google
Auto-Configuration for Model Serving
Serving Unseen Deep Learning Models with Near-Optimal Configurations: A Fast Adaptive Search Approach (SoCC 2022) [Personal Notes] [Paper] [Code]
ISCAS
Characterize a DL model by its key operators.
Survey
A Survey of Multi-Tenant Deep Learning Inference on GPU (MLSys 2022 Workshop on Cloud Intelligence / AIOps) [Paper]
George Mason & Microsoft & Maryland
A Survey of Large-Scale Deep Learning Serving System Optimization: Challenges and Opportunities (arXiv 2111.14247) [Paper]
George Mason & Microsoft & Pittsburgh & Maryland