NSDI 2023
Homepage:
Paper list:
Spring:
Fall:
Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs
UCLA & CMU & MSR & Princeton
Resilient distributed training
Shepherd: Serving DNNs in the Wild
UWaterloo & Yale & UC Berkeley
Handle short-term workload unpredictability.
Aggregate request streams into moderately sized groups; leverage preemption and model-specific batching.
Understanding RDMA Microarchitecture Resources for Performance Isolation
Duke & Microsoft & SJTU
Develop a test suite to evaluate RDMA performance isolation solutions.