SOSP 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
UC Berkeley & Stanford & UCSD
vLLM; PagedAttention partitions each request's KV cache into fixed-size blocks addressed through a per-sequence block table, in the spirit of OS virtual-memory paging, which reduces fragmentation and enables block sharing across sequences.
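A minimal Python sketch of the block-table idea, for intuition only; the names (`PagedKVCache`, `BLOCK_SIZE`, `append_token`, `release`) and the block size are hypothetical, not vLLM's actual API or defaults:

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative value)

class PagedKVCache:
    """Toy block allocator: each sequence's KV cache lives in fixed-size
    blocks drawn from a shared free pool, so blocks need not be contiguous
    and at most one partially filled block is wasted per sequence."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}   # seq_id -> list of physical block ids
        self.num_tokens = {}     # seq_id -> tokens stored so far

    def append_token(self, seq_id: int) -> int:
        """Reserve space for one new token; returns its physical block id."""
        table = self.block_tables.setdefault(seq_id, [])
        count = self.num_tokens.get(seq_id, 0)
        if count % BLOCK_SIZE == 0:           # current block full, or first token
            table.append(self.free_blocks.pop())
        self.num_tokens[seq_id] = count + 1
        return table[count // BLOCK_SIZE]

    def release(self, seq_id: int) -> None:
        """Return all of a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.num_tokens.pop(seq_id, None)
```

Because the mapping is kept per sequence, identical prefix blocks can be shared across sequences and copied on write, which is how the paper makes parallel sampling and beam search memory-efficient.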
Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates
UMich SymbioticLab & AWS & PKU
Gemini: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints
Rice & AWS
UGache: A Unified GPU Cache for Embedding-based Deep Learning
SJTU
Multi-GPU embedding cache that exploits cross-GPU interconnects (NVLink, NVSwitch).
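A rough sketch of the lookup hierarchy such a unified cache implies: check the local GPU's shard first, then peer GPUs' shards over the fast interconnect, and only then fall back to host memory. The class and method names (`UnifiedEmbeddingCache`, `lookup`) are hypothetical and the dict-based tiers are a simulation, not UGache's actual design or API:

```python
import numpy as np

class UnifiedEmbeddingCache:
    """Toy model of a cross-GPU embedding cache: each GPU holds a shard of
    hot embeddings; misses are served by a peer GPU over NVLink/NVSwitch
    before falling back to the full table in host memory (PCIe)."""

    def __init__(self, gpu_id: int, local_cache: dict,
                 peer_caches: dict, host_table: np.ndarray):
        self.gpu_id = gpu_id
        self.local_cache = local_cache    # key -> embedding on this GPU
        self.peer_caches = peer_caches    # peer gpu_id -> {key -> embedding}
        self.host_table = host_table      # full embedding table in host memory

    def lookup(self, key: int) -> np.ndarray:
        # 1. Local GPU cache: cheapest path.
        if key in self.local_cache:
            return self.local_cache[key]
        # 2. Peer GPU shards: reachable over the fast interconnect,
        #    still far cheaper than the host round trip.
        for cache in self.peer_caches.values():
            if key in cache:
                return cache[key]
        # 3. Host memory: the slow fallback.
        return self.host_table[key]

# Usage: key 42 is cached on a peer GPU, so the host is never touched.
host = np.random.rand(1000, 32).astype(np.float32)
cache = UnifiedEmbeddingCache(gpu_id=0,
                              local_cache={7: host[7]},
                              peer_caches={1: {42: host[42]}},
                              host_table=host)
vec = cache.lookup(42)
```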
Bagpipe: Accelerating Deep Recommendation Model Training
UW-Madison & UChicago