SOSP 2023
Meta Info
Homepage: https://sosp2023.mpi-sws.org/
Papers
Large Language Models (LLMs)
Gemini: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints [Paper]
Rice & AWS
Deep Learning Recommendation Models (DLRMs)
UGache: A Unified GPU Cache for Embedding-based Deep Learning [Personal Notes] [Paper]
SJTU
Multi-GPU embedding cache; exploits cross-GPU interconnects (NVLink, NVSwitch).
Bagpipe: Accelerating Deep Recommendation Model Training [Paper]
UW-Madison & UChicago