SOSP 2025

Meta Info

Homepage: https://sigops.org/s/conferences/sosp/2025/

Acceptance rate: 17.7% (= 65 / 368)

Papers

Large Language Models (LLMs)

  • LLM Training

    • Mycroft: Tracing Dependencies in Collective Communication Towards Reliable LLM Training

      • CUHK & ByteDance

  • LLM Inference

    • Effective Memory Management for Serving LLM with Heterogeneity [arXiv]

      • THU & Chicago & UC Berkeley

      • Two challenges

        • Recent models have heterogeneous embeddings with different sizes.

        • Some new architectures use only a subset of the prefix tokens to generate the next token.

      • Designs

        • Two-level memory allocator: choose the page size as the least common multiple of the token embedding sizes.

        • Enable attention variants to customize this mechanism by specifying the exact prefix subset they need.
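
The page-size rule above can be illustrated with a minimal sketch. The embedding sizes are made-up values and `page_size_for` / `slots_per_page` are hypothetical helpers, not taken from the paper; the sketch only shows that a page sized as the least common multiple holds a whole number of embeddings of every size.

```python
from math import lcm  # available in Python 3.9+


def page_size_for(embedding_sizes):
    """Return the smallest page size that every embedding size divides evenly."""
    if not embedding_sizes:
        raise ValueError("need at least one embedding size")
    return lcm(*embedding_sizes)


def slots_per_page(page_size, embedding_sizes):
    """Map each embedding size to how many embeddings fit in one page."""
    return {size: page_size // size for size in embedding_sizes}


# Hypothetical per-token embedding sizes (in bytes) for three layer types.
sizes = [256, 384, 1024]
page = page_size_for(sizes)
print(page)                         # 3072
print(slots_per_page(page, sizes))  # {256: 12, 384: 8, 1024: 3}
```

Because every size divides the page size, a page can be handed to any layer type without leaving a partially filled slot at its end.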

    • LMPrefill: An Inference Engine for Prefill-only Workloads in Large Language Model Applications [arXiv]

      • Chicago & THU & LinkedIn & UC Berkeley

      • Hybrid prefilling: Prefill non-attention layers chunk-by-chunk, but prefill the attention layers normally.

      • Suffix KV cache discarding / offloading: Discard or offload the suffix portion of the KV cache that is no longer needed.

      • Continuous JCT calibration: Continuously re-estimate the job completion time (JCT) of each request based on which requests have already been scheduled, then schedule the single request with the lowest JCT.
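
The calibration loop can be sketched roughly as follows. The cost model is an assumption made for illustration (JCT proportional to the number of tokens not covered by a prefix shared with an already scheduled request), and `estimate_jct` and `schedule` are hypothetical names; these notes do not describe the paper's actual estimator.

```python
def common_prefix_len(a, b):
    """Number of leading tokens that two requests share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def estimate_jct(request, scheduled):
    """Assumed cost model: one unit per token not covered by a reusable prefix."""
    cached = max((common_prefix_len(request, s) for s in scheduled), default=0)
    return len(request) - cached


def schedule(waiting):
    """Repeatedly re-estimate all waiting requests and pick the lowest JCT."""
    waiting = list(waiting)
    scheduled = []
    while waiting:
        # Estimates are recomputed each round because earlier picks change them.
        best = min(waiting, key=lambda r: estimate_jct(r, scheduled))
        waiting.remove(best)
        scheduled.append(best)
    return scheduled


# Hypothetical requests, each a tuple of tokens.
A = ("p", "q", "r")
B = ("p", "q", "r", "s", "t", "u")
C = ("x", "y", "z", "w")
order = schedule([A, B, C])
print(order == [A, B, C])  # True
```

A static shortest-first order would be A, C, B. With re-estimation, B's estimate drops from 6 to 3 once A is scheduled, so B runs before C.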

    • Aegaeon: Effective GPU Pooling for Concurrent LLM Serving on the Market

      • PKU & Alibaba Cloud

  • Retrieval Augmented Generation (RAG)

    • METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation [arXiv]

      • Chicago & MSR & Princeton

    • HeteRAG: Co-Optimizing Generation and Retrieval for Heterogeneous RAG Workflows [arXiv]

      • UCSD

  • Optimization

    • Mercury: Unlocking Multi-GPU Operator Optimization for LLMs via Remote Memory Scheduling

      • UCSD & Meta

GPU Checkpointing

  • PhoenixOS: Concurrent OS-level GPU Checkpoint and Restore with Validated Speculation [arXiv]

    • SJTU IPADS
