OSDI 2025

Meta Info

Homepage: https://www.usenix.org/conference/osdi25

Acceptance Rate

14.7% (= 48 / 327)

Papers

Large Language Models (LLMs)

  • LLM Training

    • WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training

      • UCSD

  • LLM Inference

    • Fast and Live Model Auto Scaling with O(1) Host Caching

      • SJTU IPADS

    • WaferLLM: Large Language Model Inference at Wafer Scale [Paper] [Code]

      • Edinburgh & MSRA

      • Wafer-scale LLM parallelism

      • MeshGEMM: a scalable GEMM algorithm for wafer-scale devices to accelerate the prefill phase.

      • MeshGEMV: a scalable GEMV algorithm for wafer-scale devices to accelerate the decode phase.
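
The two kernels map onto the two inference phases: prefill processes the whole prompt at once, which is matrix-matrix (GEMM) work, while decode produces one token per step, which is matrix-vector (GEMV) work. A minimal NumPy shape sketch of that split (illustrative only; not WaferLLM's MeshGEMM/MeshGEMV implementation):

```python
import numpy as np

d_model, prompt_len = 4096, 512
W = np.random.randn(d_model, d_model).astype(np.float32)  # one weight matrix

# Prefill: all prompt tokens are processed together -> GEMM (matrix x matrix).
prompt_hidden = np.random.randn(prompt_len, d_model).astype(np.float32)
prefill_out = prompt_hidden @ W          # (512, 4096) x (4096, 4096)

# Decode: one new token per step -> GEMV (vector x matrix), repeated each step.
token_hidden = np.random.randn(d_model).astype(np.float32)
decode_out = token_hidden @ W            # (4096,) x (4096, 4096)
```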

Deep Learning Compilation

  • KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads

    • UCSD

  • Mirage: A Multi-Level Superoptimizer for Tensor Programs [Paper] [arXiv] [Code]

    • CMU

    • µGraphs: a uniform representation of tensor programs at the kernel, thread block, and thread levels of the GPU compute hierarchy
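
A hypothetical sketch of how a single graph structure can describe a tensor program at the kernel, thread-block, and thread levels, with an op at one level refined by a graph one level down. The class names and fields are invented for illustration; this is not Mirage's actual µGraph API:

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str                      # e.g. "matmul", "exp", "sum"
    inputs: list                   # input tensor names
    outputs: list                  # output tensor names
    body: "Graph | None" = None    # a lower-level graph refining this op

@dataclass
class Graph:
    level: str                     # "kernel" | "block" | "thread"
    ops: list[Op] = field(default_factory=list)

# The same Graph type is reused at every level of the GPU compute hierarchy.
thread_g = Graph("thread", [Op("fma", ["a_frag", "b_frag"], ["c_frag"])])
block_g  = Graph("block",  [Op("tile_matmul", ["A_tile", "B_tile"], ["C_tile"], body=thread_g)])
kernel_g = Graph("kernel", [Op("matmul", ["A", "B"], ["C"], body=block_g)])
```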

GPU Sharing

  • Preemptive Scheduling for Diverse XPUs using Multi-level Hardware Model [Paper] [Code] [Slides]

    • SJTU IPADS

    • XQueue: An XPU task is abstracted as a sequence of commands executed on a command queue.

    • Multi-level hardware model (a minimal level-1 sketch follows this list)

      • Level-1: Preempt pending commands (block the host CPU from launching new commands; no hardware support required)

      • Level-2: Preempt in-flight commands (e.g., instruct the μ-controllers to stall command dispatching; leverages command programmability)

      • Level-3: Preempt running commands
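
A toy sketch of the level-1 idea: the queue abstraction buffers not-yet-launched commands and simply stops handing them to the hardware when preempted. All names are illustrative and the launch function is a placeholder; this is not the paper's implementation:

```python
import collections, threading

class XQueue:
    """A task is a sequence of commands; the scheduler gates their launch."""
    def __init__(self, launch_fn):
        self._pending = collections.deque()   # commands not yet handed to hardware
        self._launch = launch_fn              # placeholder for the real driver call
        self._preempted = threading.Event()

    def submit(self, cmd):
        self._pending.append(cmd)
        self._drain()

    def preempt(self):
        # Level-1: stop launching new commands; in-flight ones keep running.
        self._preempted.set()

    def resume(self):
        self._preempted.clear()
        self._drain()

    def _drain(self):
        while self._pending and not self._preempted.is_set():
            self._launch(self._pending.popleft())

# Usage: a low-priority queue is preempted while a high-priority task runs.
low = XQueue(launch_fn=lambda cmd: print("launch", cmd))
low.submit("kernel_A")
low.preempt()
low.submit("kernel_B")   # buffered, not launched
low.resume()             # kernel_B launches now
```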

GPU Communication

  • Enabling Efficient GPU Communication over Multiple NICs with FuseLink [Paper]

    • HKUST iSING Lab

    • Integrates high-speed intra-server links (e.g., NVLink) as extensions of the inter-server network.

    • Implemented as an independent networking module that replaces the default InfiniBand networking in NCCL.
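
A conceptual path-selection sketch under the assumption that a GPU whose local NIC is backlogged relays traffic over an intra-server link to a peer GPU with an idle NIC. The names and load heuristic are invented for illustration, not FuseLink's implementation:

```python
from dataclasses import dataclass

@dataclass
class Nic:
    name: str
    queued_bytes: int = 0   # crude proxy for NIC load

def pick_path(src_gpu: int, nics: dict[int, Nic], busy_threshold: int = 1 << 20):
    """Return (relay_gpu, nic) for an outgoing transfer from src_gpu.

    Prefer the NIC local to src_gpu; if it is backlogged, relay the data over
    the intra-server link to the GPU whose NIC currently has the least load.
    """
    local = nics[src_gpu]
    if local.queued_bytes < busy_threshold:
        return src_gpu, local
    relay_gpu = min(nics, key=lambda g: nics[g].queued_bytes)
    return relay_gpu, nics[relay_gpu]

# Example: GPU 0's NIC is saturated, so the transfer is relayed via GPU 1.
nics = {0: Nic("mlx5_0", queued_bytes=8 << 20), 1: Nic("mlx5_1", queued_bytes=0)}
print(pick_path(0, nics))
```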

Resource Allocation

  • Decouple and Decompose: Scaling Resource Allocation through a Different Lens

    • Harvard

Memory Translation

  • EMT: An OS Framework for New Memory Translation Architectures

    • UIUC

Vector Search

  • Quake: Adaptive Indexing for Vector Search

    • UW-Madison

File Systems

  • Fast and Synchronous Crash Consistency with Metadata Write-Once File System

    • HIT-SZ

Databases

  • Tigon: A Distributed Database for a CXL Pod

    • UT-Austin

Replicated State Machines (RSMs)

  • Picsou: Enabling Efficient Cross-Consensus Communication [arXiv]

    • UC Berkeley
