# NSDI 2024

## Meta Info

Homepage: <https://www.usenix.org/conference/nsdi24>

Paper list: <https://www.usenix.org/conference/nsdi24/technical-sessions>

## Papers

### Resource Management

* Resiliency at Scale: Managing Google’s TPUv4 Machine Learning Supercomputer \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/zu)]
  * Google
  * Experience in designing and operating the software infrastructure that keeps TPUv4 supercomputers running at scale.
* Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/wang-zibo)] \[[Slides](https://www.usenix.org/system/files/nsdi24_slides-wang_zibo.pdf)] \[[Code](https://github.com/microsoft/autothrottle)]
  * USTC & ETH & MSR
  * Minimize the CPU allocation of *microservice applications* while meeting SLOs.
  * Service-level (low overhead & fast reaction) vs. application-level (global visibility); a minimal sketch of this two-level control loop follows the list below.
    * Captains (service-level): control based on throttle ratio target; collect data every 100ms, adjust allocation every 1s.
    * Tower (application-level): determine the best throttle targets for Captains to achieve; online learning (contextual bandit algorithm); one step per minute, each step runs in \~100ms.
* CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/rajasekaran)]
  * MIT & UT-Austin
  * Consider the communication patterns of different jobs when placing them on network links.
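
Below is a minimal sketch of the two-level Autothrottle-style control loop, assuming a single service; the `Captain`/`Tower` names mirror the paper's components, but the feedback rule and the epsilon-greedy learner are illustrative stand-ins, not the paper's implementation.

```python
import random


class Captain:
    """Service-level controller (simplified): reacts on a fast timescale."""

    def __init__(self, target_throttle_ratio=0.05, cpu_limit=2.0):
        self.target = target_throttle_ratio  # throttle-ratio target set by the Tower
        self.cpu_limit = cpu_limit           # CPU cores currently allocated

    def adjust(self, observed_throttle_ratio, step=0.1):
        # Feedback rule: add CPU when throttling exceeds the target, reclaim it otherwise.
        if observed_throttle_ratio > self.target:
            self.cpu_limit += step
        else:
            self.cpu_limit = max(0.1, self.cpu_limit - step)
        return self.cpu_limit


class Tower:
    """Application-level controller (simplified): picks throttle targets periodically."""

    def __init__(self, candidate_targets=(0.01, 0.05, 0.1, 0.2)):
        self.candidates = list(candidate_targets)
        self.value = {t: 0.0 for t in self.candidates}

    def choose_target(self, epsilon=0.1):
        # Epsilon-greedy stand-in for the contextual-bandit learner described in the paper.
        if random.random() < epsilon:
            return random.choice(self.candidates)
        return max(self.value, key=self.value.get)

    def update(self, target, slo_met, cpu_allocated):
        # Reward meeting the SLO with little CPU; penalize SLO violations heavily.
        reward = (1.0 if slo_met else -10.0) - cpu_allocated
        self.value[target] = 0.9 * self.value[target] + 0.1 * reward
```

In a real deployment there would be one Captain per service adjusting allocations every second, with the Tower reassigning their throttle targets once per minute.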

### Large Language Models (LLMs)

* LLM characterization
  * Characterization of Large Language Model Development in the Datacenter \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/hu)] \[[Slides](https://www.usenix.org/system/files/nsdi24_slides-hu.pdf)] \[[Trace](https://github.com/InternLM/AcmeTrace)]
    * NTU & PKU & CUHK & Shanghai AI Lab
* LLM training
  * MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/jiang-ziheng)] \[[Slides](https://www.usenix.org/system/files/nsdi24_slides-jiang_ziheng.pdf)] \[[Code](https://github.com/volcengine/veScale)]
    * ByteDance & PKU

### Utilize Spot Instances

* Can't Be Late: Optimizing Spot Instance Savings under Deadlines \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/wu-zhanghao)] \[[Trace](https://github.com/skypilot-org/spot-traces)]
  * UC Berkeley
  * **Outstanding Paper**
  * Characterization (e.g., availability, pricing, duration) of three-month-long spot availability traces on AWS.
  * **Uniform Progress**: a policy that makes uniform progress toward the deadline by distributing the job's computation evenly over time (see the sketch after this list).
* Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/duan)] \[[Slides](https://www.usenix.org/system/files/nsdi24_slides-duan.pdf)] \[[Code](https://github.com/JF-D/Parcae)]
  * CUHK & ByteDance & CMU & UCLA & Microsoft
  * Proactively adjust the parallelization strategy of a DNN training job in anticipation of future preemptions to maximize preemption-aware throughput (i.e., liveput).
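
Below is a minimal sketch of a uniform-progress-style decision rule for the spot-vs.-on-demand choice, assuming the job can switch instance types at slice boundaries; the function name, the `wait` option, and the exact rule are illustrative assumptions, not the paper's algorithm.

```python
def choose_instance(work_done, total_work, elapsed, deadline, spot_available):
    """Pick where to run the next time slice so progress stays on the uniform line.

    Uniform progress means completing work at the average rate total_work / deadline;
    fall back to paid on-demand capacity only when actual progress drops below that
    line and no spot capacity is available.
    """
    required_rate = total_work / deadline
    on_track = work_done >= required_rate * elapsed

    if spot_available:
        return "spot"       # cheap capacity: use it whenever it is offered
    if on_track:
        return "wait"       # ahead of the uniform-progress line: pause and save money
    return "on-demand"      # behind schedule and no spot: pay for guaranteed progress


# Example: halfway to the deadline with only 30% of the work done and no spot
# capacity available, the rule switches to on-demand.
print(choose_instance(work_done=30, total_work=100, elapsed=12, deadline=24,
                      spot_available=False))  # -> "on-demand"
```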

### Multimodal Models

* DISTMM: Accelerating Distributed Multimodal Model Training \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/huang)]
  * Ohio State University & AWS
  * Partition and parallelize the submodules of a multimodal model based on their modalities and redistribute the training data.

### Diffusion Models

* Approximate Caching for Efficiently Serving Text-to-Image Diffusion Models \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/agarwal-shubham)] \[[Slides](https://www.usenix.org/system/files/nsdi24_slides-agarwal_shubham.pdf)]
  * Adobe Research & UIUC
  * Approximate caching: skip a number of denoising steps by reusing intermediate noise states saved during a prior image generation (sketched below).
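
Below is a minimal sketch of the approximate-caching idea, assuming prompts are matched by embedding similarity; `embed`, `denoise_step`, the similarity threshold, and the number of reused steps are placeholder assumptions, not the system's actual components.

```python
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def embed(prompt):
    # Stand-in text encoder; a real system would use a learned embedding (e.g., CLIP).
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal(64)


def denoise_step(latent, prompt, step):
    # Placeholder for one reverse-diffusion step of the actual model.
    return 0.98 * latent


def generate(prompt, cache, total_steps=50, reuse_steps=25, threshold=0.9):
    query = embed(prompt)
    latent, start = np.random.default_rng(0).standard_normal((4, 8, 8)), 0

    # Approximate cache hit: reuse the intermediate noise state saved for a
    # sufficiently similar earlier prompt and skip the steps it already paid for.
    for entry in cache.values():
        if cosine(query, entry["emb"]) >= threshold:
            latent, start = entry["latent"], reuse_steps
            break

    for step in range(start, total_steps):
        latent = denoise_step(latent, prompt, step)
        if step == reuse_steps - 1:
            # Save an intermediate state so similar future prompts can reuse it.
            cache[prompt] = {"emb": query, "latent": latent.copy()}

    return latent
```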

### Deep Learning Recommendation Models (DLRMs)

* Accelerating Neural Recommendation Training with Embedding Scheduling \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/zeng)] \[[Slides](https://www.usenix.org/system/files/nsdi24_slides-zeng.pdf)] \[[Code](https://github.com/HKUST-SING/herald)]
  * HKUST
  * **Herald**: an adaptive location-aware inputs allocator to determine *where embeddings should be trained* and an optimal communication plan generator to determine *which embeddings should be synchronized*.
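
Below is a minimal sketch of location-aware input allocation in the spirit of Herald, assuming each worker caches a subset of embedding IDs; the greedy scoring rule and load-balancing penalty are illustrative assumptions, not the paper's algorithm.

```python
def allocate_inputs(batch, worker_cache):
    """Assign each sample to the worker that already caches most of its embedding IDs,
    so that most embedding lookups and updates stay local and need no synchronization.

    batch: list of samples, each given as the set of embedding IDs it touches.
    worker_cache: mapping from worker ID to the set of embedding IDs cached there.
    """
    assignment = {}
    load = {w: 0 for w in worker_cache}
    for idx, ids in enumerate(batch):
        # Score = local cache hits minus a small penalty to keep the load balanced.
        best = max(worker_cache,
                   key=lambda w: len(ids & worker_cache[w]) - 0.1 * load[w])
        assignment[idx] = best
        load[best] += 1
    return assignment


# Example with two workers: sample 0 mostly hits w0's cache, sample 1 hits w1's.
print(allocate_inputs([{1, 2, 3}, {7, 8}], {"w0": {1, 2, 3, 4}, "w1": {7, 8, 9}}))
# -> {0: 'w0', 1: 'w1'}
```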

### Fair Resource Allocation

* Solving Max-Min Fair Resource Allocations Quickly on Large Graphs \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/namyar-solving)] \[[Slides](https://www.usenix.org/system/files/nsdi24_slides_namyar-solving.pdf)] \[[Code](https://github.com/microsoft/Soroush)]
  * Microsoft & USC & Rice
  * **Soroush**: Single-Shot Max-Min Fair Allocator.
  * Deployed on Microsoft WAN.
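
For context, below is a minimal water-filling sketch of the max-min fairness objective on a single shared resource; Soroush itself computes approximate max-min fair allocations over large multi-path graphs in a single shot, which this toy example does not attempt.

```python
def max_min_fair(capacity, demands):
    """Water-filling: repeatedly offer every unsatisfied demand an equal share.

    Demands needing less than the equal share are fully satisfied and removed,
    and the leftover capacity is redistributed among the remaining demands.
    """
    alloc = {d: 0.0 for d in demands}
    remaining = dict(demands)
    cap = float(capacity)
    while remaining and cap > 1e-9:
        share = cap / len(remaining)
        satisfied = {d: need for d, need in remaining.items() if need <= share}
        if not satisfied:
            # Nobody can be fully satisfied: everyone left gets the equal share.
            for d in remaining:
                alloc[d] += share
            break
        for d, need in satisfied.items():
            alloc[d] += need
            cap -= need
            del remaining[d]
    return alloc


# Example: capacity 10 shared by demands 2, 4, and 10 yields allocations 2, 4, and 4.
print(max_min_fair(10, {"a": 2, "b": 4, "c": 10}))  # -> {'a': 2.0, 'b': 4.0, 'c': 4.0}
```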

### Network Emulation

* Crescent: Emulating Heterogeneous Production Network at Scale \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/gao-zhaoyu)] \[[Slides](https://www.usenix.org/system/files/nsdi24_slides-gao_zhaoyu.pdf)]
  * ByteDance & Cornell
  * **Crescent**: ByteDance’s *network emulation* platform for preventing *change-induced network incidents*.

### RDMA

* Harmonic: Hardware-assisted RDMA Performance Isolation for Public Clouds \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/lou)]
  * UIUC & Duke & Microsoft
  * **Harmonic**: microarchitecture-resource-aware RDMA performance isolation; including a programmable intelligent PCIe switch (prototyped with FPGA) and an RDMA-friendly rate limiter.

### PCIe

* Understanding Routable PCIe Performance for Composable Infrastructures \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/hou)]
  * UW-Madison & ZJU
  * **rPCIeBench**: a software-hardware co-designed benchmarking framework to systematically characterize the *routable PCIe fabric*.

