# ATC 2024

## Meta Info

Homepage: <https://www.usenix.org/conference/atc24>

Paper list: <https://www.usenix.org/conference/atc24/technical-sessions>

### Acceptance Rate

15.8% (= 77 / 488)

## Papers

### Large Language Models (LLMs)

* Serving LLMs
  * Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention \[[Paper](https://www.usenix.org/conference/atc24/presentation/gao-bin-cost)]
    * NUS & SJTU & Huawei Cloud
    * Reuse KV caches across multi-turn conversations; maintain a hierarchical KV caching system; layer-wise pre-loading and asynchronous saving; scheduler-aware fetching and eviction.
  * Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs \[[Paper](https://www.usenix.org/conference/atc24/presentation/xia)] \[[Code](https://github.com/usyd-fsalab/fp6_llm)]
    * Sydney & Microsoft & Rutgers
    * **TC-FPx**, the first full-stack *GPU kernel design* scheme with unified Tensor Core support of 6-bit and arbitrary bit-width quantization (e.g., 5-bit).
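The KV-cache reuse behind CachedAttention can be sketched as a small LRU cache keyed by conversation ID. `HierarchicalKVCache` and its single-tier capacity are simplifications invented for illustration; the paper uses a multi-tier hierarchy with layer-wise pre-loading, asynchronous saving, and scheduler-aware eviction:

```python
from collections import OrderedDict

class HierarchicalKVCache:
    """Toy LRU sketch of a KV cache shared across conversation turns."""

    def __init__(self, capacity):
        self.capacity = capacity          # slots in the fast tier
        self.store = OrderedDict()        # conversation_id -> KV blob

    def save(self, conv_id, kv):
        # The paper saves asynchronously, layer by layer; synchronous here.
        self.store[conv_id] = kv
        self.store.move_to_end(conv_id)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict least recently used

    def fetch(self, conv_id):
        # On a hit, the next turn skips recomputing the history's KV.
        kv = self.store.get(conv_id)
        if kv is not None:
            self.store.move_to_end(conv_id)
        return kv

cache = HierarchicalKVCache(capacity=2)
cache.save("conv-A", ["kv: turn 1"])
cache.save("conv-B", ["kv: turn 1"])
hit = cache.fetch("conv-A")            # reused across turns
cache.save("conv-C", ["kv: turn 1"])   # evicts conv-B
miss = cache.fetch("conv-B")
```

A hit means the next turn's prefill can skip recomputing the conversation history's keys and values.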
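For Quant-LLM, the sketch below only shows what low-bit weight-only quantization means in general: it uses symmetric integer round-to-nearest, not the paper's FP6 floating-point format, and says nothing about the TC-FPx Tensor Core kernels:

```python
def quantize_symmetric(weights, bits=6):
    # Symmetric round-to-nearest: map floats to signed `bits`-bit integers.
    qmax = 2 ** (bits - 1) - 1                  # 31 for 6 bits
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate weights; error per element is at most ~scale/2.
    return [v * scale for v in q]

w = [0.5, -1.0, 0.25, 0.75]
q, scale = quantize_symmetric(w, bits=6)
w_hat = dequantize(q, scale)
```

Storing 6-bit integers plus one scale per group is what shrinks weight memory; the paper's contribution is making such non-power-of-two widths fast on Tensor Cores.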
* LLM alignment / RLHF training
  * PUZZLE: Efficiently Aligning Large Language Models through Light-Weight Context Switch \[[Paper](https://www.usenix.org/conference/atc24/presentation/lei)]
    * THU
    * Intra-stage switching: explore model affinities and overlap computation via time-sharing.
    * Inter-stage switching: find the optimal switch plan with the minimum communication cost.
    * Based on Megatron-LM.
* LLM federated fine-tuning
  * FwdLLM: Efficient Federated Finetuning of Large Language Models with Perturbed Inferences \[[Paper](https://www.usenix.org/conference/atc24/presentation/xu-mengwei)] \[[Code](https://github.com/UbiquitousLearning/FwdLLM)]
    * BUPT
    * Employ backpropagation (BP)-free training methods, requiring devices only to execute “perturbed inferences”; adaptively allocate computational loads across devices to balance between convergence speed and accuracy.
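The "perturbed inference" idea can be illustrated with a zeroth-order (SPSA-style) gradient estimate: two forward passes along a random direction replace backpropagation. This shows the general technique, not FwdLLM's exact estimator:

```python
import random

def perturbed_gradient(loss, params, rng, eps=1e-3):
    """BP-free gradient estimate from two 'perturbed inferences'
    (forward passes) along a random sign direction."""
    v = [rng.choice([-1.0, 1.0]) for _ in params]        # random direction
    plus = loss([p + eps * vi for p, vi in zip(params, v)])
    minus = loss([p - eps * vi for p, vi in zip(params, v)])
    deriv = (plus - minus) / (2 * eps)                   # directional derivative
    return [deriv * vi for vi in v]

# Minimize f(x) = (x0 - 1)^2 + (x1 + 2)^2 with SGD on the estimate.
f = lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2
rng = random.Random(42)
x = [0.0, 0.0]
for _ in range(300):
    g = perturbed_gradient(f, x, rng)
    x = [xi - 0.05 * gi for xi, gi in zip(x, g)]
```

Because devices only run forward passes, this fits hardware without backpropagation support, at the cost of noisier gradients.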
* LLM training
  * Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism \[[Paper](https://www.usenix.org/conference/atc24/presentation/yuan)] \[[Code](https://github.com/kwai/Megatron-Kwai/tree/atc24ae/examples/atc24)]
    * Kuaishou
    * Balance computation against memory utilization during training.
    * Two activation rematerialization strategies
      * *Pipeline-parallel-aware offloading* to maximize the utilization of host memory for storing activations.
      * *Compute-memory balanced checkpointing* to balance between activation memory and computational efficiency.
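A toy sketch of compute-memory balanced checkpointing: given per-layer activation sizes and recompute costs (hypothetical numbers), greedily keep the activations that are most expensive to recompute per byte, within a memory budget. The paper's strategies are richer and also offload activations to host memory:

```python
def plan_checkpoints(act_mem, recompute_cost, budget):
    """Greedy sketch: store activations with the highest recompute cost
    per byte; recompute the rest in the backward pass."""
    order = sorted(range(len(act_mem)),
                   key=lambda i: recompute_cost[i] / act_mem[i],
                   reverse=True)
    kept, used = set(), 0
    for i in order:
        if used + act_mem[i] <= budget:
            kept.add(i)                  # keep this layer's activations
            used += act_mem[i]
    recompute = [i for i in range(len(act_mem)) if i not in kept]
    extra_compute = sum(recompute_cost[i] for i in recompute)
    return sorted(kept), recompute, extra_compute

kept, recomputed, cost = plan_checkpoints(
    act_mem=[4, 2, 6, 3], recompute_cost=[8, 1, 3, 9], budget=8)
```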

### Reliability

* AI Infra
  * SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation \[[Paper](https://www.usenix.org/conference/atc24/presentation/xiong)] \[[Code](https://github.com/microsoft/superbenchmark)]
    * MSR & Microsoft
    * **Best Paper Award**
    * SuperBench, a proactive validation system for AI infrastructure that mitigates *hidden degradation* (i.e., *gray failure*) caused by hardware redundancies and enhances overall reliability.
    * A comprehensive benchmark suite that evaluates individual hardware components and represents most real AI workloads.
* HBM
  * Removing Obstacles before Breaking Through the Memory Wall: A Close Look at HBM Errors in the Field \[[Paper](https://www.usenix.org/conference/atc24/presentation/wu-ronglong)] \[[Code](https://github.com/wrl297/Calchas)]
    * Xiamen University & Huawei & Minjiang University
    * The first systematic field study of HBM errors, covering over 460 million error events collected from nineteen data centers over more than two years of deployment across a variety of services.
    * Calchas, a hierarchical failure prediction framework for HBM that integrates spatial, temporal, and sensor information from multiple device levels to predict upcoming failures.

### Supercomputer

* Full Lifecycle Data Analysis on a Large-scale and Leadership Supercomputer: What Can We Learn from It?
  * THU & SDU & National Supercomputer Center in Wuxi
  * A comprehensive analysis of 40 TB of data (I/O performance data and job-running information) collected over six years from Sunway TaihuLight, a supercomputer with 41,508 nodes.
  * **Notice**: The data is currently not available.

### Distributed Training

* Metis: Fast Automatic Distributed Training on Heterogeneous GPUs \[[Paper](https://www.usenix.org/conference/atc24/presentation/um)]
  * Samsung Research & UNIST
  * Metis, a system that automatically finds efficient parallelism plans for distributed training on *heterogeneous GPUs*.
  * Balance loads with heterogeneity-awareness; prefer data parallelism over tensor parallelism within a pipeline stage.
  * Evaluated with three large models (GPT-3, MoE, and Wide-ResNet).
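The heterogeneity-aware load balancing can be illustrated by splitting a data-parallel global batch in proportion to each GPU's measured throughput, so replicas finish together. The throughput numbers are hypothetical, and Metis's actual planner searches over full parallelism plans rather than using this rule:

```python
def split_batch(global_batch, throughputs):
    """Give each data-parallel replica a batch share proportional to its
    GPU's throughput (samples/s); hand out rounding leftovers by largest
    fractional remainder."""
    total = sum(throughputs)
    shares = [global_batch * t / total for t in throughputs]
    sizes = [int(s) for s in shares]
    leftover = global_batch - sum(sizes)
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes

# Example: a fast, a medium, and a slow GPU sharing one global batch.
sizes = split_batch(global_batch=96, throughputs=[300.0, 150.0, 50.0])
```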

### Data Preprocessing

* Pecan: Cost-Efficient ML Data Preprocessing with Automatic Transformation Ordering and Hybrid Placement \[[Paper](https://www.usenix.org/conference/atc24/presentation/graur)] \[[Code](https://github.com/eth-easl/pecan-experiments)]
  * ETH & Google
  * Dynamically *schedule data preprocessing workers on ML accelerator host resources* to minimize the number of remote CPU workers needed to achieve peak data ingestion bandwidth.
  * Analyze the characteristics of input pipelines and *automatically reorder transformations* to increase data preprocessing worker throughput.
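The transformation reordering can be sketched as bubbling record-dropping filters ahead of expensive maps whenever the swap is safe, so less data reaches the costly steps. Here safety is given as a flag and the pipeline is hypothetical; Pecan infers safety from its analysis of the input pipeline:

```python
def reorder(pipeline):
    """Greedy bubble pass: move filters left past maps when both ops are
    marked safe to swap. Each op is (name, kind, swappable)."""
    ops = list(pipeline)
    changed = True
    while changed:
        changed = False
        for i in range(len(ops) - 1):
            left, right = ops[i], ops[i + 1]
            if (left[1] == "map" and left[2]
                    and right[1] == "filter" and right[2]):
                ops[i], ops[i + 1] = right, left   # filter bubbles left
                changed = True
    return [name for name, _, _ in ops]

order = reorder([
    ("decode", "map", False),        # must stay first
    ("augment", "map", True),        # expensive, safe to swap past
    ("drop_corrupt", "filter", True),
    ("to_tensor", "map", True),
])
```

Running `drop_corrupt` before `augment` means the expensive augmentation never touches records that would be discarded anyway.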

### Serverless Computing

* Harmonizing Efficiency and Practicability: Optimizing Resource Utilization in Serverless Computing with Jiagu \[[Paper](https://www.usenix.org/conference/atc24/presentation/liu-qingyuan)]
  * SJTU IPADS & Huawei Cloud & EPFL
  * **Jiagu**, a serverless system based on OpenFaaS
    * *Pre-decision scheduling:* decouple prediction and decision-making; predict every function's capacities on a server using a model.
    * *Dual-staged scaling:* frequent adjustment of instances.
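The pre-decision idea can be sketched as running the (slow) capacity-prediction model ahead of time for every (function, server) pair, so the scheduling hot path is a cheap table lookup. `predict` and its capacity numbers are hypothetical stand-ins for Jiagu's learned model:

```python
def precompute_capacities(predict, functions, servers):
    """Decouple prediction from decision-making: build the capacity
    table offline so scheduling never waits on the model."""
    return {(f, s): predict(f, s) for f in functions for s in servers}

def place(table, func, servers, load):
    # Hot path: pick the server with the most spare predicted capacity.
    return max(servers, key=lambda s: table[(func, s)] - load[s])

# Hypothetical model: server 'big' fits twice as many instances as 'small'.
predict = lambda f, s: {"big": 20, "small": 10}[s]
table = precompute_capacities(predict, ["resize", "ocr"], ["big", "small"])
server = place(table, "resize", ["big", "small"],
               load={"big": 15, "small": 2})
```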
* ALPS: An Adaptive Learning, Priority OS Scheduler for Serverless Functions \[[Paper](https://www.usenix.org/conference/atc24/presentation/fu)] \[[Code](https://github.com/ds2-lab/ALPS)]
  * UVA & George Mason University & Adobe Research
  * **ALPS**: **A**daptive **L**earning, **P**riority **S**cheduler
    * Application-aware kernel scheduler
    * Frontend: user-space; approximate *shortest remaining process time* (SRPT) priority scheduling by adaptively learning from an SRPT simulation on recent past workload.
    * Backend: use eBPF functions hooked to CFS to inform scheduling decisions (from the frontend) in the kernel.
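The SRPT simulation that the frontend learns from can be sketched as a minimal simulator with arrivals and preemption; the eBPF/CFS backend integration is not modeled, and the job set is hypothetical:

```python
import heapq

def srpt_schedule(jobs):
    """Simulate shortest-remaining-processing-time scheduling: always run
    the job with the least remaining time, preempting on new arrivals.
    jobs: list of (arrival, service, name); returns completion order."""
    events = sorted(jobs)
    t, i, ready, done = 0.0, 0, [], []
    while ready or i < len(events):
        if not ready:                        # idle until the next arrival
            t = max(t, events[i][0])
        while i < len(events) and events[i][0] <= t:
            arrival, service, name = events[i]
            heapq.heappush(ready, (service, name))
            i += 1
        remaining, name = heapq.heappop(ready)
        next_arrival = events[i][0] if i < len(events) else float("inf")
        if t + remaining <= next_arrival:    # finishes before next arrival
            t += remaining
            done.append(name)
        else:                                # preempted by the new arrival
            heapq.heappush(ready, (remaining - (next_arrival - t), name))
            t = next_arrival
    return done

done = srpt_schedule([(0, 10, "long"), (1, 2, "short"), (2, 4, "mid")])
```

Short functions finish first even when a long one arrived earlier, which is the latency win ALPS approximates in the kernel.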
* StreamBox: A Lightweight GPU SandBox for Serverless Inference Workflow \[[Paper](https://www.usenix.org/conference/atc24/presentation/wu-hao)] \[[Code](https://github.com/CGCL-codes/streambox)]
  * HUST & INRIA
  * *One GPU runtime per inference workflow* instead of *one GPU runtime per function*.
  * Use CUDA streams for serverless inference; fine-grained GPU memory management; and PCIe bandwidth sharing among concurrent streams.
* A Secure, Fast, and Resource-Efficient Serverless Platform with Function REWIND \[[Paper](https://www.usenix.org/conference/atc24/presentation/song)] \[[Code](https://github.com/s3yonsei/rewind_serverless)]
  * Sungkyunkwan University & Yonsei University & Seoul National University
  * Enhance performance while *maintaining strict data isolation between requests*.
  * The container is reset to an initial state free of any sensitive data after each function request; incorporate a kernel-level memory snapshot management system; optimize runtime by reusing memory regions and leveraging the temporal locality of function executions.

### Model Serving

* Power-aware Deep Learning Model Serving with μ-Serve \[[Paper](https://www.usenix.org/conference/atc24/presentation/qiu)]
  * UIUC & IBM Research
  * Scale GPU frequency to save power without violating SLO attainment.
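The policy can be sketched as picking the lowest GPU frequency whose predicted latency still meets the SLO. The frequency/latency table below is hypothetical, standing in for a performance model; μ-Serve's actual policy is richer:

```python
def pick_frequency(freqs_mhz, latency_ms_at, slo_ms):
    """Power-aware selection: lowest frequency that still meets the SLO;
    fall back to the fastest setting if the SLO is infeasible."""
    for f in sorted(freqs_mhz):                 # try low power first
        if latency_ms_at[f] <= slo_ms:
            return f
    return max(freqs_mhz)

freq = pick_frequency(
    freqs_mhz=[900, 1200, 1500],
    latency_ms_at={900: 60.0, 1200: 42.0, 1500: 35.0},  # predicted p99
    slo_ms=45.0)
```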

### Cluster Scheduler

* Starburst: A Cost-aware Scheduler for Hybrid Cloud \[[Paper](https://www.usenix.org/conference/atc24/presentation/luo)] \[[Code](https://github.com/michaelzhiluo/starburst)]
  * UC Berkeley & UCSB
  * **Distinguished Artifact Award**
  * Run batch workloads on private clusters or the public cloud, trading off cost against job completion time (JCT).
  * Dynamically control jobs' waiting times to improve utilization.
    * Assign longer waits for large jobs to increase their chances of running on the cluster.
    * Assign shorter waits to small jobs to increase their chances of running on the cloud.
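The waiting-time policy above can be sketched as a simple size-proportional rule. The linear formula and its constant are hypothetical; Starburst tunes its waiting policy rather than using this rule:

```python
def assign_wait(job_gpu_hours, base_wait_s=60.0):
    """Larger jobs (expensive to run on the cloud) wait longer for
    on-prem capacity; small jobs time out quickly and spill to the
    cloud."""
    return base_wait_s * job_gpu_hours

# Waits for a tiny, a medium, and a large job (GPU-hours).
waits = {size: assign_wait(size) for size in (0.5, 4, 32)}
```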

### Deep Learning Compiler

* MAGPY: Compiling Eager Mode DNN Programs by Monitoring Execution States \[[Paper](https://www.usenix.org/conference/atc24/presentation/zhang-chen)] \[[Code](https://github.com/heheda12345/MagPy)]
  * THU
  * Generate more complete operator graphs by collecting key runtime information through monitoring program execution.
  * Provide a reference graph to record program execution states and leverage reference relationships to identify state changes that can impact program outputs.

### Deep Learning Recommendation Models (DLRMs)

* OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model \[[Paper](https://www.usenix.org/conference/atc24/presentation/wang)]
  * UCSD & UCSB & Meta & Pacific Northwest National Laboratory
  * Provide a near-optimal parallelization strategy for embedding tables.
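The load-balancing goal behind embedding-table parallelization can be illustrated with a classic largest-first greedy placement (LPT). The table costs are hypothetical, and OPER derives a near-optimal plan rather than using this heuristic:

```python
def place_tables(table_costs, num_devices):
    """Greedy LPT sketch: place each embedding table on the currently
    least-loaded device, largest tables first."""
    loads = [0.0] * num_devices
    placement = {}
    for name, cost in sorted(table_costs.items(), key=lambda kv: -kv[1]):
        dev = min(range(num_devices), key=lambda d: loads[d])
        placement[name] = dev
        loads[dev] += cost
    return placement, loads

placement, loads = place_tables(
    {"user_id": 8.0, "item_id": 6.0, "category": 2.0, "geo": 1.0},
    num_devices=2)
```

Placing largest tables first keeps the final loads close to balanced, which is the objective a near-optimal plan must satisfy.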

### Probabilistic Graphical Models

* Fast Inference for Probabilistic Graphical Models \[[Paper](https://www.usenix.org/conference/atc24/presentation/jiang)] \[[Code](https://github.com/jjiantong/FastPGM)]
  * University of Western Australia & HKUST
  * **Fast-PGM**: a fast and parallel PGM inference system for importance sampling-based approximate inference algorithms.
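The algorithm family Fast-PGM accelerates rests on the importance-sampling estimator below, shown here on a one-dimensional toy target rather than a PGM; the target and proposal densities are illustrative choices:

```python
import random

def importance_estimate(target_pdf, proposal_pdf, sample_proposal, f, n, rng):
    """Estimate E_p[f(x)] by sampling from a proposal q and reweighting
    each sample by p(x)/q(x)."""
    total = 0.0
    for _ in range(n):
        x = sample_proposal(rng)
        total += (target_pdf(x) / proposal_pdf(x)) * f(x)
    return total / n

# E[x] under p(x) = 2x on [0, 1] (true value 2/3), uniform proposal.
rng = random.Random(0)
est = importance_estimate(
    target_pdf=lambda x: 2 * x,
    proposal_pdf=lambda x: 1.0,
    sample_proposal=lambda r: r.random(),
    f=lambda x: x, n=20000, rng=rng)
```

Each sample's weight is independent of the others, which is what makes this family amenable to the parallel execution Fast-PGM targets.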

### Remote Direct Memory Access (RDMA)

* PeRF: Preemption-enabled RDMA Framework \[[Paper](https://www.usenix.org/conference/atc24/presentation/lee)]
  * Acryl Inc. & Sungkyunkwan University
  * Offer *software-based performance isolation* for efficient *multi-tenancy* in RDMA.

### Remote Procedure Call (RPC)

* HydraRPC: RPC in the CXL Era \[[Paper](https://www.usenix.org/conference/atc24/presentation/ma)]
  * Alibaba & THU & ZJU & PKU
  * Utilize CXL-attached HDM to build RPC systems.

### Journaling File System

* FastCommit: Resource-Efficient, Performant and Cost-Effective File System Journaling
  * Google
  * **Best Paper Award**

### Rust-for-Linux

* An Empirical Study of Rust-for-Linux: The Success, Dissatisfaction, and Compromise \[[Paper](https://www.usenix.org/conference/atc24/presentation/li-hongyu)] \[[Code](https://github.com/Richardhongyu/rfl_empirical_tools)]
  * BUPT & UESTC

