
# HotOS 2025

## Meta Info

Homepage: <https://sigops.org/s/conferences/hotos/2025/index.html>

Paper list: <https://sigops.org/s/conferences/hotos/2025/program.html>

## Papers

### AI Infrastructure

* Good things come in small packages: Should we build AI clusters with Lite-GPUs? \[[arXiv](https://arxiv.org/abs/2501.10187)]
  * MSR
* Storage Class Memory is Dead, All Hail Managed-Retention Memory: Rethinking Memory for the AI Era \[[arXiv](https://arxiv.org/abs/2501.09605)]
  * MSR
  * **MRM**: Managed-Retention Memory

### Resource Management

* Granular Resource Demand Heterogeneity \[[Paper](https://seojinpark.net/downloads/hotos25-hiresperf.pdf)]
  * USC
  * **hiresperf**: a granular resource profiler that measures resource usage at *10-microsecond intervals* and attributes it to individual function invocations, with low overhead

### Compound AI Systems

* Towards Resource-Efficient Compound AI Systems \[[arXiv](https://arxiv.org/abs/2501.16634)]
  * MIT & Microsoft Azure

### Operating Systems

* Apiary: An OS for the Modern FPGA
  * UW
* The NIC should be part of the OS
  * ETH

### Tiered Storage

* Tolerate It if You Cannot Reduce It: Handling Latency in Tiered Memory
  * EPFL
* My CXL Pool Obviates Your PCIe Switch \[[arXiv](https://arxiv.org/abs/2503.23611)]
  * Columbia & Microsoft
* Rethinking Tiered Storage: Talk to File Systems, Not Device Drivers
  * UIUC

### Remote Procedure Call (RPC)

* Rethinking RPC Communication for Microservices-based Applications \[[Paper](https://danyangzhuo.com/papers/HotOS25-RPC.pdf)]
  * UW

### AI Security

* Guillotine: Hypervisors for Isolating Malicious AIs \[[arXiv](https://arxiv.org/abs/2504.15499)]
  * Harvard & Princeton

### Verification

* Lightweight Hypervisor Verification: Putting the Hardware Burger on a Diet
  * EPFL
* Can Large Language Models Verify System Software? A Case Study Using FSCQ as a Benchmark
  * Duke
* Modular, Full-System Verification
  * BlueRock Security

### Shared Log

* Designing a Datacenter-wide Distributed Shared Log
  * UC Berkeley

### Unclassified

* Towards ML System Extensibility
  * UW
* Serve Programs, Not Prompts
  * Yale