# Direct access, high-performance memory disaggregation with DirectCXL

## Metadata

Presented in [ATC 2022](https://www.usenix.org/conference/atc22/presentation/gouk).

Authors: Donghyun Gouk, Sangwon Lee, Miryeong Kwon, Myoungsoo Jung (*KAIST*)

## Understanding the paper

### TL;DRs

DirectCXL is the first work to bring CXL 2.0 into a real system and analyze the performance characteristics of a CXL-enabled disaggregated memory design.

### Existing works

* Existing approaches fall into two groups based on **how they manage data between a host and memory server(s)**:
  * **Page-based**: use virtual memory techniques so that disaggregated memory works without code changes; intercept paging requests on a page fault and swap the data to a remote memory node instead of the underlying storage.
  * **Object-based**: manage disaggregated memory on a remote node through their own database (e.g., a key-value store); directly intervene in RDMA data transfers; require significant source-level modifications and interface changes.
* All existing approaches need to **move data from the remote memory to the host memory over RDMA** (or similar fine-grained network interfaces); this data movement and its accompanying operations (e.g., page cache management) introduce **redundant memory copies and software fabric intervention**.
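
The cost of page-granularity data movement can be illustrated with a back-of-the-envelope model. The sketch below is not from the paper: the 4 KiB page size, the 64 B cache-line size, the access pattern, and both function names are assumptions chosen for illustration. It also ignores eviction and counts each distinct page or cache line only once.

```python
PAGE_SIZE = 4096   # assumed page size in bytes (not stated in the notes)
CACHE_LINE = 64    # assumed cache-line size in bytes (not stated in the notes)


def bytes_moved_page_based(addresses):
    """Toy model: the first touch of a page faults and swaps in the whole page."""
    pages = {addr // PAGE_SIZE for addr in addresses}
    return len(pages) * PAGE_SIZE


def bytes_moved_cache_line(addresses):
    """Toy model: each access moves only the cache line holding the address."""
    lines = {addr // CACHE_LINE for addr in addresses}
    return len(lines) * CACHE_LINE


# Hypothetical sparse workload: one access in each of 1000 different pages.
addresses = [i * PAGE_SIZE for i in range(1000)]

paged = bytes_moved_page_based(addresses)
direct = bytes_moved_cache_line(addresses)
print(paged, direct, paged // direct)
```

Under these assumptions, a sparse access pattern moves a full page for every cache line it actually uses. Dense, sequential access patterns narrow the gap, so the ratio here is an upper bound for this toy model and not a measured result.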

### Contributions

* Disaggregate memory over CXL and integrate the disaggregated memory into processor-side system memory.
  * Implement a **CXL controller** that employs multiple DRAM modules on the remote side.
  * Implement a **CXL software runtime** that allows users to utilize the underlying disaggregated memory resources through plain load/store instructions.
* Prototype DirectCXL using many customized memory add-in cards, 16nm FPGA-based processor nodes, a switch, and a PCIe backplane.
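
From the application's point of view, load/store access means a memory region is mapped into the address space and then read and written in place, with no explicit transfer call. The sketch below only mimics that programming model: it is not the DirectCXL runtime, and a temporary file stands in for the disaggregated memory region. `REGION_SIZE`, `map_region`, `store_u64`, and `load_u64` are hypothetical names introduced for illustration.

```python
import mmap
import os
import tempfile

REGION_SIZE = 4096  # hypothetical region size in bytes


def map_region(path, size):
    """Map a file into the address space (stand-in for a memory region)."""
    fd = os.open(path, os.O_RDWR)
    try:
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def store_u64(region, offset, value):
    """Write a 64-bit little-endian value directly into the mapped region."""
    region[offset:offset + 8] = value.to_bytes(8, "little")


def load_u64(region, offset):
    """Read a 64-bit little-endian value directly from the mapped region."""
    return int.from_bytes(region[offset:offset + 8], "little")


def demo():
    # Create a zero-filled backing file that plays the role of remote memory.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * REGION_SIZE)
        path = f.name
    region = map_region(path, REGION_SIZE)
    try:
        store_u64(region, 64, 0xDEADBEEF)  # "store" at byte offset 64
        return load_u64(region, 64)        # "load" from the same offset
    finally:
        region.close()
        os.unlink(path)


if __name__ == "__main__":
    print(hex(demo()))
```

The point of the sketch is the absence of any send/receive or swap step in `demo`: once the region is mapped, the application touches it like ordinary memory, which is the access model the contributions above describe.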

### CXL vs. RDMA

* RDMA-based: all DRAM modules and their interfaces are designed as passive peripherals, so computing resources are required on the remote side to control them.
* CXL-based: allows the host computing resources to directly access the underlying memory through PCIe buses.

### Performance evaluation

* Compared to RDMA-based memory disaggregation, DirectCXL achieves 6.2x shorter latency and 3x better performance.

