Direct access, high-performance memory disaggregation with DirectCXL

Metadata

Presented at USENIX ATC 2022.

Authors: Donghyun Gouk, Sangwon Lee, Miryeong Kwon, Myoungsoo Jung (KAIST)

The first work that brings CXL 2.0 into a real system and analyzes the performance characteristics of CXL-enabled disaggregated memory design.

Existing memory disaggregation approaches fall into two categories, based on how they manage data between a host and memory server(s):
- Page-based: use virtual memory techniques to expose disaggregated memory without code changes; intercept paging requests on a page fault and swap data to a remote memory node instead of the underlying storage.
- Object-based: manage disaggregated memory on the remote side through their own database (e.g., a key-value store); intervene directly in RDMA data transfers; require significant source-level modifications and interface changes.
All existing approaches must move data from remote memory to host memory over RDMA (or similar fine-grained network interfaces); this data movement and its accompanying operations (e.g., page cache management) introduce redundant memory copies and software fabric intervention.
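The difference between the two categories can be sketched with a toy model. This is only an illustration under stated assumptions, not code from the paper: `RemoteMemory`, `PageBasedHost`, and `ObjectBasedHost` are hypothetical names, the remote node is a plain `bytearray`, and a byte counter stands in for RDMA traffic. It shows that the page-based path copies a whole page to serve a one-byte read, while the object-based path needs an explicit `get`/`put` interface in the application.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class RemoteMemory:
    """Hypothetical memory server; counts bytes moved over the 'network'."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.bytes_transferred = 0

    def read(self, offset, length):
        self.bytes_transferred += length
        return bytes(self.data[offset:offset + length])

    def write(self, offset, payload):
        self.bytes_transferred += len(payload)
        self.data[offset:offset + len(payload)] = payload


class PageBasedHost:
    """Page-based: a miss in local memory 'faults' and copies in a whole page."""

    def __init__(self, remote):
        self.remote = remote
        self.local_pages = {}  # page number -> local copy of that page
        self.page_faults = 0

    def load_byte(self, addr):
        page_no, offset = divmod(addr, PAGE_SIZE)
        if page_no not in self.local_pages:
            # Simulated page fault: fetch the full page from the remote node.
            self.page_faults += 1
            page = self.remote.read(page_no * PAGE_SIZE, PAGE_SIZE)
            self.local_pages[page_no] = bytearray(page)
        return self.local_pages[page_no][offset]


class ObjectBasedHost:
    """Object-based: the application calls an explicit key-value interface."""

    def __init__(self, remote):
        self.remote = remote
        self.index = {}  # key -> (offset, length) on the remote node
        self.next_free = 0

    def put(self, key, value):
        offset = self.next_free
        self.remote.write(offset, value)
        self.index[key] = (offset, len(value))
        self.next_free += len(value)

    def get(self, key):
        offset, length = self.index[key]
        return self.remote.read(offset, length)


# Usage: one byte read on the page-based path moves PAGE_SIZE bytes.
remote = RemoteMemory(4 * PAGE_SIZE)
remote.data[5000] = 42
host = PageBasedHost(remote)
value = host.load_byte(5000)
```

In both cases the data ends up copied into host memory before the application can use it, which is the overhead the notes above attribute to existing approaches.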

DirectCXL disaggregates memory over CXL and integrates the disaggregated memory into processor-side system memory.
- Implement a CXL controller that employs multiple DRAM modules on the remote side.
- Implement a CXL software runtime that lets users access the underlying disaggregated memory resources through plain load/store instructions.
The DirectCXL prototype uses multiple customized memory add-in cards, 16nm FPGA-based processor nodes, a switch, and a PCIe backplane.
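The load/store programming model described above can be approximated as follows. This is a minimal sketch under stated assumptions, not the DirectCXL runtime: an ordinary temporary file stands in for the CXL-attached memory region, and `open_region`, `store_u64`, `load_u64`, and `REGION_SIZE` are hypothetical names. Python cannot issue raw load/store instructions, so the code only mirrors the idea that a mapped region is read and written in place, with no explicit transfer call and no staging buffer.

```python
import mmap
import os
import struct
import tempfile

REGION_SIZE = 1 << 20  # 1 MiB stand-in region (assumed size)


def open_region(path, size=REGION_SIZE):
    """Map a file into the address space; the file stands in for device memory."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        os.ftruncate(fd, size)
        return mmap.mmap(fd, size)  # read/write mapping by default
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def store_u64(region, offset, value):
    """Write a 64-bit little-endian value directly into the mapped region."""
    struct.pack_into("<Q", region, offset, value)


def load_u64(region, offset):
    """Read a 64-bit little-endian value directly from the mapped region."""
    return struct.unpack_from("<Q", region, offset)[0]


# Usage: write and read back in place, with no send/receive step.
with tempfile.TemporaryDirectory() as tmp:
    region = open_region(os.path.join(tmp, "fake_cxl_region"))
    store_u64(region, 64, 0xDEADBEEF)
    value = load_u64(region, 64)
    region.close()
```

Compared with the page-based and object-based approaches, nothing in this access path copies data into a separate host buffer before use; the application works on the mapped region itself.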

RDMA-based: all DRAM modules and their interfaces are designed as passive peripherals, so the remote side requires computing resources to control them.
CXL-based: the host computing resources can directly access the underlying memory through PCIe buses.

Compared to RDMA-based memory disaggregation, DirectCXL shows 6.2x shorter latency and 3x better performance.
