Remote Direct Memory Access (RDMA)
General RDMA
X-RDMA: Effective RDMA Middleware in Large-scale Production Environments (CLUSTER 2019) [Paper]
Alibaba
Focus on robustness, scalability, and maintainability.
Revisiting Network Support for RDMA (SIGCOMM 2018) [Personal Notes] [Paper]
UC Berkeley & ICSI & Mellanox & NYU & UW
IRN (Improved RoCE NIC): handles packet loss more efficiently, eliminating the need for PFC.
RDMA over Commodity Ethernet at Scale (SIGCOMM 2016) [Paper]
Microsoft
Challenges of deploying RoCEv2 at scale; a DSCP (Differentiated Services Code Point) based PFC mechanism.
Congestion Control for Large-Scale RDMA Deployments (SIGCOMM 2015) [Paper]
Microsoft & Mellanox & UCSB
DCQCN: a congestion control scheme for RoCEv2 that alleviates the problems caused by PFC.
RDMA for Deep Learning
Fast Distributed Deep Learning over RDMA (EuroSys 2019) [Paper]
MSRA
RDMA for Storage
Performance Isolation
Understanding RDMA Microarchitecture Resources for Performance Isolation (NSDI 2023) [Personal Notes] [Paper] [Benchmark Suite]
Duke & Microsoft & SJTU
Develop a test suite to evaluate RDMA performance isolation solutions.
Acronyms
PFC: Priority Flow Control
RoCE: RDMA over Converged Ethernet
IBoE: InfiniBand over Ethernet