# Deep Learning Recommendation Model (DLRM)

## DLRM Training

* Heterogeneous Acceleration Pipeline for Recommendation System Training ([ISCA 2024](/reading-notes/conference/isca-2024.md)) \[[arXiv](https://arxiv.org/abs/2204.05436)]
  * UBC & GaTech
  * **Hotline**: a runtime framework.
  * Store non-popular embeddings in CPU main memory and popular embeddings in GPU HBM.
  * Fragment each mini-batch into popular and non-popular micro-batches (μ-batches).
* Accelerating Neural Recommendation Training with Embedding Scheduling ([NSDI 2024](/reading-notes/conference/nsdi-2024.md)) \[[Paper](https://www.usenix.org/conference/nsdi24/presentation/zeng)] \[[Slides](https://www.usenix.org/system/files/nsdi24_slides-zeng.pdf)] \[[Code](https://github.com/HKUST-SING/herald)]
  * HKUST
  * **Herald**: an adaptive location-aware inputs allocator to determine *where embeddings should be trained* and an optimal communication plan generator to determine *which embeddings should be synchronized*.
* Bagpipe: Accelerating Deep Recommendation Model Training ([SOSP 2023](/reading-notes/conference/sosp-2023.md)) \[[Paper](https://dl.acm.org/doi/abs/10.1145/3600006.3613142)]
  * UW-Madison & UChicago
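
The mini-batch fragmentation noted under Hotline above can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: the function names `top_k_ids` and `split_minibatch` are hypothetical, popularity is approximated by a simple top-k access count, and a sample is assumed to be "popular" only when every embedding ID it touches is popular.

```python
from collections import Counter


def top_k_ids(history, k):
    """Return the k most frequently accessed embedding IDs (assumed popularity rule)."""
    counts = Counter(i for batch in history for sample in batch for i in sample)
    return {i for i, _ in counts.most_common(k)}


def split_minibatch(batch, popular_ids):
    """Split a mini-batch into a popular and a non-popular micro-batch.

    Each sample is a list of embedding IDs. A sample goes to the popular
    micro-batch only if all of its IDs are in popular_ids (assumption).
    """
    popular, non_popular = [], []
    for sample in batch:
        if all(i in popular_ids for i in sample):
            popular.append(sample)
        else:
            non_popular.append(sample)
    return popular, non_popular


if __name__ == "__main__":
    # Hypothetical access history: two earlier mini-batches of two samples each.
    history = [[[1, 2], [1, 3]], [[1, 2], [4, 2]]]
    popular_ids = top_k_ids(history, k=2)
    popular, non_popular = split_minibatch([[1, 2], [2, 5], [1, 1]], popular_ids)
    print("popular micro-batch:", popular)
    print("non-popular micro-batch:", non_popular)
```

In this sketch the popular micro-batch needs only embeddings that would reside in GPU HBM, while the non-popular micro-batch also needs embeddings held in CPU main memory.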

## DLRM Inference

* DisaggRec: Architecting Disaggregated Systems for Large-Scale Personalized Recommendation (arXiv 2212.00939) \[[Personal Notes](/reading-notes/miscellaneous/arxiv/2022/disaggrec.md)] \[[Paper](https://arxiv.org/abs/2212.00939)]
  * Meta AI & WashU & UPenn & Cornell & Intel
  * *Disaggregated* system; *decouples* CPU and memory resources; *partitions embedding tables*.

## Pruning

* AdaEmbed: Adaptive Embedding for Large-Scale Recommendation Models ([OSDI 2023](/reading-notes/conference/osdi-2023.md)) \[[Paper](https://www.usenix.org/conference/osdi23/presentation/lai)]
  * UMich SymbioticLab & Meta
  * In-training pruning.

## GPU Cache

* UGache: A Unified GPU Cache for Embedding-based Deep Learning ([SOSP 2023](/reading-notes/conference/sosp-2023.md)) \[[Personal Notes](/reading-notes/conference/sosp-2023/ugache.md)] \[[Paper](https://dl.acm.org/doi/10.1145/3600006.3613169)]
  * SJTU
  * A *unified multi-GPU cache* system.
  * Used for GNN training and deep learning recommendation (DLR) inference.
* EVStore: Storage and Caching Capabilities for Scaling Embedding Tables in Deep Recommendation Systems ([ASPLOS 2023](/reading-notes/conference/asplos-2023.md)) \[[Personal Notes](/reading-notes/conference/asplos-2023/evstore.md)] \[[Paper](https://dl.acm.org/doi/10.1145/3575693.3575718)] \[[Code](https://github.com/ucare-uchicago/ev-store-dlrm)]
  * UChicago & Beijing University of Technology & Bandung Institute of Technology, Indonesia & Seagate Technology & Emory
  * A *caching* layer optimized for embedding *access patterns*.

## Model Update

* Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update ([OSDI 2022](/reading-notes/conference/osdi-2022.md)) \[[Paper](https://www.usenix.org/conference/osdi22/presentation/sima)]
  * Tencent & Edinburgh
  * P2P model update dissemination.

## Acronyms

* DLRM: Deep Learning Recommendation Model

