# EVStore: Storage and caching capabilities for scaling embedding tables in deep recommendation systems

## Meta Info

Presented in [ASPLOS 2023](https://doi.org/10.1145/3575693.3575718).

Authors: Daniar H. Kurniawan, Ruipu Wang, Kahfi S. Zulkifli, Fandi A. Wiranata, John Bent, Ymir Vigfusson, Haryadi S. Gunawi.

Code: <https://github.com/ucare-uchicago/ev-store-dlrm>

## Understanding the paper

### Challenge

Each recommendation inference requires *multiple embedding vector (EV) table lookups*. The request cannot finish until every lookup returns, so a single slow memory access slows down the whole inference request.
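A back-of-the-envelope sketch of why this matters. It assumes each lookup is slow independently with probability `p_slow`, a hypothetical parameter rather than a number from the paper. Under that assumption, the chance that a request with `n` lookups hits at least one slow lookup is 1 - (1 - p)^n, which grows quickly with `n`.

```python
def prob_request_slow(p_slow: float, n_lookups: int) -> float:
    """Probability that at least one of n independent lookups is slow.

    Assumes every lookup is slow independently with probability p_slow;
    a single slow lookup is enough to delay the whole inference request.
    """
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be a probability")
    return 1.0 - (1.0 - p_slow) ** n_lookups


if __name__ == "__main__":
    # Hypothetical 1% slow-lookup rate; 26 lookups corresponds to one lookup
    # per sparse feature (EV table) in the Criteo setup under Evaluation.
    for n in (1, 8, 26, 100):
        print(f"n={n:>3}: P(request slow) = {prob_request_slow(0.01, n):.3f}")
```

With `p_slow = 0.01`, a request needing 26 lookups is slow roughly 23% of the time, even though each individual lookup is slow only 1% of the time.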

### Limitation of existing work

Open-source DLRMs, such as [Facebook DLRM](https://github.com/facebookresearch/dlrm), store the full embedding tables in DRAM. They cannot serve lookups from backend storage when memory is exhausted.

### Key designs

* Propose EVStore, which adds *a caching layer* within the deep learning recommendation system (DLRS), optimized for EV access patterns.
* Three cache layers:
  * EVCache (L1)
    * Extend various cache replacement algorithms.
  * EVMix (L2)
    * Store lower-precision embeddings (e.g., fp8).
  * EVProx (L3)
    * A *key-to-key* caching layer that maps a key to a surrogate key with a similar embedding value.
    * The key mapping is built *offline* during preprocessing. Similarity is measured with *Euclidean and cosine distances*.
    * In general, the mapping should be rebuilt when the L3 hit rate drops significantly.
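The lookup path through the three layers can be sketched roughly as follows. This is a simplified illustration under several assumptions, not EVStore's implementation:

* L1 uses plain LRU, while EVCache supports several replacement policies.
* L2 stores fp16 copies, because Python's standard library has no fp8 type.
* The proxy map is built offline by nearest cosine distance against a hypothetical set of `candidates` keys.
* A proxy hit counts only when the surrogate is already resident in L1 or L2.

All class and function names are hypothetical.

```python
import math
import struct
from collections import OrderedDict


def to_low_precision(vec):
    """Round-trip each value through IEEE fp16 (struct format 'e').

    fp16 stands in for the fp8 mentioned in the notes; only the loss of
    precision matters for this sketch.
    """
    return [struct.unpack("<e", struct.pack("<e", x))[0] for x in vec]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat zero vectors as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def build_proxy_map(table, candidates):
    """Offline step: map each non-candidate key to its most similar candidate."""
    if not candidates:
        return {}
    candidate_set = set(candidates)
    return {
        key: min(candidates, key=lambda c: cosine_distance(vec, table[c]))
        for key, vec in table.items()
        if key not in candidate_set
    }


class TieredEVStore:
    def __init__(self, backend, l1_capacity, l2_keys, proxy_map):
        self.backend = backend                # full-precision table ("storage")
        self.l1 = OrderedDict()               # L1: LRU cache, full precision
        self.l1_capacity = l1_capacity
        self.l2 = {k: to_low_precision(backend[k]) for k in l2_keys}  # L2
        self.proxy = proxy_map                # L3: key -> surrogate key
        self.hits = {"l1": 0, "l2": 0, "l3": 0, "backend": 0}

    def _resident(self, key):
        """Return the vector for key if it is in L1 or L2, else None."""
        if key in self.l1:
            self.l1.move_to_end(key)          # refresh LRU position
            return self.l1[key]
        return self.l2.get(key)

    def lookup(self, key):
        if key in self.l1:
            self.l1.move_to_end(key)
            self.hits["l1"] += 1
            return self.l1[key]
        if key in self.l2:
            self.hits["l2"] += 1
            return self.l2[key]
        surrogate = self.proxy.get(key)
        if surrogate is not None:
            vec = self._resident(surrogate)
            if vec is not None:
                self.hits["l3"] += 1          # approximate: surrogate's vector
                return vec
        # Miss in every layer: read from slow storage and admit into L1.
        self.hits["backend"] += 1
        vec = self.backend[key]
        self.l1[key] = vec
        if len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)       # evict least recently used
        return vec
```

Consider a toy table `{0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.1, 0.9]}` with candidates `[0, 2]`. Here `build_proxy_map` maps key 1 to 0 and key 3 to 2, so a lookup of key 1 is answered with key 0's vector without touching storage. The cost is accuracy: L2 and L3 hits return approximate embeddings, which is why the mapping should be rebuilt once the L3 hit rate drops.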

### Implementation

Integrated within Facebook DLRM.

### Evaluation

* Use the Criteo 1TB Click Logs dataset.
  * 13 dense integer features and 26 sparse categorical features (26 EV tables)
  * All EV tables share the same embedding dimension of 36
  * 156 billion total feature values and over 800 million unique attribute values
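A rough estimate shows why DRAM can run out on this dataset. It rests on three assumptions, none of which are figures from the paper: one embedding row per unique attribute value, fp32 (4-byte) values as the full-precision baseline, and no hashing or row capping.

```python
def embedding_bytes(num_rows: int, dim: int, bytes_per_value: int) -> int:
    """Raw size of an embedding table: rows x dimension x bytes per value."""
    return num_rows * dim * bytes_per_value


ROWS = 800_000_000   # lower bound: "over 800 million unique attribute values"
DIM = 36             # embedding dimension shared by all EV tables

for name, width in (("fp32", 4), ("fp16", 2), ("fp8", 1)):
    gb = embedding_bytes(ROWS, DIM, width) / 1e9
    print(f"{name}: at least {gb:.1f} GB")
```

Under these assumptions, the full-precision tables need at least about 115 GB. An fp8 copy, as stored in EVMix, needs roughly a quarter of that (about 29 GB).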


