EVStore: Storage and caching capabilities for scaling embedding tables in deep recommendation systems
#deep_learning_recommender_system #embedding_lookup #recommendation_inference #cache
Meta Info
Presented at ASPLOS 2023.
Authors: Daniar H. Kurniawan, Ruipu Wang, Kahfi S. Zulkifli, Fandi A. Wiranata, John Bent, Ymir Vigfusson, Haryadi S. Gunawi.
Code: https://github.com/ucare-uchicago/ev-store-dlrm
Understanding the paper
Challenge
Each recommendation inference requires multiple EV table lookups, and the request cannot finish until all of them return, so a single slow memory access slows down the whole inference request.
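A minimal sketch of this fan-out/tail-latency effect, with illustrative (not measured) latencies and a hypothetical lookup_latency_us helper:

```python
import random

NUM_TABLES = 26  # Criteo-style DLRM: one EV table per sparse feature

def lookup_latency_us(table_id: int) -> float:
    """Hypothetical latency model: mostly fast DRAM hits (~0.1 us),
    with an occasional slow miss to backend storage (~100 us)."""
    return 100.0 if random.random() < 0.01 else 0.1

# The model's feature-interaction step needs *all* sparse lookups,
# so request latency is gated by the slowest of the 26.
latencies = [lookup_latency_us(t) for t in range(NUM_TABLES)]
print(f"request latency ~ {max(latencies):.1f} us (slowest of {NUM_TABLES} lookups)")
```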
Limitation of existing work
Open-source DLRMs, such as Facebook's DLRM, store the full embedding tables in DRAM and cannot serve lookups from backend storage once memory is exhausted.
Key designs
Propose EVStore: a dedicated caching layer within the DLRS (deep learning recommendation system), optimized for embedding-lookup access patterns.
Three layers (code sketches follow this list)
EVCache (L1)
Extend various cache replacement algorithms (e.g., LRU) to manage cached EVs.
EVMix (L2)
Store lower-precision embeddings (e.g., fp8) so that more EVs fit in the same cache space, at a small accuracy cost.
EVProx (L3)
A key-to-key caching layer that maps a missing key to a surrogate key whose embedding value is similar.
The key-to-surrogate mapping is built offline during preprocessing, using Euclidean and cosine distances as the similarity measures.
In general, the mapping should be rebuilt when the L3 hit rate drops significantly.
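A minimal sketch of the three-layer lookup path, assuming LRU replacement at each cache level, a plain dict standing in for backend storage, and fp16 in place of fp8 (NumPy has no native fp8); the class and parameter names are hypothetical, not from the paper's code:

```python
from collections import OrderedDict

import numpy as np

DIM = 36  # embedding dimension used in the paper's evaluation

class EVStoreSketch:
    def __init__(self, l1_capacity, l2_capacity, proxy_map, storage):
        self.l1 = OrderedDict()          # L1 EVCache: full-precision EVs, LRU here
        self.l2 = OrderedDict()          # L2 EVMix: low-precision copies
        self.proxy = proxy_map           # L3 EVProx: key -> surrogate key
        self.storage = storage           # backend store (a dict stands in)
        self.l1_cap, self.l2_cap = l1_capacity, l2_capacity

    def get(self, key):
        if key in self.l1:               # L1 hit: exact embedding
            self.l1.move_to_end(key)
            return self.l1[key]
        if key in self.l2:               # L2 hit: dequantize the fp16 copy
            self.l2.move_to_end(key)
            return self.l2[key].astype(np.float32)
        surrogate = self.proxy.get(key)  # L3 hit: reuse a similar key's EV
        if surrogate is not None and surrogate in self.l1:
            return self.l1[surrogate]
        ev = self.storage[key]           # full miss: pay the storage access
        self._admit(key, ev)
        return ev

    def _admit(self, key, ev):
        """Insert into L1; demote the L1 victim to L2 at lower precision."""
        self.l1[key] = ev
        if len(self.l1) > self.l1_cap:
            victim, victim_ev = self.l1.popitem(last=False)
            self.l2[victim] = victim_ev.astype(np.float16)
            if len(self.l2) > self.l2_cap:
                self.l2.popitem(last=False)

# Toy usage: a dict holding every EV plays the role of backend storage.
store = {k: np.full(DIM, k, dtype=np.float32) for k in range(1000)}
cache = EVStoreSketch(l1_capacity=64, l2_capacity=256, proxy_map={}, storage=store)
ev = cache.get(42)
```

The point of the layering: an L1 hit returns the exact EV, an L2 hit trades a little precision for capacity, and an L3 hit returns a similar key's EV instead of paying the backend storage cost.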
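And a sketch of the offline preprocessing that builds the EVProx key-to-key map, here brute-force with cosine distance (Euclidean would be analogous); how hot keys are chosen is an assumption:

```python
import numpy as np

def build_proxy_map(table: np.ndarray, hot_keys: list) -> dict:
    """Map every key to its most similar hot key by cosine distance,
    so an L3 lookup can be served from an already-cached surrogate."""
    hot = table[hot_keys]
    hot_norm = hot / np.linalg.norm(hot, axis=1, keepdims=True)
    proxy = {}
    for key, row in enumerate(table):
        sims = hot_norm @ (row / np.linalg.norm(row))
        proxy[key] = hot_keys[int(np.argmax(sims))]  # max cosine similarity
    return proxy

# Toy usage with the paper's embedding dimension of 36
rng = np.random.default_rng(0)
table = rng.standard_normal((1000, 36)).astype(np.float32)
proxy_map = build_proxy_map(table, hot_keys=list(range(50)))
```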
Implementation
Integrated into the Facebook DLRM codebase.
Evaluation
Use the Criteo 1TB Click Logs dataset.
13 dense integer features and 26 sparse categorical features (26 EV tables)
All EV tables use the same embedding dimension of 36
156 billion total feature values (≈ 4 billion log lines × 39 features) and over 800 million unique attribute values
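A back-of-envelope check (my arithmetic from the numbers above, not a figure from the paper) of why these tables strain DRAM:

```python
# Memory to hold every unique EV at full fp32 precision
unique_keys = 800e6            # "over 800 million unique attribute values"
dim, bytes_per_float = 36, 4   # embedding dimension 36, fp32
total_gb = unique_keys * dim * bytes_per_float / 1e9
print(f"~{total_gb:.0f} GB of embeddings")  # ~115 GB, motivating the cache layers
```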