EVStore: Storage and caching capabilities for scaling embedding tables in deep recommendation systems
#deep_learning_recommender_system #embedding_lookup #recommendation_inference #cache
Presented in ASPLOS 2023.
Authors: Daniar H. Kurniawan, Ruipu Wang, Kahfi S. Zulkifli, Fandi A. Wiranata, John Bent, Ymir Vigfusson, Haryadi S. Gunawi.
Code:
Each recommendation inference requires multiple embedding-vector (EV) table lookups, so if any single lookup is slow, the whole inference request is slow.
Open-source DLRMs, such as Facebook's DLRM, store the full embedding tables in DRAM and cannot serve lookups from backend storage once memory is exhausted.
Proposes EVStore, a caching layer added within the DLRS (deep learning recommendation system) and optimized for embedding access patterns.
Three layers
EVCache (L1)
Extends various cache replacement algorithms for embedding lookups.
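As a rough illustration, here is a minimal LRU sketch of an L1 embedding cache that falls back to backend storage on a miss. The class and parameter names (`EVCache`, `backend_fetch`) are hypothetical, and the paper extends several replacement policies beyond plain LRU.

```python
from collections import OrderedDict

import numpy as np

class EVCache:
    """Minimal LRU sketch of an L1 embedding-vector cache (illustrative only)."""

    def __init__(self, capacity, backend_fetch):
        self.capacity = capacity
        self.backend_fetch = backend_fetch  # called on a miss, e.g., a storage read
        self.store = OrderedDict()          # key -> embedding vector

    def lookup(self, key):
        if key in self.store:
            self.store.move_to_end(key)     # refresh recency on a hit
            return self.store[key]
        vec = self.backend_fetch(key)       # miss: fall back to backend storage
        self.store[key] = vec
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least-recently-used entry
        return vec

# Usage: back the cache with a dummy in-memory "storage" table.
table = {k: np.random.rand(36).astype(np.float32) for k in range(1000)}
cache = EVCache(capacity=128, backend_fetch=table.__getitem__)
emb = cache.lookup(42)
```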
EVMix (L2)
Stores lower-precision embeddings (e.g., fp8) to fit more entries in the same memory budget.
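NumPy has no native fp8 type, so the sketch below illustrates the idea with a per-vector scaled int8 encoding, which is also 4x smaller than fp32; the actual low-precision format EVMix uses may differ.

```python
import numpy as np

def quantize(vec_fp32):
    """Compress an fp32 embedding to int8 values plus one fp32 scale factor."""
    scale = max(np.abs(vec_fp32).max(), 1e-8) / 127.0
    q = np.round(vec_fp32 / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate fp32 embedding at lookup time."""
    return q.astype(np.float32) * scale

vec = np.random.randn(36).astype(np.float32)
q, s = quantize(vec)
approx = dequantize(q, s)
print("max abs error:", np.abs(vec - approx).max())
```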
EVProx (L3)
A key-to-key caching layer that maps a key to a surrogate key with a similar embedding value.
The key-to-key mapping is built offline during preprocessing; Euclidean and cosine distances are used to measure embedding similarity.
In general, the mapping should be rebuilt when the L3 hit rate drops significantly.
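A minimal sketch of building such a map offline, assuming cosine similarity and a hypothetical split of keys into "hot" (likely cached) and "cold"; `build_surrogate_map` and the frequency heuristic are illustrative, not taken from the paper.

```python
import numpy as np

def build_surrogate_map(table, hot_keys, cold_keys):
    """Map each cold key to the hot key whose embedding is most similar (cosine)."""
    hot = np.stack([table[k] for k in hot_keys])
    hot_norm = hot / np.linalg.norm(hot, axis=1, keepdims=True)
    surrogate = {}
    for k in cold_keys:
        v = table[k]
        v = v / np.linalg.norm(v)
        sims = hot_norm @ v                        # cosine similarity to all hot keys
        surrogate[k] = hot_keys[int(np.argmax(sims))]
    return surrogate

# Usage: pretend the 100 most frequent keys are "hot".
rng = np.random.default_rng(0)
table = {k: rng.standard_normal(36).astype(np.float32) for k in range(1000)}
k2k = build_surrogate_map(table, list(range(100)), list(range(100, 1000)))
# At serving time, a miss on key 420 can be answered with the cached
# embedding of its surrogate key k2k[420] (an approximate hit).
```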
Integrated into Facebook's DLRM.
Uses the Criteo 1TB Click Logs dataset:
13 dense integer features and 26 sparse categorical features (26 EV tables)
All EV tables share the same embedding dimension of 36
156 billion total feature values and over 800 million unique attribute values
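A back-of-envelope footprint from these numbers (assuming 4-byte fp32 values) shows why keeping the full tables in DRAM is costly and why lower precision helps:

```python
rows, dim = 800_000_000, 36          # unique attribute values, embedding dimension
fp32_bytes = rows * dim * 4          # 4 bytes per fp32 value
print(f"fp32 tables: {fp32_bytes / 1e9:.0f} GB")       # ~115 GB
print(f"fp8 tables:  {fp32_bytes / 4 / 1e9:.0f} GB")   # ~29 GB at 1 byte/value (EVMix)
```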