Data Processing
Disaggregating ML Input Data Processing at Scale (SoCC 2023)
Google & ETH
GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning (SIGMOD 2023) [Paper]
Alibaba & PKU
A case for disaggregation of ML data processing (arXiv 2210.14826) [Paper]
Google & ETH
tf.data service: disaggregates data preprocessing from ML computation.
Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (ISCA 2022) [Paper]
Meta
DSI: Data storage and ingestion
Industry track
Meta's data storage and ingestion pipeline