Data Processing
Pecan: Cost-Efficient ML Data Preprocessing with Automatic Transformation Ordering and Hybrid Placement
ETH & Google
Disaggregating ML Input Data Processing at Scale
Google & ETH
GoldMiner: Elastic Scaling of Training Data Pre-Processing Pipelines for Deep Learning (SIGMOD 2023)
Alibaba & PKU
A case for disaggregation of ML data processing (arXiv 2210.14826)
Google & ETH
tf.data service: disaggregates data preprocessing from ML computation.
Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (ISCA 2022)
Meta
DSI: Data storage and ingestion
Industry track
Meta's data storage and ingestion pipeline