A case for disaggregation of ML data processing
Metadata
Presented in arXiv:2210.14826.
Authors: Andrew Audibert, Yang Chen, Dan Graur, Ana Klimovic, Jiri Simsa, Chandramohan A Thekkath (Google & ETH Zurich)
Understanding the paper
TL;DR
This paper presents tf.data service, a disaggregated input data processing service built on top of tf.data.
Features
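- Disaggregates data preprocessing from ML computation, so preprocessing no longer has to run on the accelerator hosts.
- Horizontal scaling: data processing workers scale out to right-size CPU and RAM for each job.
- Data sharing to reuse computation across jobs (e.g., hyperparameter tuning jobs).
- Coordinated data reads, which avoid stragglers caused by differing input sizes in distributed training.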
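
To make the coordinated reads feature concrete, here is a toy model in plain Python. It is a sketch under stated assumptions, not the tf.data service implementation: the batch costs, the worker count, and the helper names `training_time` and `coordinated_order` are all hypothetical, and sorting stands in for whatever bucketing the service actually performs. The only idea it illustrates is that in synchronous training each step lasts as long as its slowest worker, so giving every worker a similarly sized batch in the same step removes stragglers.

```python
def training_time(batch_costs, num_workers):
    """Total time of synchronous training.

    Each step hands one batch to every worker, and the step lasts as long
    as the most expensive batch in it (the straggler).
    """
    steps = [batch_costs[i:i + num_workers]
             for i in range(0, len(batch_costs), num_workers)]
    return sum(max(step) for step in steps)


def coordinated_order(batch_costs):
    """Reorder batches so each step draws from one size bucket.

    Assumption: sorting by cost is a stand-in for size bucketing.
    """
    return sorted(batch_costs)


# Hypothetical workload: 8 short batches (cost 1) and 8 long ones (cost 10),
# interleaved the way independent, uncoordinated readers might deliver them.
uncoordinated = [1, 10] * 8
coordinated = coordinated_order(uncoordinated)

NUM_WORKERS = 4
time_uncoordinated = training_time(uncoordinated, NUM_WORKERS)
time_coordinated = training_time(coordinated, NUM_WORKERS)

print(time_uncoordinated)  # 40: every step waits on a cost-10 batch
print(time_coordinated)    # 22: two steps of cost 1, two steps of cost 10
```

The same batches are processed in both cases; only the order changes. The gap between the two totals is time that workers with short batches spend idle while waiting for a straggler.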
Evaluation
- Speedups: up to 100x.
- Job cost reduction: up to 89x.