A case for disaggregation of ML data processing
Presented in .
Authors: Andrew Audibert, Yang Chen, Dan Graur, Ana Klimovic, Jiri Simsa, Chandramohan A Thekkath (Google & ETH Zurich)
This paper presents tf.data service, a disaggregated input data processing service built on top of tf.data.
Disaggregate data preprocessing from ML computation.
Horizontal scaling.
Data sharing to reuse computation (e.g., hyperparameter tuning jobs).
Coordinated data reads.
Speedups: up to 100x.
Job cost reduction: up to 89x.
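The disaggregation idea in the notes above can be tried out with the public `tf.data.experimental.service` API in TensorFlow. The following is a minimal sketch, not the paper's deployment: it assumes TensorFlow 2.x is installed, starts the dispatcher and a single worker in the same process (a real setup would run them on separate hosts), and uses a toy `range`/`map` pipeline as a stand-in for real preprocessing. The function name `run_demo` is hypothetical.

```python
import tensorflow as tf


def run_demo(num_elements=10):
    """Preprocess a toy dataset on an in-process tf.data service worker."""
    service = tf.data.experimental.service

    # Dispatcher: tracks registered workers and hands out preprocessing tasks.
    dispatcher = service.DispatchServer()
    dispatcher_address = dispatcher.target.split("://")[1]

    # Worker: executes the input pipeline. It runs in-process here for
    # illustration only; in a disaggregated deployment, workers live on
    # separate hosts and more can be added to scale out.
    worker = service.WorkerServer(
        service.WorkerConfig(dispatcher_address=dispatcher_address)
    )

    # Toy pipeline standing in for real preprocessing (assumption).
    dataset = tf.data.Dataset.range(num_elements).map(lambda x: x * 2)

    # Everything defined before distribute() is executed by the service
    # workers; the client only fetches the resulting elements.
    dataset = dataset.apply(
        service.distribute(
            processing_mode="parallel_epochs",
            service=dispatcher.target,
        )
    )
    return [int(x) for x in dataset.as_numpy_iterator()]
```

Calling `run_demo()` returns the doubled values produced by the worker. `distribute` also accepts a `job_name` argument; consumers that pass the same name read from one shared job, which is one way to approach the data-sharing idea listed above.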