A case for disaggregation of ML data processing
Metadata
Presented in arXiv:2210.14826.
Authors: Andrew Audibert, Yang Chen, Dan Graur, Ana Klimovic, Jiri Simsa, Chandramohan A Thekkath (Google & ETH Zurich)
Understanding the paper
TL;DR
This paper presents tf.data service, a disaggregated input data processing service built on top of tf.data.
Features
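Disaggregates data preprocessing from ML computation.
Scales data processing horizontally, independently of the accelerator hosts.
Shares preprocessed data across jobs to reuse computation (e.g., hyperparameter tuning jobs).
Coordinates data reads to avoid stragglers caused by differing input sizes in distributed training.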
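Coordinated reads are the least self-explanatory feature in the list, so here is a toy sketch of why they can help. It is not the tf.data service implementation. It assumes synchronous training, where a step takes as long as its slowest trainer, and it assumes a trainer's step time is proportional to the sequence length of its batch. Sorting by length stands in for length bucketing. All function names are hypothetical.

```python
import random


def step_time(batch_lengths):
    """A synchronous step ends when the slowest trainer ends.

    Assumption: per-trainer time is proportional to batch sequence length.
    """
    return max(batch_lengths)


def total_time_uncoordinated(lengths, num_trainers):
    """Trainers take batches in arrival order, so one step can mix
    short and long batches and the short ones wait for the long one."""
    steps = [lengths[i:i + num_trainers]
             for i in range(0, len(lengths), num_trainers)]
    return sum(step_time(step) for step in steps)


def total_time_coordinated(lengths, num_trainers):
    """All trainers in a step receive batches of similar length.

    Sorting is a simplification standing in for length bucketing.
    """
    return total_time_uncoordinated(sorted(lengths), num_trainers)


# Hypothetical workload: 64 batches with mixed sequence lengths, 4 trainers.
rng = random.Random(0)
lengths = [rng.choice([32, 64, 128, 256, 512]) for _ in range(64)]
t_unc = total_time_uncoordinated(lengths, 4)
t_coord = total_time_coordinated(lengths, 4)
print(f"uncoordinated: {t_unc}, coordinated: {t_coord}")
```

In this model, grouping similar-length batches into the same step never increases total time when the batch count is a multiple of the trainer count, because each step's cost is set by its longest batch. The real service coordinates which worker each trainer reads from at each step; the sketch only models the resulting effect on step time.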
Evaluation
Speedups: up to 100x.
Job cost reduction: up to 89x.