CtrlK

A case for disaggregation of ML data processing

Metadata

Presented in arxiv:2210.14826.

Authors: Andrew Audibert, Yang Chen, Dan Graur, Ana Klimovic, Jiri Simsa, Chandramohan A Thekkath (Google & ETH Zurich)

Understanding the paper

TL;DR

This paper present tf.data service, which is a disaggregated input data processing service built on top of tf.data.

Features

Disaggregate data preprocessing from ML computation.
Horizontal scaling.
Data sharing to reuse computation (e.g., hyperparameter tuning jobs).
Coordinated data reads.

Evaluation

Speedups: up to 100x.
Reduce job cost: up to 89x.

Last updated 2 years ago

Was this helpful?