VLDB 2024
Homepage:
DL training workloads
Saturn: An Optimized Data System for Multi-Large-Model Deep Learning Workloads
UCSD
Saturn -> SPASE: Select a Parallelism, Allocate resources, and Schedule; the joint SPASE problem is formulated as an MILP (see the sketch below).
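A minimal PuLP sketch of what a joint choose-parallelism / allocate-GPUs / schedule MILP can look like. The job names, runtimes, and the serial-schedule simplification are illustrative assumptions, not Saturn's actual formulation.

```python
# Illustrative SPASE-style MILP (assumed jobs/runtimes, not Saturn's formulation).
import pulp

jobs = ["bert", "gpt2"]
configs = [("ddp", 2), ("ddp", 4), ("fsdp", 2), ("fsdp", 4)]  # (parallelism, #GPUs)
total_gpus = 4
runtime = {  # assumed runtime in hours for each (job, parallelism, #GPUs)
    ("bert", "ddp", 2): 6, ("bert", "ddp", 4): 4,
    ("bert", "fsdp", 2): 5, ("bert", "fsdp", 4): 3,
    ("gpt2", "ddp", 2): 9, ("gpt2", "ddp", 4): 6,
    ("gpt2", "fsdp", 2): 8, ("gpt2", "fsdp", 4): 5,
}

prob = pulp.LpProblem("spase_sketch", pulp.LpMinimize)
# x[(j, p, g)] = 1 iff job j trains with parallelism p on g GPUs
x = {k: pulp.LpVariable(f"x_{k[0]}_{k[1]}_{k[2]}", cat="Binary") for k in runtime}
makespan = pulp.LpVariable("makespan", lowBound=0)
prob += makespan  # objective: minimize makespan

for j in jobs:
    # each job picks exactly one (parallelism, #GPUs) configuration ...
    prob += pulp.lpSum(x[(j, p, g)] for (p, g) in configs) == 1
    # ... and that allocation has to fit on the cluster
    prob += pulp.lpSum(g * x[(j, p, g)] for (p, g) in configs) <= total_gpus

# Scheduling is collapsed to a serial schedule here: makespan >= sum of chosen runtimes.
prob += makespan >= pulp.lpSum(runtime[k] * x[k] for k in runtime)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([k for k in runtime if x[k].value() > 0.5], "makespan =", pulp.value(makespan))
```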
Big data analytic workloads
Intelligent Pooling: Proactive Resource Provisioning in Large-scale Cloud Service
Microsoft
Predict usage patterns using a hybrid ML model; optimize the pool size dynamically.
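A toy sketch of the proactive idea: forecast near-term demand from recent usage and keep the pool at forecast plus headroom. The moving-average predictor and fixed headroom are placeholders for the paper's hybrid ML model and pool-size optimization.

```python
# Toy proactive pool sizing: forecast demand, then keep forecast + headroom warm.
# The moving-average predictor and the fixed headroom are stand-ins for the
# paper's hybrid ML model and its dynamic pool-size optimization.
from collections import deque

class ProactivePool:
    def __init__(self, window=12, headroom=0.2):
        self.history = deque(maxlen=window)  # recent per-interval demand
        self.headroom = headroom

    def observe(self, demand):
        self.history.append(demand)

    def target_size(self):
        if not self.history:
            return 1
        forecast = sum(self.history) / len(self.history)  # stand-in predictor
        return max(1, round(forecast * (1 + self.headroom)))

pool = ProactivePool()
for demand in [3, 4, 6, 5, 7]:
    pool.observe(demand)
    print("pre-provision", pool.target_size(), "warm instances")
```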
Job scheduling
ResLake: Towards Minimum Job Latency and Balanced Resource Utilization in Geo-distributed Job Scheduling
ByteDance
Autoscaling
OptScaler: A Collaborative Framework for Robust Autoscaling in the Cloud
Ant Group
Serverless
Resource Management in Aurora Serverless
AWS (Industry Paper)
Approximate Inference
Biathlon: Harnessing Model Resilience for Accelerating ML Inference Pipelines
CUHK
Approximate input features to accelerate inference pipelines.
Trade-off between latency and accuracy.
Evaluation: all inference pipelines were implemented in Python with scikit-learn and run on CPU servers.
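A toy illustration of the latency/accuracy trade-off from approximating an expensive input feature by sampling. The synthetic feature, sampling rate, and model are assumptions; this is not Biathlon's actual error-control mechanism.

```python
# Toy example: approximate an expensive aggregation feature from a 5% sample and
# accept a small accuracy drop in exchange for much cheaper featurization.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Each example owns 1,000 "transactions"; the feature is their mean amount.
transactions = rng.normal(loc=rng.uniform(0, 5, size=(2000, 1)), scale=1.0, size=(2000, 1000))
labels = (transactions.mean(axis=1) > 2.5).astype(int)

exact_feature = transactions.mean(axis=1, keepdims=True)
model = LogisticRegression().fit(exact_feature, labels)

# Approximate the same feature from a 5% sample of each example's transactions.
sample = transactions[:, rng.choice(1000, size=50, replace=False)]
approx_feature = sample.mean(axis=1, keepdims=True)

print("exact  acc:", model.score(exact_feature, labels))
print("approx acc:", model.score(approx_feature, labels))  # slightly lower, much cheaper
```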
InferDB: In-Database Machine Learning Inference Using Indexes
Hasso Plattner Institute & University of Potsdam & University of Illinois Chicago
Approximate ML inference pipelines using index structures available in the DBMS.
Predictions are preserved in the embedding space; select binned features for indexing.
IMO: Aggressive...
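A rough sketch of index-backed approximate inference: discretize a few features, precompute a prediction per bin combination, and answer queries with a key lookup. A Python dict stands in for a DBMS index, and the sketch bins raw features rather than the embedding-space representation described above.

```python
# Sketch of index-backed approximate inference: bin features, precompute a
# prediction per bin combination, answer queries by lookup with a model fallback.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import KBinsDiscretizer

X, y = make_classification(n_samples=5000, n_features=4, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

binner = KBinsDiscretizer(n_bins=8, encode="ordinal", strategy="quantile").fit(X)
keys = [tuple(row) for row in binner.transform(X).astype(int)]

# "Index": bin combination -> majority model prediction over training points in that cell.
index = {}
for key, pred in zip(keys, model.predict(X)):
    index.setdefault(key, []).append(pred)
index = {k: int(np.round(np.mean(v))) for k, v in index.items()}

def infer(x):
    key = tuple(binner.transform(x.reshape(1, -1)).astype(int)[0])
    return index.get(key, int(model.predict(x.reshape(1, -1))[0]))  # fallback on index miss

x = X[42]
print("index lookup:", infer(x), " model:", int(model.predict(x.reshape(1, -1))[0]))
```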
Edge Computing
SmartLite: A DBMS-based Serving System for DNN Inference in Resource-constrained Environments
ZJU & Alibaba
SmartLite is a lightweight, DBMS-based serving system for DNN inference:
Store the parameters and structural information of neural networks as database tables.
Implement neural network operators inside the DBMS engine.
Quantize model parameters as binarized values, apply neural pruning techniques to compress the models, and transform tensor manipulations into value lookup operations of the DBMS.
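A minimal sketch, assuming SQLite as the store, of keeping binarized layer weights in a table and evaluating the layer by fetching rows and doing sign dot products. The schema and per-row evaluation are illustrative; SmartLite's in-engine operators and lookup transformations go further.

```python
# Store binarized layer weights as DBMS rows and evaluate the layer by fetching
# rows and computing sign dot products (XNOR-popcount in real BNN kernels).
import sqlite3
import numpy as np

rng = np.random.default_rng(0)
weights = np.sign(rng.normal(size=(16, 64))).astype(np.int8)  # binarized 64 -> 16 layer

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE layer1 (neuron INTEGER PRIMARY KEY, w BLOB)")
con.executemany(
    "INSERT INTO layer1 VALUES (?, ?)",
    [(i, weights[i].tobytes()) for i in range(weights.shape[0])],
)

def binary_linear(x_sign):
    """Evaluate the binarized layer with weights pulled from the database."""
    out = np.empty(16, dtype=np.int32)
    for neuron, blob in con.execute("SELECT neuron, w FROM layer1 ORDER BY neuron"):
        w = np.frombuffer(blob, dtype=np.int8)
        out[neuron] = int(w @ x_sign)
    return np.sign(out)

x = np.sign(rng.normal(size=64)).astype(np.int8)
print(binary_linear(x))
```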
ElasticNotebook: Enabling Live Migration for Computational Notebooks
UIUC & UMich
Live migration via checkpointing/restoration.
Reconstruct all variables from a subset of variables.
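A toy checkpoint/restore of a notebook-like session: persist only a cheap-to-serialize subset of variables plus a replay recipe, and reconstruct the rest by re-executing recorded cells. Which variables to store vs. recompute is hard-coded here; ElasticNotebook decides this automatically from serialization cost and variable lineage.

```python
# Toy session checkpoint: serialize a small subset of variables and rebuild the
# rest on the destination by replaying recorded cell code.
import pickle

session = {}
cells = [
    "import math",                            # imports are replayed, not serialized
    "data = list(range(1_000_000))",          # cheap to recompute, costly to store
    "total = sum(data)",
    "summary = {'n': len(data), 'mean': total / len(data)}",  # small, cheap to store
]
for cell in cells:
    exec(cell, session)

checkpoint = {
    "stored": pickle.dumps({"summary": session["summary"]}),
    "replay": cells[:3],                      # recipe for everything not serialized
}

# Restore ("migration" to a new process): load stored variables, replay the rest.
new_session = {}
new_session.update(pickle.loads(checkpoint["stored"]))
for cell in checkpoint["replay"]:
    exec(cell, new_session)
print(new_session["summary"], new_session["total"])
```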
RALF: Accuracy-Aware Scheduling for Feature Store Maintenance
UC Berkeley
Limitations of existing work:
Feature stores naively apply a one-size-fits-all policy for when/how to update features.
They do not consider query access patterns or the impact on prediction accuracy.
Feature store regret: a metric for how much featurization degrades downstream accuracy.
Leverage downstream error feedback to minimize feature store regret.
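A sketch of regret-aware maintenance under an update budget: score each feature key by its latest downstream-error feedback weighted by how often it is queried, and refresh the highest-scoring keys first. The scoring rule is an illustrative stand-in for RALF's scheduling policy.

```python
# Refresh the feature keys whose staleness hurts downstream predictions the most,
# given a limited number of updates per round.
import heapq

class RegretAwareScheduler:
    def __init__(self, budget):
        self.budget = budget          # max feature updates per round
        self.error = {}               # key -> latest downstream error feedback
        self.queries = {}             # key -> query count since last update

    def record_query(self, key, downstream_error):
        self.queries[key] = self.queries.get(key, 0) + 1
        self.error[key] = downstream_error

    def plan_updates(self):
        # Estimated regret of leaving `key` stale for another round.
        score = lambda key: self.error.get(key, 0.0) * self.queries.get(key, 0)
        chosen = heapq.nlargest(self.budget, self.queries, key=score)
        for key in chosen:            # assume an update resets staleness
            self.queries[key] = 0
            self.error[key] = 0.0
        return chosen

sched = RegretAwareScheduler(budget=2)
for key, err in [("user:1", 0.4), ("user:2", 0.1), ("user:1", 0.5), ("user:3", 0.9)]:
    sched.record_query(key, err)
print(sched.plan_updates())           # -> the two keys with the highest estimated regret
```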
FusionFlow: Accelerating Data Preprocessing for Machine Learning with CPU-GPU Cooperation
UNIST
Cooperatively utilizes both CPUs and GPUs to accelerate the data-augmentation (preprocessing) stage of DL training.
Orchestrates preprocessing tasks across CPUs and GPUs while minimizing interference with GPU-based model training.
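A conceptual sketch of the cooperation: augment batches on a CPU worker pool by default and offload to the GPU only when the training loop signals idle GPU cycles. The augment functions and the idle signal are placeholders; FusionFlow's scheduling and memory management are far more involved.

```python
# Dispatch per-batch augmentation to CPU workers, spilling over to the GPU only
# when the trainer reports idle GPU time.
from concurrent.futures import ThreadPoolExecutor

def augment_cpu(batch):
    return [x * 2 for x in batch]              # stand-in CPU augmentation

def augment_gpu(batch):
    return [x * 2 for x in batch]              # stand-in GPU augmentation kernel

def preprocess(batches, gpu_is_idle, cpu_workers=4):
    out = [None] * len(batches)
    with ThreadPoolExecutor(max_workers=cpu_workers) as pool:
        futures = {}
        for step, batch in enumerate(batches):
            if gpu_is_idle(step):
                out[step] = augment_gpu(batch)             # borrow idle GPU cycles
            else:
                futures[step] = pool.submit(augment_cpu, batch)
        for step, fut in futures.items():
            out[step] = fut.result()
    return out

batches = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(preprocess(batches, gpu_is_idle=lambda step: step % 3 == 0))
```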
DLRover: Resource Optimization for Deep Recommendation Models Training at AntGroup
AntGroup & Sichuan University
Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses
UIUC & NVIDIA
GIDS (GPU Initiated Direct Storage Access) -> a data loader that utilizes all hardware resources (CPU memory, storage, and GPU memory).
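A conceptual tiering sketch only: serve feature reads from a GPU-resident cache, then host memory, then storage. Real GIDS issues the storage accesses directly from GPU threads rather than through a CPU-side loader; this just illustrates the memory hierarchy being exploited.

```python
# Tiered feature fetch: GPU cache -> host memory -> storage (conceptual only).
class TieredFeatureLoader:
    def __init__(self, gpu_cache, host_cache, storage):
        self.gpu_cache = gpu_cache      # hottest node features, kept on the GPU
        self.host_cache = host_cache    # warm features in (pinned) CPU memory
        self.storage = storage          # everything else on SSD

    def fetch(self, node_id):
        if node_id in self.gpu_cache:
            return self.gpu_cache[node_id]
        if node_id in self.host_cache:
            return self.host_cache[node_id]
        feature = self.storage[node_id]          # would be a direct SSD read
        self.host_cache[node_id] = feature       # naive promotion policy
        return feature

loader = TieredFeatureLoader(gpu_cache={0: "f0"}, host_cache={1: "f1"},
                             storage={i: f"f{i}" for i in range(10)})
print([loader.fetch(i) for i in (0, 1, 7)])
```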