SoCC 2023

Meta Info

Homepage: https://acmsocc.org/2023/

Paper list: https://acmsocc.org/2023/accepted-papers.html

Papers

Resource Allocation

  • Lifting the Fog of Uncertainties: Dynamic Resource Orchestration for the Containerized Cloud [Paper]

    • UofT

    • Adaptively configure resource parameters

    • Built on contextual bandit techniques (see the sketch below)

    • Balance between performance and resource cost
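
A minimal sketch of the contextual-bandit idea, not the authors' algorithm: an epsilon-greedy bandit picks a resource configuration for a workload context and learns from a reward that balances latency against cost. The configurations, context encoding, reward shape, and `run_with_config` hook are all illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical candidate configurations: (CPU cores, memory GiB).
CONFIGS = [(1, 2), (2, 4), (4, 8)]

def run_with_config(cfg):
    """Hypothetical stand-in for deploying a config and measuring it."""
    cores, mem = cfg
    latency = 100.0 / cores + random.gauss(0, 1)  # toy latency model
    cost = 0.05 * cores + 0.01 * mem              # toy cost model
    return latency, cost

class EpsilonGreedyBandit:
    """Minimal epsilon-greedy contextual bandit (illustrative only)."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.sums = defaultdict(float)   # cumulative reward per (context, config)
        self.counts = defaultdict(int)   # observations per (context, config)

    def choose(self, context):
        if random.random() < self.epsilon:
            return random.choice(CONFIGS)  # explore
        def mean(cfg):                     # exploit the best observed mean reward
            k = (context, cfg)
            return self.sums[k] / self.counts[k] if self.counts[k] else 0.0
        return max(CONFIGS, key=mean)

    def update(self, context, config, reward):
        self.sums[(context, config)] += reward
        self.counts[(context, config)] += 1

bandit = EpsilonGreedyBandit()
ctx = "web-tier/high-load"            # discretized workload context (assumed)
cfg = bandit.choose(ctx)
latency, cost = run_with_config(cfg)
bandit.update(ctx, cfg, reward=-(latency + 0.5 * cost))  # balance perf vs. cost
```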

  • Not All Resources are Visible: Exploiting Fragmented Shadow Resources in Shared-State Scheduler Architecture [Paper]

    • SJTU & Huawei

    • Shared-state schedulers: A central state view periodically updates the global cluster status to distributed schedulers

    • Shadow resources: Resources invisible to shared-state schedulers until the next view update

    • Resource Miner (RMiner) includes a shadow resource manager to manage shadow resources, an RM filter to select suitable tasks as RM tasks, and an RM scheduler to allocate shadow resources to RM tasks (see the sketch below)
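
A highly simplified sketch of the shadow-resource idea under assumed semantics: CPUs freed between periodic view refreshes stay invisible to ordinary schedulers, and an RMiner-style component places small, short-lived tasks on them before the next refresh. All names are hypothetical.

```python
import time

class SharedStateView:
    """Toy shared-state cluster view; distributed schedulers only see the
    snapshot taken at the last periodic refresh."""

    def __init__(self, refresh_interval=5.0):
        self.refresh_interval = refresh_interval
        self.visible_free = {}   # node -> free CPUs at the last sync
        self.actual_free = {}    # node -> free CPUs right now
        self.last_sync = 0.0

    def release(self, node, cpus):
        # A task just finished: its CPUs are "shadow" resources, invisible
        # to ordinary schedulers until the next view refresh.
        self.actual_free[node] = self.actual_free.get(node, 0) + cpus

    def shadow(self, node):
        return self.actual_free.get(node, 0) - self.visible_free.get(node, 0)

    def maybe_sync(self):
        if time.time() - self.last_sync >= self.refresh_interval:
            self.visible_free = dict(self.actual_free)
            self.last_sync = time.time()

def rm_schedule(view, rm_tasks):
    """Place small, short-lived RM tasks onto shadow resources, so they can
    finish before those resources would become visible anyway."""
    placed = []
    for name, cpus_needed in rm_tasks:
        for node in view.actual_free:
            if view.shadow(node) >= cpus_needed:
                view.actual_free[node] -= cpus_needed
                placed.append((name, node))
                break
    return placed
```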

  • Gödel: Unified large-scale resource management and scheduling at ByteDance [Paper]

    • ByteDance & UVA

    • Industry Paper

    • A unified infrastructure for all business groups to run their diverse workloads

    • Built upon Kubernetes

Machine Learning

  • Anticipatory Resource Allocation for ML Training Clusters [Paper]

    • Microsoft Research & UW

    • Schedule based on predictions of future job arrivals and durations

    • Deal with prediction errors

  • tf.data service: A Case for Disaggregating ML Input Data Processing [Paper]

    • Google & ETH

    • Industry Paper

    • A disaggregated input data processing service built on top of tf.data in TensorFlow (see the usage sketch below)

    • Horizontally scale out to right-size host resources (CPU/RAM) for data processing in each job

    • Share ephemeral preprocessed data results across jobs

    • Coordinated reads to avoid stragglers
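
Since the service extends tf.data, a hedged usage sketch follows. `tf.data.experimental.service.distribute` is the public TensorFlow entry point for offloading a pipeline to remote workers; the dispatcher address, file paths, `parse_fn`, and the `processing_mode`/`job_name` values here are illustrative, not the paper's configuration.

```python
import tensorflow as tf

def parse_fn(record):
    # Hypothetical stand-in for the expensive per-element preprocessing
    # that gets offloaded to the service's workers.
    return tf.io.parse_tensor(record, tf.float32)

SERVICE = "grpc://tfdata-dispatcher:5050"  # assumed dispatcher deployment

ds = tf.data.Dataset.list_files("/data/train/*.tfrecord")
ds = ds.interleave(tf.data.TFRecordDataset,
                   num_parallel_calls=tf.data.AUTOTUNE)
ds = ds.map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)

# The pipeline above now runs on remote tf.data workers, which can be
# scaled horizontally to right-size CPU/RAM per job; consumers that pass
# the same job_name share the ephemeral preprocessed results.
ds = ds.apply(tf.data.experimental.service.distribute(
    processing_mode="distributed_epoch",
    service=SERVICE,
    job_name="shared_input_pipeline",
))
ds = ds.prefetch(tf.data.AUTOTUNE)
```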

  • Is Machine Learning Necessary for Cloud Resource Usage Forecasting? [Paper]

    • IMDEA Software Institute

    • Vision Paper

    • Question: Are complex machine learning models necessary for forecasting cloud resource usage?

    • Proposal: Practical resource management systems need to first identify the extent to which simple solutions can be effective.

Serverless Computing

  • Golgi: Performance-Aware, Resource-Efficient Function Scheduling for Serverless Computing [Paper]

    • HKUST & WeBank

    • Best Paper Award!

    • A scheduling system for serverless functions to minimize resource provisioning costs while meeting the function latency requirements

    • Overcommit functions based on their past resource usage; Identify nine low-level metrics (e.g., request load, resource allocation, contention on shared resources); Use Mondrian Forests to predict function performance

    • Employ a conservative exploration-exploitation strategy for request routing (see the sketch below); By default, route requests to non-overcommitted instances; Explore the use of overcommitted instances

    • Vertical scaling to dynamically adjust the concurrency of overcommitted instances
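
A minimal sketch of the conservative routing policy described above, with hypothetical names; `predict_ok` stands in for a performance predictor such as the Mondrian Forest model the paper uses.

```python
import random

class Instance:
    def __init__(self, name, overcommitted):
        self.name = name
        self.overcommitted = overcommitted

def route(request, instances, predict_ok, explore_prob=0.05):
    """Prefer non-overcommitted instances; only occasionally explore an
    overcommitted one, and only when the predictor expects the function's
    latency requirement to still be met."""
    safe = [i for i in instances if not i.overcommitted]
    risky = [i for i in instances if i.overcommitted]
    if risky and random.random() < explore_prob:
        candidate = random.choice(risky)
        if predict_ok(request, candidate):
            return candidate
    return random.choice(safe) if safe else random.choice(risky)

# Usage with a trivially permissive stand-in predictor:
instances = [Instance("a", False), Instance("b", True)]
target = route({"fn": "resize"}, instances, predict_ok=lambda r, i: True)
```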

  • Parrotfish: Parametric Regression for Optimizing Serverless Functions [Paper]

    • UBC & UTokyo & INSAT

    • Find optimal configurations through an online learning process

    • Use parametric regression to choose the right memory configurations for serverless functions (see the sketch below)
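
A sketch of the idea under common assumptions: fit a parametric model of execution time versus memory size from online samples, then pick the memory size minimizing an AWS-Lambda-style GB-second cost. The parametric form, sample values, and unit price are illustrative, not Parrotfish's exact model.

```python
import numpy as np
from scipy.optimize import curve_fit

# Measured (memory MB, execution time s) samples for one function
# (illustrative numbers; gathered online in the paper's setting).
mem = np.array([256, 512, 1024, 2048, 3008], dtype=float)
secs = np.array([4.1, 2.0, 1.1, 0.8, 0.75])

def exec_time(m, a, b):
    # A common parametric form for serverless runtimes: compute scales
    # inversely with allocated memory, plus a floor b.
    return a / m + b

(a, b), _ = curve_fit(exec_time, mem, secs)

PRICE_PER_GB_S = 0.0000166667  # assumed unit price
candidates = np.arange(128, 3009, 64, dtype=float)
cost = (candidates / 1024.0) * exec_time(candidates, a, b) * PRICE_PER_GB_S
best = candidates[int(np.argmin(cost))]
print(f"cheapest memory configuration ~ {best:.0f} MB")
```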

  • AsyFunc: A High-Performance and Resource-Efficient Serverless Inference System via Asymmetric Functions [Paper] [Code]

    • HUST & Huawei & Peng Cheng Laboratory

    • Problem: The time-consuming and resource-hungry model-loading process when scaling out function instances

    • Observation: The sensitivity of each layer to the computing resources is mostly anti-correlated with its memory resource usage

    • Asymmetric Functions

      • The original Body Function loads a complete model to meet stable demands

      • The proposed lightweight Shadow Function only loads a portion of resource-sensitive layers to deal with sudden demands effortlessly

    • AsyFunc: an inference serving system with an auto-scaling and scheduling engine (see the sketch below); Built on top of Knative
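
A toy sketch of the asymmetric scaling split, with assumed names and capacities: stable demand maps to fully loaded Body instances, while bursts above the baseline are absorbed by Shadow instances that start quickly because they load only part of the model.

```python
import math

class AsymmetricAutoscaler:
    """Toy planner splitting capacity between Body and Shadow instances."""

    def __init__(self, per_instance_rps=50):
        self.per_instance_rps = per_instance_rps  # assumed per-replica capacity

    def plan(self, observed_rps, baseline_rps):
        need = math.ceil(observed_rps / self.per_instance_rps)
        body = max(1, math.ceil(baseline_rps / self.per_instance_rps))
        # Demand above the stable baseline goes to Shadow instances, which
        # skip most of the model-loading work and therefore start fast.
        shadow = max(0, need - body)
        return body, shadow

scaler = AsymmetricAutoscaler()
print(scaler.plan(observed_rps=180, baseline_rps=100))  # -> (2, 2)
```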

  • Chitu: Accelerating Serverless Workflows with Asynchronous State Replication Pipeline [Paper] [Code]

    • ISCAS & ICT, CAS

    • Asynchronous State Replication Pipelines (ASRP) to speed up serverless workflows for general applications

    • Three insights

      • Provide differentiable data types (DDT) at the programming model level to support incremental state sharing and computation (see the sketch below)

      • Continuously deliver changes of DDT objects in real-time

      • Direct communication and change propagation

    • Built atop OpenFaaS
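
Here "differentiable" reads as diff-able: a DDT object exposes incremental changes instead of whole values. A toy sketch under that reading, with hypothetical names:

```python
class DiffDict:
    """Toy differentiable data type: a dict that records its changes so a
    downstream function can consume deltas while the producer still runs."""

    def __init__(self):
        self._state = {}
        self._pending = []   # changes since the last delta() call

    def __setitem__(self, key, value):
        self._state[key] = value
        self._pending.append(("set", key, value))

    def delta(self):
        """Return and clear pending changes, ready to be streamed to the
        next function in the workflow."""
        out, self._pending = self._pending, []
        return out

state = DiffDict()
state["progress"] = 0.5
state["progress"] = 0.9
print(state.delta())  # [('set', 'progress', 0.5), ('set', 'progress', 0.9)]
```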

  • How Does It Function? Characterizing Long-term Trends in Production Serverless Workloads [Paper] [Trace]

    • Huawei

    • Industry Paper

    • Two new serverless traces in Huawei Cloud

      • The first trace: Huawei's internal workloads; Per-second statistics for 200 functions

      • The second trace: Huawei's public FaaS platform; Per-minute arrival rates for over 5000 functions

    • Characterize resource consumption, cold-start times, programming languages used, periodicity, per-second versus per-minute burstiness, correlations, and popularity.

    • Findings

      • Requests vary by up to 9 orders of magnitude across functions, with some functions executed over 1 billion times per day

      • Scheduling time, execution time and cold-start distributions vary across 2 to 4 orders of magnitude and have very long tails

      • Function invocation counts demonstrate strong periodicity for many individual functions and on an aggregate level

    • Motivates further research on estimating resource reservations and on time-series prediction (see the trace-analysis sketch below)
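
A small trace-analysis sketch in the spirit of these findings, assuming a hypothetical per-minute CSV schema (func, minute, count); the released trace's actual format may differ.

```python
import numpy as np
import pandas as pd

df = pd.read_csv("per_minute_arrivals.csv")  # columns: func, minute, count
per_func = df.groupby("func")["count"]
totals = per_func.sum()

# Popularity skew: invocation volume spanning many orders of magnitude.
print("orders of magnitude:", np.log10(totals.max() / max(totals.min(), 1)))

# Simple burstiness proxy: peak-to-mean ratio of per-minute arrival rates.
print((per_func.max() / per_func.mean()).describe())

# Periodicity proxy: autocorrelation at a one-day lag (1440 minutes).
top = df[df["func"] == totals.idxmax()].sort_values("minute")
series = top["count"].reset_index(drop=True)
print("daily autocorrelation of the most popular function:",
      series.autocorr(lag=1440))
```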

  • Function as a Function [Paper]

    • ETH

    • Vision Paper

    • Dandelion: a clean-slate FaaS system; Treat serverless functions as pure functions; Explicitly separate computation and I/O; Enable hardware acceleration; Enable dataflow-aware function orchestration

  • The Gap Between Serverless Research and Real-world Systems [Paper]

    • SJTU & Huawei Cloud

    • Vision Paper

    • Five open challenges

      • Optimize cold-start latency: Most existing works only consider synchronous starts, while asynchronous starts are common in industry

      • Declarative approach: Is Kubernetes the right system for serverless computing?

      • Scheduling cost

      • Balance different scheduling policies within a serverless system

      • Costs of sidecar

Sustainable Computing

  • Sustainable Supercomputing for AI: GPU Power Capping at HPC Scale [Paper]

    • MIT & NEU

    • GPU power capping significantly decreases both temperature and power draw, potentially improving hardware lifespan, with minimal impact on job performance (see the sketch below)
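
Power capping on NVIDIA GPUs is typically applied through NVML; a minimal sketch using the pynvml bindings follows, with an illustrative 225 W cap (not the paper's chosen value or tooling).

```python
# Requires the NVIDIA ML Python bindings (pynvml) and admin privileges.
import pynvml

CAP_WATTS = 225  # illustrative cap

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        # NVML expresses power limits in milliwatts; clamp to the
        # device-supported range before applying.
        lo, hi = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
        pynvml.nvmlDeviceSetPowerManagementLimit(
            handle, max(lo, min(CAP_WATTS * 1000, hi)))
finally:
    pynvml.nvmlShutdown()
```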
