ESCHER: Expressive scheduling with ephemeral resources

#ephemeral_resources #scheduling_flexibility #scheduling_requirements #Kubernetes #Ray

Meta Info

Presented at SoCC 2022.

Authors: Romil Bhardwaj (UC Berkeley), Alexey Tumanov (Georgia Tech), Stephanie Wang (UC Berkeley), Richard Liaw, Philipp Moritz, Robert Nishihara (Anyscale), Ion Stoica (UC Berkeley).

Understanding the paper

  • Goal: support custom scheduling constraints

  • Evolvability

    • Monolithic schedulers (Kubernetes, YARN)

      • Applications state resource requirements

      • Scheduler provides a fixed set of supported policies (e.g., affinity)

      • Simple, but hard to evolve

    • Two-level schedulers (Mesos, Omega)

      • Applications implement end-to-end scheduling

      • Highly evolvable, but complex (application must implement a custom scheduler)

  • ESCHER

    • Two key abstractions

      • Resource-matching scheduler: a simple mechanism that matches resource demands to available resources

      • Ephemeral resources: applications create them at runtime and query cluster state through an API

    • Application -> ESCHER Scheduling Library (ESL) -> Framework Scheduler

    • Trade-off: adds scheduling latency (an extra RPC call) but reduces the implementation burden on applications

    • Implemented in Ray and Kubernetes
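
The two abstractions above can be illustrated with a toy model. This is a minimal sketch and not the ESL, Ray, or Kubernetes API: `ToyCluster`, `create_resource`, `schedule`, and the resource names `near_leader` and `spread_slot` are all hypothetical, and the first-fit matching loop is an assumption made for brevity. It shows how an application could express affinity and anti-affinity purely as resource requirements.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    available: dict = field(default_factory=dict)


class ToyCluster:
    """Hypothetical stand-in for a resource-matching framework scheduler."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def create_resource(self, node_name, resource, amount):
        # Ephemeral resource: a label with a capacity, created at runtime.
        node = self.nodes[node_name]
        node.available[resource] = node.available.get(resource, 0) + amount

    def schedule(self, demand):
        # First-fit resource matching (an assumption of this sketch):
        # pick the first node that satisfies every requested resource.
        for node in self.nodes.values():
            if all(node.available.get(r, 0) >= q for r, q in demand.items()):
                for r, q in demand.items():
                    node.available[r] -= q
                return node.name
        return None  # no node can satisfy the demand


def demo():
    cluster = ToyCluster([Node("n1", {"CPU": 2}), Node("n2", {"CPU": 4})])

    # Affinity: place a leader task, then mark its node with an ephemeral
    # resource that the follower task requests.
    leader = cluster.schedule({"CPU": 3})
    cluster.create_resource(leader, "near_leader", 1)
    follower = cluster.schedule({"CPU": 1, "near_leader": 1})

    # Anti-affinity: one unit of an ephemeral resource per node, so at most
    # one replica can land on each node.
    for name in cluster.nodes:
        cluster.create_resource(name, "spread_slot", 1)
    replicas = [cluster.schedule({"spread_slot": 1}) for _ in range(3)]

    return {"leader": leader, "follower": follower, "replicas": replicas}


if __name__ == "__main__":
    print(demo())
```

In this sketch the scheduler never learns what "affinity" means. It only matches demands to capacities, and the policy lives entirely in which ephemeral resources the application creates and requests.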
