Medea: Scheduling of long running applications in shared production clusters

Metadata

Presented in EuroSys 2018.

Authors: Panagiotis Garefalakis, Konstantinos Karanasos, Peter Pietzuch, Arun Suresh, Sriram Rao (Imperial College London & Microsoft)

Medea adopts a two-scheduler design:
- For the placement of long-running applications (LRAs), it uses a dedicated scheduler.
- For the placement of task-based applications, it directly uses a traditional scheduler.
It formulates the placement of LRAs with constraints as an integer linear program (ILP), and solves it as an online optimization problem.
It considers multiple LRA container requests at once to achieve higher-quality placements and global objectives (e.g., minimizing the violation of placement constraints, resource fragmentation, any load imbalance, or the number of machines used).
It also investigates heuristics that trade placement quality for lower scheduling latency:
1. Tag popularity: prioritizes the placement of containers that have more constraints.
2. Node candidates: prioritizes the placement of containers that have smaller nodes allowed to be placed.

Built as an extension to Apache Hadoop YARN.

Evaluated on a 400-node pre-production cluster.

Last updated 2 years ago

Was this helpful?