# Take it to the limit: Peak prediction-driven resource overcommitment in datacenters

## Metadata

Presented at [EuroSys 2021](https://doi.org/10.1145/3447786.3456259).

Authors: Noman Bashir, Nan Deng, Krzysztof Rzadca, David Irwin, Sree Kodak, Rohit Jnagal (*University of Massachusetts Amherst* & *Google*)

Code (simulator): <https://github.com/googleinterns/cluster-resource-forecast>

## Understanding the paper

This paper focuses on **the problem of resource overcommitment**.

### Question

Assuming **complete knowledge** of each task’s **future resource usage**, what is the **safest** overcommit policy that yields the **highest utilization**?

This work formalizes the overcommitment problem as **the problem of predicting peak usage on each machine**, which is complementary and orthogonal to the scheduling problem.

### Peak Oracle

<figure><img src="/files/3FDeWkAQzQaxvtOIV5Lp" alt=""><figcaption><p>Peak Oracle</p></figcaption></figure>

They implement the peak oracle in **simulation** (historical traces provide complete knowledge of future usage) and use it as the reference against which practical peak predictors are evaluated.
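To make the idea concrete, here is a minimal sketch of a peak oracle over a recorded trace. The function name `peak_oracle`, the list-of-lists trace layout, and the `horizon` parameter are assumptions made for illustration; they are not taken from the paper's simulator.

```python
from typing import Sequence


def peak_oracle(
    task_usage: Sequence[Sequence[float]],  # task_usage[i][t]: usage of task i at step t
    now: int,
    horizon: int,
) -> float:
    """Return the peak of total machine usage in the window [now, now + horizon).

    This only works in simulation, where the "future" part of the trace
    is already known.
    """
    if not task_usage:
        return 0.0
    n_steps = len(task_usage[0])
    end = min(now + horizon, n_steps)
    # Sum usage across tasks at each step, then take the maximum of those sums.
    totals = [sum(task[t] for task in task_usage) for t in range(now, end)]
    return max(totals, default=0.0)


# Hypothetical trace: two tasks on one machine, four time steps (in cores).
usage = [
    [1.0, 2.5, 0.5, 1.5],
    [2.0, 0.5, 3.0, 1.0],
]
future_peak = peak_oracle(usage, now=0, horizon=4)
sum_of_task_peaks = sum(max(task) for task in usage)
print(future_peak, sum_of_task_peaks)
```

In this toy trace the machine-level peak is lower than the sum of the per-task peaks because the tasks do not peak at the same step, which is the headroom an overcommit policy tries to reclaim.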

The predictors should be **lightweight** and **fast** to compute.

### Predictors in the paper

* borg-default
  * Inspired by Borg
  * peak = a fixed fraction (e.g., 90%) of the sum of task **limits**
* RC-like
  * Inspired by Resource Central
  * peak = sum over tasks of the x-th percentile of each task's **usage**
* N-sigma
  * Based on the central limit theorem
  * peak = mean + N × standard deviation (computed on **usage**)
* max(predictors)
  * peak = max(peaks across predictors)
  * The paper ultimately adopts this predictor, combining RC-like and N-sigma.
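The four predictors listed above can be sketched in a few lines each. This is an illustrative sketch, not the paper's implementation: the function names, the nearest-rank percentile, the default values of `fraction`, `pct`, and `n`, and the choice to compute N-sigma on the machine-level total usage are all assumptions.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def borg_default(limits: Sequence[float], fraction: float = 0.9) -> float:
    """Peak = fixed fraction of the sum of task limits."""
    return fraction * sum(limits)


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def rc_like(task_history: Sequence[Sequence[float]], pct: float = 99.0) -> float:
    """Peak = sum over tasks of the pct-th percentile of each task's past usage."""
    return sum(percentile(history, pct) for history in task_history)


def n_sigma(machine_history: Sequence[float], n: float = 5.0) -> float:
    """Peak = mean + n * standard deviation of past total usage (assumed machine-level)."""
    return mean(machine_history) + n * pstdev(machine_history)


def max_predictor(
    task_history: Sequence[Sequence[float]], pct: float = 99.0, n: float = 5.0
) -> float:
    """Peak = max of the RC-like and N-sigma predictions."""
    # Total machine usage at each step is the sum across tasks.
    machine_history = [sum(step) for step in zip(*task_history)]
    return max(rc_like(task_history, pct), n_sigma(machine_history, n))


# Hypothetical usage history: two tasks, four past samples each (in cores).
history = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],
]
print(borg_default([4.0, 6.0]), rc_like(history), max_predictor(history, n=2.0))
```

Each predictor needs only sums, a sort, or a mean and standard deviation over a short history, which fits the requirement that predictors be lightweight and fast to compute.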

### My takeaways

1. The paper proposes a general methodology (the peak oracle) for designing and evaluating overcommit policies.
2. The approach is complementary and orthogonal to the cluster scheduling algorithm.
3. It oversubscribes serving tasks with other serving tasks.
4. It demonstrates that the max predictor policy is less risky and more efficient.

