# Training deep nets with sublinear memory cost

## Meta Info

URL: <https://arxiv.org/abs/1604.06174>

Authors: Tianqi Chen (*UW*), Bing Xu (*Dato, Inc.*), Chiyuan Zhang (*MIT*), Carlos Guestrin (*UW*).

### Code (memonger - memory monger)

* Original MXNet Implementation: <https://github.com/dmlc/mxnet-memonger>
* OpenAI's TensorFlow Implementation: <https://github.com/cybertronai/gradient-checkpointing>
* PyTorch Implementation: <https://github.com/Lyken17/pytorch-memonger>

## Understanding the paper

### Problem

How can we reduce the memory consumption of DNN training, to enable bigger models or larger batch sizes?

### Solution

1. Focus on reducing the memory needed to store intermediate results (feature maps) and gradients.
2. Design an algorithm that **trades computation for memory**: training an n-layer network costs O(√n) memory, at the price of one extra forward computation per mini-batch. It combines three techniques:
   * In-place operation: write the output values directly into the memory of an input value.
   * Memory sharing: recycle the memory of intermediate results that are no longer needed and reuse it for another node.
   * Re-computation: drop the results of low-cost operations and re-compute them when the backward pass needs them.
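
The re-computation idea can be sketched on a toy chain of scalar layers. This is a minimal illustration in plain Python, not the memonger implementation: the names `LAYERS`, `backward_full`, and `backward_checkpointed` are hypothetical, and "memory" is simply counted as the number of stored activations. Splitting n layers into segments of length k keeps about n/k checkpoints plus at most k activations for the segment currently being recomputed, which is smallest near k = √n.

```python
import math

# Hypothetical toy network: 16 identical scalar layers, each given as
# a (forward function, derivative function) pair.
LAYERS = [(math.tanh, lambda x: 1.0 - math.tanh(x) ** 2)] * 16


def backward_full(x, layers):
    """Baseline: store every activation, then backpropagate through the chain."""
    acts = [x]
    for f, _ in layers:
        acts.append(f(acts[-1]))
    grad = 1.0
    for i in reversed(range(len(layers))):
        grad *= layers[i][1](acts[i])  # chain rule, using the stored input of layer i
    return grad, len(acts)


def backward_checkpointed(x, layers):
    """Store only segment inputs; recompute each segment during the backward pass."""
    n = len(layers)
    k = max(1, math.isqrt(n))  # segment length close to sqrt(n)

    # Forward pass: keep the input of every k-th layer and drop the rest.
    checkpoints = {}
    h = x
    for i, (f, _) in enumerate(layers):
        if i % k == 0:
            checkpoints[i] = h
        h = f(h)

    grad = 1.0
    peak = 0  # largest number of activations held at any one time
    for start in sorted(checkpoints, reverse=True):
        end = min(start + k, n)
        # Recompute the inputs of the layers in this segment from its checkpoint.
        seg = [checkpoints[start]]
        for f, _ in layers[start:end - 1]:
            seg.append(f(seg[-1]))
        peak = max(peak, len(checkpoints) + len(seg))
        for j in reversed(range(end - start)):
            grad *= layers[start + j][1](seg[j])
    return grad, peak


g_full, stored_full = backward_full(0.3, LAYERS)
g_ckpt, stored_ckpt = backward_checkpointed(0.3, LAYERS)
print(g_full, g_ckpt, stored_full, stored_ckpt)
```

Both functions return the same gradient; the checkpointed version holds fewer activations at once and pays for it by re-running most of the forward computation once, segment by segment.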

### Guidelines for DL frameworks

1. Provide an option to drop the results of low-cost operations.
2. Provide planning algorithms that produce an efficient memory plan.
3. Let users set the mirror attribute (how many times a result can be recomputed) in the computation graph for memory optimization.
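
To make the second guideline more concrete, here is a simplified sketch of what a planning step might look like on a linear chain of layers. It is only loosely modeled on the idea of planning under a memory budget and is not the paper's exact algorithm; `plan_checkpoints`, `best_plan`, and the `num_budgets` grid are hypothetical names and choices made for illustration. Given per-layer output sizes, a greedy pass closes a segment whenever its accumulated size exceeds the budget, and an outer loop tries several budgets to find the smallest estimated total.

```python
def plan_checkpoints(sizes, budget):
    """Greedy pass: checkpoint a layer's output once the running segment exceeds budget.

    Returns (checkpoint indices, memory held by checkpoints, largest segment memory).
    """
    checkpoints = []
    ckpt_mem = 0   # memory that stays allocated for the kept (checkpointed) outputs
    seg_mem = 0    # memory of the segment currently being accumulated
    max_seg = 0    # largest segment seen so far (cost of recomputing it)
    for i, size in enumerate(sizes):
        seg_mem += size
        if seg_mem > budget:
            checkpoints.append(i)
            ckpt_mem += size
            max_seg = max(max_seg, seg_mem)
            seg_mem = 0
    max_seg = max(max_seg, seg_mem)  # account for a trailing partial segment
    return checkpoints, ckpt_mem, max_seg


def best_plan(sizes, num_budgets=64):
    """Try evenly spaced budgets and keep the plan with the lowest estimated cost."""
    total = sum(sizes)
    best = None
    for j in range(num_budgets + 1):
        budget = total * j / num_budgets
        ckpts, ckpt_mem, max_seg = plan_checkpoints(sizes, budget)
        cost = ckpt_mem + max_seg  # checkpoints plus one live segment
        if best is None or cost < best[0]:
            best = (cost, budget, ckpts)
    return best


cost, budget, ckpts = best_plan([1] * 16)
print(cost, budget, ckpts)
```

For 16 equally sized layers, the estimated cost of the best plan is well below the 16 units needed to keep every output, consistent with the roughly 2√n behaviour expected from segmenting the chain.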

