# CVPR 2024

## Meta Info

Homepage: <https://cvpr.thecvf.com/Conferences/2024>

Paper list: <https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers>

## Papers

### Diffusion Models

#### Acceleration

* Cache Me if You Can: Accelerating Diffusion Models through Block Caching \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Wimbauer_Cache_Me_if_You_Can_Accelerating_Diffusion_Models_through_Block_CVPR_2024_paper.html)] \[[Homepage](https://fwmb.github.io/blockcaching/)]
  * Meta & TUM & MCML & Oxford
  * **Block caching**
    * Reuse outputs from layer blocks of previous steps to speed up inference.
    * Automatically determine the caching schedule from how much each block's output changes over timesteps (the shared caching idea is sketched after this list).
* CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Zeng_CAT-DM_Controllable_Accelerated_Virtual_Try-on_with_Diffusion_Model_CVPR_2024_paper.html)] \[[Code](https://github.com/zengjianhao/CAT-DM)]
  * TJU & Tencent
  * **CAT-DM**: **C**ontrollable **A**ccelerated virtual **T**ry-on with **D**iffusion **M**odel
  * Initialize the reverse denoising process from an implicit distribution produced by a pre-trained GAN-based model → Reduce the number of sampling steps (a truncation sketch appears after this list)
* DeepCache: Accelerating Diffusion Models for Free \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Ma_DeepCache_Accelerating_Diffusion_Models_for_Free_CVPR_2024_paper.html)] \[[Code](https://github.com/horseee/DeepCache)]
  * NUS
  * Exploit the temporal redundancy inherent in the sequential denoising steps of diffusion models.
  * Cache features and retrieve them across adjacent denoising steps, thereby skipping redundant computation (the same caching idea is sketched after this list).
* DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Li_DistriFusion_Distributed_Parallel_Inference_for_High-Resolution_Diffusion_Models_CVPR_2024_paper.html)] \[[Homepage](https://hanlab.mit.edu/projects/distrifusion)] \[[Code](https://github.com/mit-han-lab/distrifuser)]
  * MIT & Princeton & Lepton AI & NVIDIA
  * **Displaced patch parallelism**
    * Split the model input into multiple patches and assign each patch to a GPU.
    * Reuse the pre-computed feature maps from the previous timestep to provide context for the current step.
  * **DistriFusion** → Enable running diffusion models across multiple GPUs in parallel (see the patch-parallelism sketch after this list)
* SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Nguyen_SwiftBrush_One-Step_Text-to-Image_Diffusion_Model_with_Variational_Score_Distillation_CVPR_2024_paper.html)]
  * VinAI Research, Vietnam
  * **Knowledge distillation:** Distill a pre-trained multi-step text-to-image model into a student network that can generate images with *just a single inference step* (a simplified distillation sketch follows this list).
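
Block Caching and DeepCache share one observation: features at adjacent denoising steps change slowly, so block outputs can be reused instead of recomputed. Below is a minimal PyTorch sketch of that shared idea; the `CachedBlock` wrapper and the fixed `refresh_every` interval are illustrative assumptions, since Block Caching derives the schedule automatically from measured feature change and DeepCache keeps recomputing the shallow layers while caching deep features.

```python
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Reuse a block's output across nearby denoising steps (minimal sketch)."""

    def __init__(self, block: nn.Module, refresh_every: int = 3):
        super().__init__()
        self.block = block
        self.refresh_every = refresh_every
        self._cache = None

    def forward(self, x: torch.Tensor, step: int) -> torch.Tensor:
        if self._cache is None or step % self.refresh_every == 0:
            self._cache = self.block(x)   # full recompute on refresh steps
        return self._cache                # otherwise reuse the stale output

# usage inside a toy sampling loop
block = CachedBlock(nn.Conv2d(4, 4, 3, padding=1), refresh_every=3)
x = torch.randn(1, 4, 64, 64)
for step in range(10):
    out = block(x, step)                  # steps 1-2 reuse the step-0 output
```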
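The truncation idea in CAT-DM can be sketched generically: noise a GAN-generated coarse result to an intermediate timestep and denoise only from there. Everything below (names, the toy denoiser, the choice of `t0`) is a hypothetical illustration, not the paper's exact sampler.

```python
import torch

def truncated_sampling(gan_generator, denoiser, alphas_cumprod, t0, cond):
    """Instead of denoising from pure noise at step T, noise a GAN-generated
    coarse result to an intermediate step t0 < T and denoise from there."""
    x0_coarse = gan_generator(cond)                      # coarse GAN result
    a = alphas_cumprod[t0]
    x = a.sqrt() * x0_coarse + (1 - a).sqrt() * torch.randn_like(x0_coarse)
    for t in reversed(range(t0)):                        # only t0 steps, not T
        x = denoiser(x, t, cond)
    return x

# toy stand-ins so the sketch runs
alphas_cumprod = torch.linspace(0.9999, 0.01, 1000)
gan_generator = lambda cond: torch.rand(1, 3, 64, 64)
denoiser = lambda x, t, cond: 0.99 * x
sample = truncated_sampling(gan_generator, denoiser, alphas_cumprod, t0=50, cond=None)
```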
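A single-process toy of displaced patch parallelism. In the real system each patch is assigned to a different GPU and the stale activations are exchanged asynchronously, hiding communication inside computation; here both the layer and the shapes are made up to show only the data flow.

```python
import torch

def toy_layer(x_patch, stale_ctx):
    # stands in for a layer whose receptive field crosses patch borders
    return x_patch - 0.1 * (x_patch - stale_ctx.mean())

def displaced_patch_step(x, stale_ctx, n_patches=2):
    """One step with displaced patch parallelism (single-process toy)."""
    patches = x.chunk(n_patches, dim=-2)              # split along height
    outs = [toy_layer(p, stale_ctx) for p in patches] # concurrent on real GPUs
    return torch.cat(outs, dim=-2)

x = torch.randn(1, 4, 64, 64)
ctx = x.clone()          # the first step runs synchronized to seed the context
for step in range(10):
    x_new = displaced_patch_step(x, ctx)
    ctx = x_new          # becomes the stale context for the next step
    x = x_new
```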
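SwiftBrush builds on variational score distillation (VSD, as in ProlificDreamer). Below is a heavily simplified sketch of the student update; all names and signatures are assumptions, and the LoRA teacher's own diffusion-loss update is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def vsd_student_loss(student, teacher_eps, lora_eps, z, c, alphas_cumprod):
    """One student update of VSD-style one-step distillation (simplified)."""
    x0 = student(z, c)                                   # one-step generation
    t = torch.randint(0, len(alphas_cumprod), (x0.shape[0],), device=x0.device)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * torch.randn_like(x0)  # re-noise
    with torch.no_grad():
        grad = teacher_eps(x_t, t, c) - lora_eps(x_t, t, c)      # VSD direction
    target = (x0 - grad).detach()      # reparameterize the gradient as an MSE
    return 0.5 * F.mse_loss(x0, target)

# toy stand-ins so the sketch executes
net = nn.Conv2d(4, 4, 3, padding=1)
student = lambda z, c: torch.tanh(net(z))
teacher_eps = lambda x, t, c: 0.10 * x   # frozen pre-trained teacher
lora_eps = lambda x, t, c: 0.05 * x      # LoRA teacher tracking the student
alphas = torch.linspace(0.9999, 0.01, 1000)
loss = vsd_student_loss(student, teacher_eps, lora_eps,
                        torch.randn(2, 4, 8, 8), None, alphas)
loss.backward()
```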

#### Support compatibility of add-on modules (ControlNets and LoRAs)

* X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Ran_X-Adapter_Adding_Universal_Compatibility_of_Plugins_for_Upgraded_Diffusion_Model_CVPR_2024_paper.html)] \[[Homepage](https://showlab.github.io/X-Adapter/)] \[[Code](https://github.com/showlab/X-Adapter)]
  * NUS & Tencent & FDU
  * Enable pre-trained add-on modules (e.g., ControlNet, LoRA) built for an older base model to work with an upgraded diffusion model (e.g., SDXL) *without retraining them*.

#### Support arbitrary image size

* ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Haji-Ali_ElasticDiffusion_Training-free_Arbitrary_Size_Image_Generation_through_Global-Local_Content_Separation_CVPR_2024_paper.html)] \[[Homepage](https://elasticdiffusion.github.io)] \[[Code](https://github.com/MoayedHajiAli/ElasticDiffusion-official)]
  * Rice University
  * Enable pre-trained text-to-image diffusion models to generate images at arbitrary sizes.
  * Decouple the generation trajectory of a pre-trained model into local and global signals.
    * The local signal controls low-level pixel information and can be estimated on local patches.
    * The global signal maintains overall structural consistency and is estimated from a reference image (see the sketch after this list).
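
A conceptual sketch of the global-local decomposition described above: the unconditional (local) prediction is assembled patch by patch at the target size, while the guidance (global) direction is computed at a reference size and resampled. The patching and resampling details are assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def global_local_step(eps_model, x, t, cond, scale=7.5, patch=64, ref=64):
    """One denoising step with global-local content separation (sketch)."""
    # Local signal: unconditional noise prediction, estimated patch by patch
    # at the arbitrary target resolution.
    eps_uncond = torch.zeros_like(x)
    for i in range(0, x.shape[-2], patch):
        for j in range(0, x.shape[-1], patch):
            p = x[..., i:i + patch, j:j + patch]
            eps_uncond[..., i:i + patch, j:j + patch] = eps_model(p, t, None)
    # Global signal: classifier-free guidance direction, estimated on a
    # reference-size image and resampled back to the target resolution.
    x_ref = F.interpolate(x, size=(ref, ref), mode="bilinear", align_corners=False)
    direction = eps_model(x_ref, t, cond) - eps_model(x_ref, t, None)
    direction = F.interpolate(direction, size=x.shape[-2:], mode="bilinear",
                              align_corners=False)
    return eps_uncond + scale * direction

# toy epsilon-prediction model so the sketch runs
eps_model = lambda x, t, cond: (0.12 if cond is not None else 0.10) * x
x = torch.randn(1, 4, 96, 160)                 # non-square target size
eps = global_local_step(eps_model, x, t=10, cond="a photo of a cat")
```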

#### Improve image quality

* FreeU: Free Lunch in Diffusion U-Net \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Si_FreeU_Free_Lunch_in_Diffusion_U-Net_CVPR_2024_paper.html)] \[[Homepage](https://chenyangsi.top/FreeU/)] \[[Code](https://github.com/ChenyangSi/FreeU)]
  * NTU
  * Key insight
    * Use two modulation factors to re-weight the feature contributions from the U-Net's backbone and skip connections.
    * Increasing the backbone scaling factor *b* significantly enhances image quality.
    * Directly scaling the skip features by *s* has only a limited influence on synthesis quality.
  * **FreeU**
    * Improve generation quality with only a few lines of code (sketched below).
    * Adjust only two scaling factors at inference time.
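
A sketch of the FreeU re-weighting, following the structure of the paper's public implementation; the channel split and the default *b*/*s* values are placeholders, since the recommended constants differ per base model (e.g., SD1.5 vs. SDXL).

```python
import torch
import torch.fft as fft

def fourier_filter(x, threshold=1, scale=0.9):
    """Attenuate the low-frequency components of skip features by `scale`."""
    dtype = x.dtype
    x_freq = fft.fftshift(fft.fftn(x.float(), dim=(-2, -1)), dim=(-2, -1))
    _, _, H, W = x_freq.shape
    mask = torch.ones_like(x_freq)
    crow, ccol = H // 2, W // 2
    mask[..., crow - threshold:crow + threshold,
         ccol - threshold:ccol + threshold] = scale
    x_freq = x_freq * mask
    return fft.ifftn(fft.ifftshift(x_freq, dim=(-2, -1)), dim=(-2, -1)).real.to(dtype)

def free_u(h_backbone, h_skip, b=1.2, s=0.9):
    """FreeU re-weighting inside a U-Net decoder block (sketch):
    amplify part of the backbone channels by b, damp the low
    frequencies of the skip features by s."""
    half = h_backbone.shape[1] // 2
    h_backbone[:, :half] = h_backbone[:, :half] * b
    h_skip = fourier_filter(h_skip, threshold=1, scale=s)
    return h_backbone, h_skip

hb, hs = free_u(torch.randn(1, 1280, 16, 16), torch.randn(1, 1280, 16, 16))
```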

#### Scalability

* On the Scalability of Diffusion-based Text-to-Image Generation \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Li_On_the_Scalability_of_Diffusion-based_Text-to-Image_Generation_CVPR_2024_paper.html)]
  * AWS AI Labs & Amazon AGI
  * An empirical study of the scaling properties of diffusion-based text-to-image models.
  * Perform ablations on scaling both the denoising backbone and the training set, training U-Net and Transformer variants ranging from 0.4B to 4B parameters on datasets of up to 600M images.
  * Specifically
    * Model scaling
      * The placement and amount of cross-attention distinguish the performance of different backbones.
      * To improve text-image alignment, increasing the number of transformer blocks is more parameter-efficient than increasing channel counts.
      * Identify an efficient U-Net variant.
    * Data scaling
      * The quality and diversity of the training set matter more than raw dataset size.
    * Provide scaling functions that predict text-image alignment performance as a function of model size, compute, and dataset size (a generic curve-fitting sketch follows this list).
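
The paper's scaling functions are not reproduced here; as a generic illustration of how such a fit is made, below is a saturating power law fitted to made-up alignment measurements (none of these numbers are from the paper, and the exact functional form used by the authors may differ).

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical measurements: dataset size in millions of images
# vs. a text-image alignment score.
sizes = np.array([50., 100., 200., 400., 600.])
alignment = np.array([0.61, 0.66, 0.70, 0.73, 0.75])

def power_law(n, a, b, c):
    return c - a * n ** (-b)           # performance saturates toward c

(a, b, c), _ = curve_fit(power_law, sizes, alignment, p0=[1.0, 0.5, 0.8])
print(f"extrapolated alignment at 1B images: {power_law(1000., a, b, c):.3f}")
```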

