# CVPR 2024

## Meta Info

Homepage: <https://cvpr.thecvf.com/Conferences/2024>

Paper list: <https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers>

## Papers

### Diffusion Models

#### Acceleration

* Cache Me if You Can: Accelerating Diffusion Models through Block Caching \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Wimbauer_Cache_Me_if_You_Can_Accelerating_Diffusion_Models_through_Block_CVPR_2024_paper.html)] \[[Homepage](https://fwmb.github.io/blockcaching/)]
  * Meta & TUM & MCML & Oxford
  * **Block caching**
    * Reuse outputs from layer blocks of previous steps to speed up inference.
    * Automatically determine caching schedules based on how much each block's output changes over timesteps.
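The caching idea above can be sketched in a few lines (a toy illustration; the wrapper, the schedule, and the block function are all hypothetical, not the paper's implementation):

```python
# Toy sketch of block caching: recompute a block only at scheduled
# timesteps, reuse the cached output everywhere else.

def make_cached_block(block_fn, recompute_steps):
    """Wrap a layer block so it is recomputed only at scheduled timesteps."""
    cache = {"out": None}

    def cached(x, t):
        if t in recompute_steps or cache["out"] is None:
            cache["out"] = block_fn(x, t)  # expensive path
        return cache["out"]                # cheap cached reuse otherwise
    return cached

# Toy block that records how often the expensive path runs.
calls = []
def block(x, t):
    calls.append(t)
    return x + t

cached_block = make_cached_block(block, recompute_steps={9, 5, 0})
outs = [cached_block(1.0, t) for t in range(9, -1, -1)]
print(len(calls))  # the block ran on only 3 of the 10 steps
```

In the real method the schedule is derived automatically from how strongly each block's output changes across timesteps, rather than being fixed by hand as here.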
* CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Zeng_CAT-DM_Controllable_Accelerated_Virtual_Try-on_with_Diffusion_Model_CVPR_2024_paper.html)] \[[Code](https://github.com/zengjianhao/CAT-DM)]
  * TJU & Tencent
  * **CAT-DM**: **C**ontrollable **A**ccelerated virtual **T**ry-on with **D**iffusion **M**odel
  * Initiate the reverse denoising process from an implicit distribution generated by a pre-trained GAN-based model → Reduce the number of sampling steps
* DeepCache: Accelerating Diffusion Models for Free \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Ma_DeepCache_Accelerating_Diffusion_Models_for_Free_CVPR_2024_paper.html)] \[[Code](https://github.com/horseee/DeepCache)]
  * NUS
  * Utilize the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
  * Cache and retrieve features across adjacent denoising stages, thereby reducing redundant computations.
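This cache-and-reuse pattern across adjacent steps can be illustrated with a toy loop (hypothetical structure, not the DeepCache codebase; the real method caches high-level U-Net features):

```python
# Toy sketch: shallow layers run every step, expensive "deep" features are
# refreshed only every few steps and otherwise retrieved from the cache.

def denoise(x, t, cache, refresh_interval=3):
    shallow = x * 0.9 + 0.1                 # cheap shallow layers: always run
    if t % refresh_interval == 0 or cache["feat"] is None:
        cache["feat"] = shallow ** 2        # expensive deep layers
        cache["computed"] += 1
    return shallow + cache["feat"]          # combine shallow + cached deep

cache = {"feat": None, "computed": 0}
x = 1.0
for t in range(10, 0, -1):
    x = denoise(x, t, cache)
print(cache["computed"])  # deep layers ran on only 4 of 10 steps
```

The closer two denoising steps are in time, the more redundant their deep features, which is why a fixed refresh interval already recovers most of the quality.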
* DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Li_DistriFusion_Distributed_Parallel_Inference_for_High-Resolution_Diffusion_Models_CVPR_2024_paper.html)] \[[Homepage](https://hanlab.mit.edu/projects/distrifusion)] \[[Code](https://github.com/mit-han-lab/distrifuser)]
  * MIT & Princeton & Lepton AI & NVIDIA
  * Displaced patch parallelism
    * Split the model input into multiple patches and assign each patch to a GPU.
    * Reuse the pre-computed feature maps from the previous timestep to provide context for the current step.
  * **DistriFusion** → Enable running diffusion models across multiple GPUs in parallel
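Displaced patch parallelism can be sketched on plain lists (purely illustrative; the real system exchanges feature maps between GPUs, while here each "device" is a loop iteration):

```python
# Toy sketch: each patch is denoised independently, using STALE boundary
# values from its neighbors at the PREVIOUS timestep as context, so no
# synchronization is needed within a step.

def step_patch(patch, left_ctx, right_ctx):
    # Pretend denoising: mix each value with stale neighbor context.
    return [(left_ctx + p + right_ctx) / 3 for p in patch]

def parallel_step(patches, prev_patches):
    new = []
    for i, patch in enumerate(patches):
        left = prev_patches[i - 1][-1] if i > 0 else 0.0
        right = prev_patches[i + 1][0] if i < len(patches) - 1 else 0.0
        new.append(step_patch(patch, left, right))
    return new

patches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 patches ~ 3 GPUs
prev = patches
for _ in range(3):
    patches, prev = parallel_step(patches, prev), patches
print(patches)
```

The one-step displacement is the key trade: patches see slightly outdated context from their neighbors, but every patch can be computed in parallel on its own device.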
* SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Nguyen_SwiftBrush_One-Step_Text-to-Image_Diffusion_Model_with_Variational_Score_Distillation_CVPR_2024_paper.html)]
  * VinAI Research, Vietnam
  * **Knowledge distillation:** Distill a pre-trained multi-step text-to-image model to a student network that can generate images with *just a single inference step*.

#### Support compatibility of add-on modules (ControlNets and LoRAs)

* X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Ran_X-Adapter_Adding_Universal_Compatibility_of_Plugins_for_Upgraded_Diffusion_Model_CVPR_2024_paper.html)] \[[Homepage](https://showlab.github.io/X-Adapter/)] \[[Code](https://github.com/showlab/X-Adapter)]
  * NUS & Tencent & FDU
  * Enable pre-trained add-on modules (e.g., ControlNet, LoRA) to work with an upgraded diffusion model (e.g., SDXL) *without further retraining*.

#### Support arbitrary image size

* ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Haji-Ali_ElasticDiffusion_Training-free_Arbitrary_Size_Image_Generation_through_Global-Local_Content_Separation_CVPR_2024_paper.html)] \[[Homepage](https://elasticdiffusion.github.io)] \[[Code](https://github.com/MoayedHajiAli/ElasticDiffusion-official)]
  * Rice University
  * Enable pre-trained text-to-image diffusion models to generate images at arbitrary sizes.
  * Decouple the generation trajectory of a pre-trained model into local and global signals.
    * The local signal controls low-level pixel information and can be estimated on local patches.
    * The global signal is used to maintain overall structural consistency and is estimated with a reference image.
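The global/local decomposition can be illustrated on a raw image array (a hypothetical stand-in: in the paper the signals decompose the *denoising trajectory*, not pixels, and the names below are made up):

```python
import numpy as np

def local_signal(image, patch=4):
    # Low-level detail, estimated independently on each patch.
    out = np.empty_like(image)
    h, w = image.shape
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            blk = image[i:i + patch, j:j + patch]
            out[i:i + patch, j:j + patch] = blk - blk.mean()  # high-freq part
    return out

def global_signal(image, patch=4):
    # Coarse structure: per-patch means, broadcast back to full resolution.
    h, w = image.shape
    coarse = image.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))
    return np.kron(coarse, np.ones((patch, patch)))

img = np.arange(64, dtype=float).reshape(8, 8)
recon = global_signal(img) + local_signal(img)
print(np.allclose(recon, img))  # the two signals recompose the original
```

The point of the split is that the local part can be computed patch-by-patch at any target size, while the global part is estimated once at the resolution the model was trained on.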

#### Improve image quality

* FreeU: Free Lunch in Diffusion U-Net \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Si_FreeU_Free_Lunch_in_Diffusion_U-Net_CVPR_2024_paper.html)] \[[Homepage](https://chenyangsi.top/FreeU/)] \[[Code](https://github.com/ChenyangSi/FreeU)]
  * NTU
  * Key insight
    * Use two modulation factors to re-weight the feature contributions from the U-Net’s skip connections and backbone.
    * Increasing the backbone scaling factor *b* significantly enhances image quality.
    * Directly scaling the skip features with *s* has limited influence on image synthesis quality.
  * **FreeU**
    * Improve the generation quality with only a few lines of code.
    * Only need to adjust two scaling factors during the inference.
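A minimal sketch of the two scaling factors at a U-Net decoder stage (illustrative tensors; the actual method scales only a subset of backbone channels and applies *s* in the Fourier domain of the skip features):

```python
import numpy as np

def freeu_merge(backbone, skip, b=1.2, s=0.9):
    # Amplify backbone features (semantics), attenuate skip features
    # (high-frequency detail), then concatenate along the channel axis
    # as a U-Net decoder stage would.
    return np.concatenate([b * backbone, s * skip], axis=0)

backbone = np.ones((4, 8, 8))  # (channels, H, W), dummy features
skip = np.ones((4, 8, 8))
merged = freeu_merge(backbone, skip)
print(merged.shape)  # (8, 8, 8)
```

Because only `b` and `s` are tuned at inference, the technique requires no retraining and no extra parameters, which is where the "free lunch" comes from.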

#### Scalability

* On the Scalability of Diffusion-based Text-to-Image Generation \[[Paper](https://openaccess.thecvf.com/content/CVPR2024/html/Li_On_the_Scalability_of_Diffusion-based_Text-to-Image_Generation_CVPR_2024_paper.html)]
  * AWS AI Labs & Amazon AGI
  * An empirical study of the scaling properties of diffusion-based text-to-image models.
  * Perform ablations on scaling both the denoising backbone and the training set, training scaled U-Net and Transformer variants ranging from 0.4B to 4B parameters on datasets of up to 600M images.
  * Specifically
    * Model scaling
      * The location and amount of cross-attention distinguish performance among UNet designs.
      * For improving text-image alignment, increasing the number of transformer blocks is more parameter-efficient than increasing channel counts.
      * Identify an efficient UNet variant.
    * Data scaling
      * The quality and diversity of the training set matter more than sheer dataset size.
    * Provide scaling functions to predict the text-image alignment performance as functions of the scale of model size, compute, and dataset size.
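Scaling functions of this kind are typically power laws fit in log space; a generic sketch (the data points below are synthetic, not the paper's measurements):

```python
import numpy as np

# Fit  metric ≈ a * C**k  to (compute, metric) pairs via linear regression
# in log-log space, then extrapolate to a larger compute budget.

compute = np.array([1e18, 1e19, 1e20, 1e21])
metric = 0.5 * compute ** 0.05          # synthetic "alignment score"

k, log_a = np.polyfit(np.log(compute), np.log(metric), 1)
a = np.exp(log_a)
pred = a * 1e22 ** k                    # extrapolate one order of magnitude
print(round(k, 3))
```

With clean power-law data the fit recovers the exponent exactly; on real measurements the same recipe yields a predictive curve for alignment as a function of model size, compute, or dataset size.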
