CVPR 2024
Meta Info
Homepage: https://cvpr.thecvf.com/Conferences/2024
Paper list: https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers
Papers
Diffusion Models
Acceleration
CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model [Paper] [Code]
TJU & Tencent
Initiate the reverse denoising process from an implicit distribution generated by a pre-trained GAN-based model → Reduce the number of sampling steps
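A minimal sketch of this GAN-initialized, truncated sampling idea, written against a diffusers-style scheduler API; the gan, unet, and cond names and the t_start value are illustrative placeholders, not CAT-DM's actual interfaces.

```python
import torch

@torch.no_grad()
def gan_initialized_sampling(gan, unet, scheduler, cond, t_start=250):
    """Truncated reverse diffusion started from a GAN prediction (sketch)."""
    # Coarse estimate from the pre-trained GAN-based model.
    x0_coarse = gan(cond)

    # Noise the coarse estimate to an intermediate timestep t_start,
    # i.e. sample from q(x_{t_start} | x0_coarse).
    noise = torch.randn_like(x0_coarse)
    x_t = scheduler.add_noise(x0_coarse, noise, torch.tensor([t_start]))

    # Run only the tail of the reverse process (t_start -> 0) instead of
    # the full T steps, which is where the step reduction comes from.
    for step in scheduler.timesteps:
        if step > t_start:
            continue
        eps = unet(x_t, step, cond)
        x_t = scheduler.step(eps, step, x_t).prev_sample
    return x_t
```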
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models [Paper] [Homepage] [Code]
MIT & Princeton & Lepton AI & NVIDIA
Displaced patch parallelism
Split the model input into multiple patches and assign each patch to a GPU.
Reuse the pre-computed feature maps from the previous timestep to provide context for the current step (sketched below).
DistriFusion → Enable running diffusion models across multiple GPUs in parallel
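As a rough illustration, here is how displaced patch parallelism might look for a single self-attention layer: queries come from the GPU's fresh patch, while keys and values see the full map in which only that patch is up to date. This is a conceptual sketch with hypothetical names; the paper applies the idea per layer and overlaps the asynchronous patch exchange with computation.

```python
import torch
import torch.nn.functional as F

def displaced_self_attention(q_proj, k_proj, v_proj, local_x, stale_full_x,
                             rank, world_size):
    """Self-attention with stale global context (illustrative sketch).

    local_x:      (B, N_patch, C) fresh tokens for this GPU's patch
    stale_full_x: (B, N_full, C) token map cached from the previous timestep
    """
    n = stale_full_x.shape[1] // world_size

    # Paste the fresh patch into the stale full-resolution map, so the
    # layer sees slightly outdated global context without waiting for the
    # other GPUs' current-step activations.
    ctx = stale_full_x.clone()
    ctx[:, rank * n:(rank + 1) * n] = local_x

    q = q_proj(local_x)              # queries only for the local patch
    k, v = k_proj(ctx), v_proj(ctx)  # keys/values over the full context
    out = F.scaled_dot_product_attention(q, k, v)

    # ctx becomes the cached (stale) context for the next timestep.
    return out, ctx
```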
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation [Paper]
VinAI Research, Vietnam
Knowledge distillation: Distill a pre-trained multi-step text-to-image model into a student network that generates images in a single inference step (sketched below).
Compatible with add-on modules (ControlNets and LoRAs)
Support arbitrary image sizes
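A heavily simplified sketch of one distillation step with variational score distillation, assuming diffusers-style components; student, teacher, lora_teacher, and the tensor shapes are hypothetical stand-ins for the paper's actual setup.

```python
import torch

def vsd_distill_step(student, teacher, lora_teacher, scheduler, prompt_emb):
    """One VSD distillation step for a one-step generator (sketch)."""
    z = torch.randn(4, 4, 64, 64)        # latent-shaped input noise
    x0 = student(z, prompt_emb)          # single-step generation

    # Re-noise the student's output to a random intermediate timestep.
    t = torch.randint(20, 980, (z.shape[0],))
    noise = torch.randn_like(x0)
    x_t = scheduler.add_noise(x0, noise, t)

    with torch.no_grad():
        eps_teacher = teacher(x_t, t, prompt_emb)    # frozen teacher score
        eps_lora = lora_teacher(x_t, t, prompt_emb)  # score of the student's
                                                     # output distribution

    # The VSD gradient is the difference of the two score estimates; this
    # surrogate loss has exactly that gradient w.r.t. x0. The LoRA teacher
    # is trained separately with the standard denoising loss on x0.
    grad = (eps_teacher - eps_lora).detach()
    loss = (grad * x0).sum()
    return loss
```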
ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation [Paper] [Homepage] [Code]
Rice University
Enable pre-trained text-to-image diffusion models to generate images of various sizes.
Decouple the generation trajectory of a pre-trained model into local and global signals (sketched below).
The local signal controls low-level pixel information and can be estimated on local patches.
The global signal is used to maintain overall structural consistency and is estimated with a reference image.
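A simplified sketch of the local/global decoupling, assuming H and W are multiples of the training resolution and non-overlapping tiles; the actual method uses overlapping patches and a reference image, and all names here are illustrative.

```python
import torch
import torch.nn.functional as F

def elastic_denoise_step(unet, x_t, t, cond_emb, uncond_emb,
                         train_res=64, guidance=7.5):
    """One arbitrary-size denoising step with decoupled signals (sketch)."""
    B, C, H, W = x_t.shape  # H, W assumed to be multiples of train_res

    # Local signal: unconditional score, estimated patch by patch at the
    # model's training resolution (controls low-level pixel information).
    local = torch.zeros_like(x_t)
    for i in range(0, H, train_res):
        for j in range(0, W, train_res):
            patch = x_t[:, :, i:i + train_res, j:j + train_res]
            local[:, :, i:i + train_res, j:j + train_res] = \
                unet(patch, t, uncond_emb)

    # Global signal: classifier-free-guidance direction, estimated once on
    # a downsampled full view (maintains overall structural consistency).
    x_small = F.interpolate(x_t, size=(train_res, train_res), mode="bilinear")
    g = unet(x_small, t, cond_emb) - unet(x_small, t, uncond_emb)
    global_sig = F.interpolate(g, size=(H, W), mode="bilinear")

    return local + guidance * global_sig
```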
Improve image quality
FreeU: Free Lunch in Diffusion U-Net [Paper] [Homepage] [Code]
NTU
Key insight
Use two modulation factors to re-weight the feature contributions from the U-Net’s skip connections and backbone.
Increasing the backbone scaling factor b significantly enhances image quality.
Directly scaling the skip features by a factor s has only a limited influence on image synthesis quality.
FreeU
Improve the generation quality with only a few lines of code.
Only requires adjusting two scaling factors at inference time.
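The mechanism does fit in a few lines. Below is a minimal sketch of the re-weighting at a U-Net decoder block, assuming the skip factor s is applied to low-frequency components in the Fourier domain (which is why direct scaling matters less); b = 1.2 and s = 0.9 are illustrative values, and the official implementation scales only a subset of backbone channels.

```python
import torch
import torch.fft as fft

def fourier_filter(x, scale, threshold=1):
    """Scale the low-frequency components of x (B, C, H, W) by `scale`."""
    B, C, H, W = x.shape
    x_freq = fft.fftshift(fft.fftn(x.float(), dim=(-2, -1)), dim=(-2, -1))
    mask = torch.ones_like(x_freq.real)
    crow, ccol = H // 2, W // 2
    mask[..., crow - threshold:crow + threshold,
              ccol - threshold:ccol + threshold] = scale
    x_freq = x_freq * mask
    x_out = fft.ifftn(fft.ifftshift(x_freq, dim=(-2, -1)), dim=(-2, -1)).real
    return x_out.to(x.dtype)

def freeu_fusion(backbone_feat, skip_feat, b=1.2, s=0.9):
    """Re-weight backbone and skip features before decoder concatenation."""
    backbone_feat = backbone_feat * b         # boost backbone contribution
    skip_feat = fourier_filter(skip_feat, s)  # damp low-freq skip content
    return torch.cat([backbone_feat, skip_feat], dim=1)
```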
Scalability
On the Scalability of Diffusion-based Text-to-Image Generation [Paper]
AWS AI Labs & Amazon AGI
An empirical study of the scaling properties of diffusion-based text-to-image models.
Perform ablations on scaling both the denoising backbone and the training set, training scaled U-Net and Transformer variants ranging from 0.4B to 4B parameters on datasets of up to 600M images.
Specifically
Model scaling
The location and amount of cross-attention distinguish the performance of different backbone designs.
To improve text-image alignment, increasing the number of transformer blocks is more parameter-efficient than increasing the channel count.
Identify an efficient U-Net variant.
Data scaling
The quality and diversity of the training set matter more than raw dataset size.
Provide scaling functions that predict text-image alignment performance as a function of model size, compute, and dataset size (see the illustrative fit below).
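As an illustration of what fitting such a scaling function looks like, the snippet below fits a saturating power law to hypothetical (made-up) alignment measurements; the functional form and all numbers are assumptions for demonstration, not the paper's reported fits.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (made-up) points: training compute C in arbitrary units vs.
# a text-image alignment score. Not the paper's data.
compute = np.array([1e2, 3e2, 1e3, 3e3, 1e4])
align = np.array([0.21, 0.24, 0.27, 0.29, 0.31])

def power_law(c, a, b, k):
    # Saturating power law: the score approaches `a` as compute grows.
    return a - b * np.power(c, -k)

params, _ = curve_fit(power_law, compute, align, p0=(0.35, 0.5, 0.3),
                      maxfev=10000)
a, b, k = params
print(f"fit: score = {a:.3f} - {b:.3f} * C^(-{k:.3f})")
print(f"predicted score at C = 1e5: {power_law(1e5, a, b, k):.3f}")
```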