CVPR 2024
Homepage:
Paper list:
Cache Me if You Can: Accelerating Diffusion Models through Block Caching [] []
Meta & TUM & MCML & Oxford
Block caching
Reuse outputs from layer blocks of previous steps to speed up inference.
Automatically determines caching schedules based on each block's changes over timesteps.
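A minimal sketch of the idea, assuming a toy calibration pass; `CachedBlock`, `build_schedule`, and the threshold are illustrative names, not the paper's implementation:

```python
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Wraps a layer block and reuses its cached output at timesteps
    where the schedule says the block's output barely changes."""
    def __init__(self, block: nn.Module, recompute_steps: set):
        super().__init__()
        self.block = block
        self.recompute_steps = recompute_steps  # steps at which to run the block
        self.cache = None

    def forward(self, x: torch.Tensor, step: int) -> torch.Tensor:
        if self.cache is None or step in self.recompute_steps:
            self.cache = self.block(x)          # fresh computation
        return self.cache                       # otherwise reuse a previous step's output

def build_schedule(block_changes: torch.Tensor, threshold: float = 0.1) -> set:
    """Turn measured per-timestep output changes (e.g. relative L1 distance
    between consecutive steps, collected in a calibration pass) into a
    recompute schedule: only recompute when the block changed noticeably."""
    return {t for t, change in enumerate(block_changes.tolist()) if change > threshold}
```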
CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model [] []
TJU & Tencent
Initiate the reverse denoising process from an implicit distribution generated by a pre-trained GAN-based model → reduce the sampling steps.
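A hedged sketch of such truncated sampling; `gan_generator` and the `diffusion` object with `q_sample`/`p_sample` follow common diffusion-codebase conventions and are stand-ins, not CAT-DM's actual API:

```python
import torch

@torch.no_grad()
def truncated_sampling(gan_generator, diffusion, cond, t_start=250):
    # Coarse try-on result from the pre-trained GAN (the implicit distribution).
    x0_coarse = gan_generator(cond)
    # Diffuse it to an intermediate timestep instead of starting from pure noise.
    x_t = diffusion.q_sample(x0_coarse, t_start, torch.randn_like(x0_coarse))
    # Denoise only t_start steps rather than the full schedule.
    for t in reversed(range(t_start)):
        x_t = diffusion.p_sample(x_t, t, cond)
    return x_t
```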
DeepCache: Accelerating Diffusion Models for Free [] []
NUS
Utilize the inherent temporal redundancy observed in the sequential denoising steps of diffusion models.
Cache and retrieve features across adjacent denoising stages, thereby reducing redundant computations.
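A minimal sketch of the caching loop, assuming a hypothetical U-Net split into `full_forward`/`shallow_forward` and a generic `scheduler`; none of these names come from the paper's code:

```python
import torch

@torch.no_grad()
def sample_with_deepcache(unet, scheduler, x, cache_interval=5):
    deep_feature = None
    for i, t in enumerate(scheduler.timesteps):
        if deep_feature is None or i % cache_interval == 0:
            # Full U-Net pass; store the deep (up-sampling) feature map.
            eps, deep_feature = unet.full_forward(x, t)
        else:
            # Shallow pass only: reuse the cached deep feature from the
            # adjacent step, skipping the redundant deep computation.
            eps = unet.shallow_forward(x, t, deep_feature)
        x = scheduler.step(eps, t, x)
    return x
```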
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models [] [] []
MIT & Princeton & Lepton AI & NVIDIA
Displaced patch parallelism
Split the model input into multiple patches and assign each patch to a GPU.
Reuse the pre-computed feature maps from the previous timestep to provide context for the current step.
DistriFusion → enables running diffusion models across multiple GPUs in parallel.
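A single-process toy of displaced patch parallelism; the real system shards patches across GPUs with asynchronous communication, and the `model(patch, t, context)` signature is a stand-in:

```python
import torch

def displaced_patch_step(model, x, t, stale_feats, num_patches=2):
    """One denoising step where each 'GPU' owns one patch. Fresh activations
    are computed locally; the surrounding context comes from feature maps
    cached at the previous timestep (slightly stale, which is tolerable
    because adjacent denoising steps are highly similar)."""
    outs = []
    for rank, patch in enumerate(x.chunk(num_patches, dim=-1)):
        # In the real system this context arrives via asynchronous
        # communication overlapped with computation.
        outs.append(model(patch, t, context=stale_feats))
    x_next = torch.cat(outs, dim=-1)
    return x_next  # also serves as stale_feats for the next step
```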
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation []
VinAI Research, Vietnam
Knowledge distillation: Distill a pre-trained multi-step text-to-image model into a student network that can generate images in just a single inference step.
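A hedged sketch of a variational-score-distillation training step; `teacher_eps`, `lora_eps`, and the `diffusion` helper are placeholders, not SwiftBrush's API:

```python
import torch

def vsd_step(student, teacher_eps, lora_eps, diffusion, z, prompt):
    x0 = student(z, prompt)                        # one-step generation from noise
    t = torch.randint(20, 980, (x0.shape[0],), device=x0.device)
    x_t = diffusion.q_sample(x0, t, torch.randn_like(x0))
    with torch.no_grad():
        # Teacher score minus the LoRA-estimated score of the student's
        # own output distribution: the VSD gradient direction.
        grad = teacher_eps(x_t, t, prompt) - lora_eps(x_t, t, prompt)
    # Surrogate loss whose gradient w.r.t. the student flows through x0.
    return (grad * x0).sum()
```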
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model [] [] []
NUS & Tencent & FDU
Enable the pre-trained add-on modules (ControlNet, LoRA) with the upgraded diffusion model (SDXL) without further retraining.
ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation [] [] []
Rice University
Enable pre-trained text-to-image diffusion models to generate images of arbitrary sizes.
Decouple the generation trajectory of a pre-trained model into local and global signals.
The local signal controls low-level pixel information and can be estimated on local patches.
The global signal is used to maintain overall structural consistency and is estimated with a reference image.
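A conceptual sketch of the separation, with illustrative patching and upsampling details rather than the paper's exact algorithm; the `unet(x, t, cond)` signature is a stand-in:

```python
import torch
import torch.nn.functional as F

def iter_patches(x, p):
    """Yield non-overlapping p x p patches with their index slices."""
    h, w = x.shape[-2:]
    for y in range(0, h, p):
        for x_ in range(0, w, p):
            ys, xs = slice(y, y + p), slice(x_, x_ + p)
            yield x[..., ys, xs], ys, xs

@torch.no_grad()
def elastic_score(unet, x_large, t, cond, p=64, guidance=7.5):
    # Local signal: low-level pixel detail, estimated patch by patch
    # at the model's native resolution.
    local = torch.zeros_like(x_large)
    for patch, ys, xs in iter_patches(x_large, p):
        local[..., ys, xs] = unet(patch, t, cond=None)
    # Global signal: overall structure, estimated once on a reference-sized
    # (downscaled) copy and upsampled back to the target size.
    x_ref = F.interpolate(x_large, size=(p, p), mode="bilinear")
    direction = unet(x_ref, t, cond) - unet(x_ref, t, cond=None)
    global_sig = F.interpolate(direction, size=x_large.shape[-2:], mode="bilinear")
    return local + guidance * global_sig
```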
FreeU: Free Lunch in Diffusion U-Net [] [] []
NTU
Key insight
Use two modulation factors to re-weight the feature contributions from the U-Net's skip connections and backbone.
Increasing the backbone scaling factor b significantly enhances image quality.
Directly scaling the skip features by s has only a limited influence on image synthesis quality.
FreeU
Improve the generation quality with only a few lines of code.
Only two scaling factors need to be adjusted during inference, as sketched below.
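A minimal sketch of the two-factor re-weighting; the channel split and Fourier mask follow the public FreeU code in spirit, but the details here are simplified:

```python
import torch
import torch.fft as fft

def freeu(backbone_feat, skip_feat, b=1.2, s=0.9, thresh=1):
    # Amplify the backbone contribution (here on half of its channels).
    c = backbone_feat.shape[1] // 2
    backbone_feat[:, :c] = backbone_feat[:, :c] * b
    # Attenuate only the low-frequency components of the skip features,
    # since naively scaling the whole tensor has little effect (see above).
    spec = fft.fftshift(fft.fftn(skip_feat.float(), dim=(-2, -1)), dim=(-2, -1))
    h, w = spec.shape[-2:]
    mask = torch.ones_like(spec.real)
    mask[..., h // 2 - thresh:h // 2 + thresh, w // 2 - thresh:w // 2 + thresh] = s
    spec = fft.ifftshift(spec * mask, dim=(-2, -1))
    skip_feat = fft.ifftn(spec, dim=(-2, -1)).real.to(backbone_feat.dtype)
    return backbone_feat, skip_feat
```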
On the Scalability of Diffusion-based Text-to-Image Generation []
AWS AI Labs & Amazon AGI
An empirical study of the scaling properties of diffusion-based text-to-image models.
Perform ablations scaling both the denoising backbone and the training set, training U-Net and Transformer variants ranging from 0.4B to 4B parameters on datasets of up to 600M images.
Specifically:
Model scaling
The placement and amount of cross-attention distinguish performance among backbone designs.
To improve text-image alignment, increasing the number of transformer blocks is more parameter-efficient than increasing channel counts.
Identify an efficient U-Net variant.
Data scaling
The quality and diversity of the training set matter more than dataset size alone.
Provide scaling functions that predict text-image alignment performance from model size, compute, and dataset size.
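As an illustration of what such a scaling function looks like, a power-law fit in log-log space; the sizes and scores below are made-up placeholders, not results from the paper:

```python
import numpy as np

def fit_power_law(scale, perf):
    """Fit perf ≈ a * scale**k in log-log space; return (a, k)."""
    k, log_a = np.polyfit(np.log(scale), np.log(perf), deg=1)
    return float(np.exp(log_a)), float(k)

# Hypothetical model sizes (B params) and alignment scores, for illustration only.
sizes = np.array([0.4, 0.9, 2.0, 4.0])
scores = np.array([0.21, 0.24, 0.27, 0.29])
a, k = fit_power_law(sizes, scores)
print(f"extrapolated score at 8B params: {a * 8.0 ** k:.3f}")
```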