DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training [arXiv]
CUHK & StepFun
Leverage the dynamic sparsity of attention to accelerate large-scale video DiT training.
Adopt a hybrid sparsity-aware context parallelism scheme that re-balances the workload across attention heads and blocks, which sparsity heterogeneity would otherwise leave skewed.
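To make the re-balancing idea concrete, here is a minimal sketch of balancing attention heads across devices when their costs differ because of sparsity. It is not DSV's actual algorithm: the function name `balance_heads`, the per-head cost model (a count of non-skipped attention blocks), and the greedy longest-processing-time heuristic are all assumptions chosen for illustration.

```python
import heapq


def balance_heads(head_costs, num_devices):
    """Greedy longest-processing-time assignment of heads to devices.

    head_costs[h] is a hypothetical per-head cost, e.g. the number of
    attention blocks that head h still has to compute after sparsity
    pruning. Returns (assignment, loads), where assignment[d] lists the
    heads placed on device d and loads[d] is their summed cost.
    """
    # Min-heap of (current_load, device_id), so the least-loaded
    # device is always popped first.
    heap = [(0, d) for d in range(num_devices)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_devices)]

    # Place the most expensive heads first; cheap heads then fill gaps.
    order = sorted(range(len(head_costs)), key=lambda h: -head_costs[h])
    for head in order:
        load, dev = heapq.heappop(heap)
        assignment[dev].append(head)
        heapq.heappush(heap, (load + head_costs[head], dev))

    loads = [sum(head_costs[h] for h in heads) for heads in assignment]
    return assignment, loads


if __name__ == "__main__":
    # Illustrative costs only: eight heads with uneven sparsity.
    costs = [9, 8, 7, 5, 5, 3, 2, 1]
    assignment, loads = balance_heads(costs, num_devices=4)
    print(assignment, loads)
```

With these illustrative costs, a contiguous split of two heads per device would give loads of 17, 12, 8, and 3, whereas the greedy assignment lands every device at 10. A real system would also have to account for communication cost and for imbalance across sequence blocks, which this sketch ignores.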