
Sparsely Supervised Diffusion

Wenshuai Zhao
Zhiyuan Li
Yi Zhao
Mohammad Hassan Vali
Martin Trapp
Joni Pajarinen
Juho Kannala
Arno Solin
Main: 8 pages
11 figures
Bibliography: 3 pages
7 tables
Appendix: 9 pages
Abstract

Diffusion models have shown remarkable success across a wide range of generative tasks. However, they often suffer from spatially inconsistent generation, arguably due to the inherent locality of their denoising mechanisms. This can yield samples that are locally plausible but globally inconsistent. To mitigate this issue, we propose sparsely supervised learning for diffusion models, a simple yet effective masking strategy that can be implemented with only a few lines of code. Interestingly, our experiments show that it is safe to mask up to 98% of pixels during diffusion model training. Our method delivers competitive FID scores across experiments and, most importantly, avoids training instability on small datasets. Moreover, the masking strategy reduces memorization and promotes the use of essential contextual information during generation.
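To make the "few lines of code" claim concrete, here is a minimal sketch of one possible reading of sparse supervision: a random mask that keeps only a small fraction of pixel positions in the per-pixel denoising (noise-prediction) loss. This is an assumption for illustration, not the authors' implementation; the abstract does not say whether the mask is applied to the loss, the input, or both. The helper name `sparse_masked_mse`, the NumPy array layout `(batch, channels, height, width)`, and the choice to share one keep/drop decision across channels are all hypothetical; the default `mask_ratio=0.98` simply mirrors the 98% figure quoted above.

```python
import numpy as np


def sparse_masked_mse(eps_pred, eps_true, mask_ratio=0.98, rng=None):
    """Mean squared error over a random subset of pixel positions.

    Hypothetical helper (not from the paper): a fraction `mask_ratio` of
    spatial positions is dropped from the loss, so only roughly
    (1 - mask_ratio) of the pixels are supervised in each call.
    """
    if rng is None:
        rng = np.random.default_rng()
    if eps_pred.shape != eps_true.shape:
        raise ValueError("prediction and target must have the same shape")
    b, c, h, w = eps_true.shape
    # Assumption: one keep/drop decision per spatial location, shared by all channels.
    keep = rng.random((b, 1, h, w)) >= mask_ratio
    keep = np.broadcast_to(keep, eps_true.shape)
    n_kept = int(keep.sum())
    if n_kept == 0:
        return 0.0  # no pixel was kept, so nothing is supervised in this batch
    sq_err = (eps_pred - eps_true) ** 2
    # Average the squared error only over the kept (supervised) entries.
    return float(sq_err[keep].sum() / n_kept)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal((4, 3, 32, 32))
    pred = target + 0.1 * rng.standard_normal(target.shape)
    print("sparse loss (98% masked):", sparse_masked_mse(pred, target, rng=rng))
    print("dense loss (0% masked):  ", sparse_masked_mse(pred, target, mask_ratio=0.0, rng=rng))
```

Setting `mask_ratio=0.0` recovers the ordinary dense MSE, which makes it easy to compare the sparse and dense objectives inside the same training loop. Normalizing by the number of kept entries (rather than by the full pixel count) keeps the loss scale roughly comparable across mask ratios; that normalization is a design choice of this sketch, not something the abstract specifies.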
