张德祥 (2022-09-01 22:03):
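#paper https://doi.org/10.48550/arXiv.2208.11970 Understanding Diffusion Models: A Unified Perspective; Diffusion models are behind the recently popular text-to-image generation models such as DALL-E. This paper walks carefully through where diffusion models come from, going from the ELBO to VAEs to hierarchical VAEs to diffusion models, and it also covers the three perspectives on diffusion models and their limitations. The derivations throughout are clear and easy to follow, which makes the paper a good resource for learning about diffusion models.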
Understanding Diffusion Models: A Unified Perspective
Abstract:
Diffusion models have shown incredible capabilities as generative models; indeed, they power the current state-of-the-art models on text-conditioned image generation such as Imagen and DALL-E 2. In this work we review, demystify, and unify the understanding of diffusion models across both variational and score-based perspectives. We first derive Variational Diffusion Models (VDM) as a special case of a Markovian Hierarchical Variational Autoencoder, where three key assumptions enable tractable computation and scalable optimization of the ELBO. We then prove that optimizing a VDM boils down to learning a neural network to predict one of three potential objectives: the original source input from any arbitrary noisification of it, the original source noise from any arbitrarily noisified input, or the score function of a noisified input at any arbitrary noise level. We then dive deeper into what it means to learn the score function, and connect the variational perspective of a diffusion model explicitly with the Score-based Generative Modeling perspective through Tweedie's Formula. Lastly, we cover how to learn a conditional distribution using diffusion models via guidance.
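The three prediction targets named in the abstract (source input, source noise, score) can be converted into one another once the noise level is known. Below is a minimal scalar sketch of that equivalence, not code from the paper. It assumes the common forward-noising form x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps; the function names and the symbol `alpha_bar` are illustrative choices, and the "score" here is computed from the true noise rather than from a trained network.

```python
import math

def noisify(x0, eps, alpha_bar):
    # Assumed forward process: mix the clean input with Gaussian noise.
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps

def x0_from_noise(xt, eps, alpha_bar):
    # Knowing the source noise, invert the forward process to get the input.
    return (xt - math.sqrt(1.0 - alpha_bar) * eps) / math.sqrt(alpha_bar)

def score_from_noise(eps, alpha_bar):
    # The score points against the noise, scaled by the noise std.
    return -eps / math.sqrt(1.0 - alpha_bar)

def x0_from_score(xt, score, alpha_bar):
    # Tweedie-style estimate of the input from the score at this noise level.
    return (xt + (1.0 - alpha_bar) * score) / math.sqrt(alpha_bar)

# Hypothetical values chosen only for illustration.
x0, eps, alpha_bar = 1.5, -0.7, 0.3
xt = noisify(x0, eps, alpha_bar)
score = score_from_noise(eps, alpha_bar)

# Both routes recover the same clean input (up to floating-point error).
print(x0_from_noise(xt, eps, alpha_bar))
print(x0_from_score(xt, score, alpha_bar))
```

In a trained model, `eps` or `score` would be replaced by a network's prediction, so the recovered input would be an estimate rather than an exact value.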