Diffusion Model

Generative AI: how Diffusion Models work

WIP

Sample Gaussian noise, denoise it n times, and an image emerges.

image.png

The same denoiser is reused at every step, but the strength of the input noise differs each time, so the model is designed to take an extra noise-level (step) input.

image.png

Internally the model predicts the noise, which is then subtracted from the input.
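The loop above can be sketched as follows. This is a minimal illustration, not a real diffusion sampler: `predict_noise` is a hypothetical stand-in for a trained network conditioned on the step t.

```python
import numpy as np

def predict_noise(x, t):
    # placeholder: a real model would be a neural net conditioned on step t
    return 0.1 * x

def denoise(x, T=5):
    # start from pure Gaussian noise and repeatedly subtract predicted noise
    for t in range(T, 0, -1):
        x = x - predict_noise(x, t)
    return x

x = np.random.randn(8, 8)   # "image" sampled from a Gaussian
img = denoise(x)
```

The key structural point is that one `predict_noise` function is called at every step, with `t` telling it how noisy the input currently is.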

image.png

Why not output the image directly? Because predicting noise is easier: the noise distribution is simple, which makes training easier.

But how do we train the noise predictor? There is no paired data.

So we add the noise ourselves — the Forward Process.

image.png

image.png
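Since we add the noise ourselves, the training pairs come for free: the noisy image plus the step is the input, and the exact noise we added is the target. A minimal sketch, assuming images are numpy arrays and using a crude linear noise-mixing rule purely for illustration:

```python
import numpy as np

def make_training_pair(image, t, T=10, seed=0):
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(image.shape)
    strength = t / T                          # toy linear schedule
    noisy = (1 - strength) * image + strength * noise
    # input: (noisy image, step t); target: the noise that was added
    return (noisy, t), noise

clean = np.ones((4, 4))
(noisy, t), target = make_training_pair(clean, t=5)
```

The model is then trained to map `(noisy, t)` back to `target` — self-supervision with no human-labeled pairs needed.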

Datasets such as LAION and ImageNet contain text-image correspondences.

Text-to-image

image.png

Stable Diffusion

Three main components

TextEncoder -> text vector; (text vector + noise) -> Generation Model -> intermediate product (a compressed version of the image); compressed image -> Decoder -> image

image.png

The three components are trained separately and then combined.
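The three-stage flow can be sketched as below. All three functions are hypothetical placeholders standing in for the trained components, not a real Stable Diffusion API; only the shape of the pipeline matters here.

```python
import numpy as np

def encode_text(prompt):
    # stand-in text encoder: deterministic vector from the prompt's characters
    rng = np.random.default_rng(sum(ord(c) for c in prompt))
    return rng.standard_normal(16)

def generate_latent(text_vec, seed=1):
    # stand-in generation model: "denoise" random noise, conditioned on text
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(16)
    return noise + text_vec

def decode(latent):
    # stand-in decoder: expand the compressed latent back to "pixel" space
    return np.outer(latent, latent)

image = decode(generate_latent(encode_text("a cat")))
```

Because each stage only communicates through a vector or a latent, the components really can be trained independently and swapped in later.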

Text encoder

The text encoder has a large impact on image quality. FID: lower is better; CLIP score: higher is better. In the figure, (a) scales the text-encoder size and (b) scales the prediction-model size.

image.png

FID measures how close the distribution of generated images is to that of real images (lower is better). The CLIP score embeds the text and the image separately as vectors; the closer the two vectors, the better the match.
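The CLIP-score idea reduces to cosine similarity between the two embeddings. A minimal sketch, assuming we already have (hypothetical) text and image embedding vectors:

```python
import numpy as np

def clip_score(text_vec, image_vec):
    # cosine similarity: 1.0 = perfectly aligned, 0.0 = orthogonal
    return np.dot(text_vec, image_vec) / (
        np.linalg.norm(text_vec) * np.linalg.norm(image_vec)
    )

t = np.array([1.0, 0.0, 1.0])
good = np.array([2.0, 0.0, 2.0])   # aligned with the text embedding
bad = np.array([0.0, 3.0, 0.0])    # orthogonal to the text embedding
```

Here `clip_score(t, good)` is 1.0 while `clip_score(t, bad)` is 0.0, which is exactly the "closer is better" criterion from the note above.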

image.png

Decoder

Training data: image.png

But the decoder's input is a latent representation — how do we get that input for training?

Just use a VAE/AE: train an encoder-decoder pair on images, then keep the decoder.

image.png
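The encoder/decoder idea can be sketched with a toy linear autoencoder. The weights below are random placeholders, not trained parameters; the point is only the dimensionality: the encoder compresses image space to a small latent, and the decoder maps it back.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, image_dim = 4, 64
W_enc = rng.standard_normal((latent_dim, image_dim)) * 0.1
W_dec = rng.standard_normal((image_dim, latent_dim)) * 0.1

def encode(image):
    return W_enc @ image   # compress: 64 -> 4

def decode(latent):
    return W_dec @ latent  # reconstruct: 4 -> 64

z = encode(rng.standard_normal(image_dim))
x_hat = decode(z)
```

After training such a pair, the diffusion model operates entirely in the 4-dimensional latent space, and `decode` turns its output back into pixels.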

What are Diffusion Models?

WIP

A diffusion model is a type of generative model, like GANs (now largely deprecated) and VAEs.

A generative model is a machine learning model designed to create new data that is similar to its training data ("What is a generative model?").

Diffusion models use a special method to learn features and generate data: adding noise and then learning from that noise.

The diffusion process slowly adds random noise to data, and the model then learns to reverse this process to construct the desired data samples from noise ("What are Diffusion Models?").

But if a sample has been noised until it is totally unrecognizable, how can the model learn anything from it?

Forward Diffusion process

Assume we have a set of cat images (the real data distribution), and a data point x₀ sampled randomly from this distribution.

In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. (Markov chain process)

Snipaste_2025-04-09_11-24-08.png

Then define a Markov chain called the Forward Diffusion Process, which adds a small amount of Gaussian noise to the image we sampled. Repeat T times, producing a sequence of noisy samples x₁, …, x_T from the real data point.

Snipaste_2025-04-09_10-52-44.png
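The chain can be sketched directly: each sample depends only on the previous one, which is exactly the Markov property. This is a minimal illustration, with illustrative (untuned) beta values, using the standard DDPM-style step q(x_t | x_{t−1}) = N(√(1 − β_t)·x_{t−1}, β_t·I):

```python
import numpy as np

def forward_chain(x0, betas, seed=0):
    rng = np.random.default_rng(seed)
    xs = [x0]
    x = x0
    for beta in betas:
        eps = rng.standard_normal(x.shape)
        # each step depends only on the previous sample (Markov property)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * eps
        xs.append(x)
    return xs

x0 = np.ones(4)
samples = forward_chain(x0, betas=[0.01, 0.02, 0.05])
```

Running T steps yields the sequence x₀, x₁, …, x_T described above, with x_T approaching pure Gaussian noise for large T.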

In real-world signals, Gaussian noise models some (but not all) actual physical scenarios quite well. Sampled data carries noise to varying degrees, and by the Central Limit Theorem the sum of many independent noise sources tends toward a Gaussian.

In deep learning, Gaussian noise is often added to the input data during training to improve the robustness and generalization ability of the model. This is known as data augmentation.

By adding noise to the input data, the model is forced to learn features that are robust to small variations in the input, which can help it perform better on new, unseen data.

The model can learn to recognize and ignore noise in samples; a model that has only seen 'clean' data can be disrupted by noisy inputs. Why Gaussian noise rather than other kinds? Because Gaussian noise is mathematically convenient and physically common.

The step size is controlled by a variance schedule β_t ∈ (0, 1), for t = 1 … T.

Snipaste_2025-04-09_11-26-00.png

We can afford a larger update step when the sample gets noisier, so β₁ < β₂ < ⋯ < β_T, and therefore ᾱ₁ > ⋯ > ᾱ_T (with α_t = 1 − β_t and ᾱ_t = ∏ α_s for s = 1 … t).
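A minimal sketch of such a schedule, assuming the DDPM-style linear schedule and the definitions α_t = 1 − β_t, ᾱ_t = ∏ α_s; the endpoint values 1e-4 and 0.02 are the commonly used defaults, not a requirement:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # beta_1 < beta_2 < ... < beta_T
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # alpha_bar_1 > ... > alpha_bar_T
```

Since every α_t is below 1, the cumulative product ᾱ_t shrinks monotonically toward 0, which is exactly the ordering stated above.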

So at every step we get:

q(x_t | x_{t−1}) = N(x_t; √(1 − β_t) · x_{t−1}, β_t · I)


Last modified April 9, 2025: 4.9 (d1e7ba5)