The Stable Diffusion algorithm is strange, and I’m surprised someone thought of it, and surprised it works.
IIRC it works like this:
Stable Diffusion starts with an image of completely random noise. The idea is that the text prompt given to the model describes a hypothetical image where the noise was added. So, the model tries to “predict,” given the text, what the image would look like if it was denoised a little bit. It does this repeatedly until the image is fully denoised.
So, it’s very easy for the algorithm to make a “mistake” in one iteration by coloring the wrong pixels black. It’s unable to correct it’s mistake in later denoising iterations, and just fills in the pixels around it with what it thinks looks plausible. And, it can’t really “plan” ahead of time, it can only do one denoising operation at a time.
The Stable Diffusion algorithm is strange, and I’m surprised someone thought of it, and surprised it works.
IIRC it works like this: Stable Diffusion starts with an image of completely random noise. The idea is that the text prompt given to the model describes a hypothetical image where the noise was added. So, the model tries to “predict,” given the text, what the image would look like if it was denoised a little bit. It does this repeatedly until the image is fully denoised.
So, it’s very easy for the algorithm to make a “mistake” in one iteration by coloring the wrong pixels black. It’s unable to correct it’s mistake in later denoising iterations, and just fills in the pixels around it with what it thinks looks plausible. And, it can’t really “plan” ahead of time, it can only do one denoising operation at a time.