Stable Diffusion is a latent, text-to-image diffusion model that was released in 2022. Latent diffusion models (LDMs) operate by repeatedly reducing noise in a latent representation space and then converting that representation into a complete image.
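The idea of repeatedly reducing noise can be illustrated with a deliberately tiny sketch. This is not Stable Diffusion's actual sampler: the "denoiser" below is a hypothetical stand-in that cheats by knowing the clean target, whereas a real model uses a trained neural network to predict the noise. The function name `toy_denoise` and its parameters are assumptions made for illustration only.

```python
import random

def toy_denoise(latent, target, steps=50, rate=0.2):
    """Remove a fraction of the remaining noise from `latent` at every step."""
    for _ in range(steps):
        # A real diffusion model would *predict* this noise with a neural network;
        # this toy computes it directly from a known target.
        predicted_noise = [x - t for x, t in zip(latent, target)]
        latent = [x - rate * n for x, n in zip(latent, predicted_noise)]
    return latent

random.seed(0)
target = [0.5, -1.0, 2.0, 0.0]                    # hypothetical "clean" latent
noisy = [random.gauss(0.0, 1.0) for _ in target]  # start from pure Gaussian noise
result = toy_denoise(noisy, target)

# Largest remaining distance between the denoised latent and the target.
max_error = max(abs(r - t) for r, t in zip(result, target))
print(max_error)
```

Each pass leaves 80% of the remaining noise, so after 50 passes the latent sits very close to the target. A real model takes the same kind of small, repeated steps, but it has to estimate the noise rather than look it up.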
Stable Diffusion combines several neural networks, and its text-to-image generation process can be divided into four stages. Here’s an overview:
- First, an Image Encoder converts training images into vectors in a mathematical space known as the latent space, where image information can be represented as arrays of numbers.
- A Text Encoder translates text into high-dimensional vectors that machine learning models can comprehend.
- A Diffusion Model then utilizes the text guidance to create new images in the latent space.
- Finally, an Image Decoder transforms the image data from the latent space into an actual image constructed with pixels.
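To make the data flow through these components more concrete, the following sketch tracks only the array shapes involved. The figures are assumptions taken from the Stable Diffusion v1 releases, not from the text above: 4 latent channels, an 8x spatial downsampling factor, and a 77 x 768 text embedding. Other versions may differ, and the helper names are hypothetical.

```python
# Assumed figures for Stable Diffusion v1; they are not stated in the text above.
LATENT_CHANNELS = 4               # channels in the latent representation
DOWNSAMPLE = 8                    # each latent cell covers an 8x8 pixel patch
TEXT_EMBEDDING_SHAPE = (77, 768)  # tokens x features from the v1 text encoder

def latent_shape(height, width):
    """Shape the image encoder produces and the diffusion model works on."""
    return (LATENT_CHANNELS, height // DOWNSAMPLE, width // DOWNSAMPLE)

def decoded_shape(latent):
    """Shape of the RGB image the decoder returns for a given latent shape."""
    _, h, w = latent
    return (3, h * DOWNSAMPLE, w * DOWNSAMPLE)

def count(shape):
    """Number of values in an array of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total

latent = latent_shape(512, 512)
image = decoded_shape(latent)
ratio = count(image) // count(latent)
print(latent, image, ratio)  # (4, 64, 64) (3, 512, 512) 48
```

Under these assumptions, the denoising loop handles roughly 48 times fewer values than it would in pixel space, which is one reason a latent model can be cheaper to run.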
The primary function of Stable Diffusion is to generate detailed images based on text descriptions, but it can also be used for other tasks like inpainting, outpainting, and creating image-to-image translations guided by text prompts. Its weights, model card, and code are available publicly.
Stability AI created Stable Diffusion in partnership with several academic researchers and non-profit organizations. Stability AI recommends a GPU with at least 6.9 GB of video RAM to run the model. This makes it more accessible than text-to-image models like DALL-E and Midjourney, which are proprietary and only accessible through cloud services.