Stable Diffusion is an open-source family of latent diffusion models that generate images from text descriptions. The original SD 1.x/2.x series uses a U-Net backbone operating in a VAE-compressed latent space, conditioned on CLIP text encoders; SD 3 and later shifted to a Multimodal Diffusion Transformer (MMDiT) architecture with triple text-encoder conditioning (CLIP-L, OpenCLIP-G, T5-XXL). Competing models such as FLUX.1 (Black Forest Labs) use a similar transformer-based flow-matching design. Understanding generation parameters, from CFG scale to sampling schedulers, gives precise control over output, while extensions like ControlNet, LoRA, and IP-Adapter enable advanced customization without retraining the full model.
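The CFG scale controls how strongly each denoising step is pushed toward the prompt. In standard classifier-free guidance the model runs twice per step, once with the prompt and once with an empty (or negative) prompt, and the two predictions are blended linearly. The sketch below uses flat lists of floats as stand-ins for real latent tensors; `apply_cfg` is a hypothetical helper, not a library function. In Hugging Face diffusers the same knob is exposed as the `guidance_scale` pipeline argument.

```python
from typing import List, Sequence


def apply_cfg(pred_uncond: Sequence[float],
              pred_cond: Sequence[float],
              cfg_scale: float) -> List[float]:
    """Classifier-free guidance on two model predictions.

    guided = uncond + cfg_scale * (cond - uncond)

    The same linear blend applies whether the model predicts noise
    (SD 1.x/SDXL) or velocity (flow-matching models such as SD 3.x).
    """
    if len(pred_uncond) != len(pred_cond):
        raise ValueError("predictions must have the same length")
    # Start from the unconditional prediction and move along the
    # direction the text prompt pulls it, scaled by cfg_scale.
    return [u + cfg_scale * (c - u) for u, c in zip(pred_uncond, pred_cond)]


if __name__ == "__main__":
    uncond = [0.0, 0.25, -0.5]
    cond = [1.0, 0.75, -0.5]
    print(apply_cfg(uncond, cond, 1.0))  # [1.0, 0.75, -0.5]  (equals cond)
    print(apply_cfg(uncond, cond, 7.5))  # [7.5, 4.0, -0.5]
```

With `cfg_scale=1.0` the result is just the conditional prediction; larger values (diffusers' SD 1.5 pipeline defaults to 7.5) exaggerate the prompt's influence. Guidance-distilled models such as FLUX.1 [dev] instead take the guidance value as a model input, so they need only one forward pass per step.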
What This Cheat Sheet Covers
This cheat sheet spans 20 focused tables and 149 indexed concepts. The table-by-table outline below runs from foundational concepts to advanced details.
Table 1: Core Model Versions
| Model | Example | Description |
|---|---|---|
| SD 1.5 | runwayml/stable-diffusion-v1-5 | • 512×512 base resolution • 983M parameters • largest LoRA/extension/embedding ecosystem • fastest on low VRAM • original runwayml repo was removed from Hugging Face in 2024; mirrored at stable-diffusion-v1-5/stable-diffusion-v1-5 |
| SDXL 1.0 | stabilityai/stable-diffusion-xl-base-1.0 | • 1024×1024 native resolution • 3.5B parameters • dual text encoders (CLIP ViT-L + OpenCLIP ViT-bigG) • optional refiner model |
| FLUX.1 [dev] | black-forest-labs/FLUX.1-dev | • 12B-parameter flow-matching transformer (Black Forest Labs) • guidance-distilled; excels at text rendering and photorealism • 20–50 steps • non-commercial license |
| FLUX.1 [schnell] | black-forest-labs/FLUX.1-schnell | • step-distilled Flux variant; 1–4 steps for rapid generation • Apache 2.0 license • slight quality tradeoff vs. dev |
| FLUX.1 Kontext [dev] | black-forest-labs/FLUX.1-Kontext-dev | • 12B in-context image editing model (Black Forest Labs, May 2025) • edits existing images via text instructions • maintains character/style consistency across edits |
| SD 3.5 Large | stabilityai/stable-diffusion-3.5-large | • 8B parameters; MMDiT with CLIP-L + OpenCLIP-G + T5-XXL • requires 18GB+ VRAM (~12GB with GGUF quantization) • best prompt adherence in the Stability lineup |
| SD 3.5 Medium | stabilityai/stable-diffusion-3.5-medium | • 2.5B parameters • optimized for 8–12GB VRAM • MMDiT-X architecture (self-attention modules added in the first 13 transformer layers, plus QK normalization) |
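The native resolutions in the table translate directly into latent sizes, because each of these model families encodes images with a VAE that downsamples each side by 8. The sketch below assumes the commonly documented settings: 4 latent channels for SD 1.5 and SDXL, 16 for SD 3.5 and FLUX.1, and a patch size of 2 for the two transformer families (MMDiT patchifies 2×2; FLUX.1 packs 2×2 latent patches into tokens). The `SPECS` keys and the helper names are hypothetical labels for illustration, not part of any library.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class LatentSpec:
    channels: int        # channels in the VAE latent
    downsample: int = 8  # VAE spatial compression per side
    patch: int = 1       # transformer patch size; 1 means U-Net (no image tokens)


# Assumed settings for the Table 1 families (hypothetical keys).
SPECS: Dict[str, LatentSpec] = {
    "sd15": LatentSpec(channels=4),
    "sdxl": LatentSpec(channels=4),
    "sd35": LatentSpec(channels=16, patch=2),
    "flux1": LatentSpec(channels=16, patch=2),
}


def latent_shape(family: str, height: int, width: int) -> Tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for an image size."""
    spec = SPECS[family]
    step = spec.downsample * spec.patch  # pixels covered by one latent patch side
    if height % step or width % step:
        raise ValueError(f"{family}: height and width must be multiples of {step}")
    return spec.channels, height // spec.downsample, width // spec.downsample


def image_tokens(family: str, height: int, width: int) -> Optional[int]:
    """Image tokens a transformer backbone attends over, or None for U-Nets."""
    spec = SPECS[family]
    _, h, w = latent_shape(family, height, width)
    if spec.patch == 1:
        return None
    return (h // spec.patch) * (w // spec.patch)


if __name__ == "__main__":
    for fam, size in [("sd15", 512), ("sdxl", 1024), ("sd35", 1024), ("flux1", 1024)]:
        print(fam, latent_shape(fam, size, size), image_tokens(fam, size, size))
```

Under these assumptions a 512×512 RGB image (786,432 values) becomes a 4×64×64 SD 1.5 latent (16,384 values), a 48× reduction. At 1024×1024, SD 3.5 and FLUX.1 attend over 4,096 image tokens, and doubling each side quadruples that count.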