Stable Diffusion Cheat Sheet

Updated 2026-05-28

Stable Diffusion is an open-source latent diffusion model family that generates images from text descriptions. The original SD 1.x/2.x series uses a U-Net backbone operating in a VAE-compressed latent space, guided by CLIP text encoders. SD 3 and later shifted to a Multimodal Diffusion Transformer (MMDiT) architecture with triple text-encoder conditioning (CLIP-L, OpenCLIP-G, T5-XXL). Competing models like FLUX.1 and the newer FLUX.2 (Black Forest Labs) use a transformer-based flow-matching design, with FLUX.2 coupling a 32B rectified flow transformer to a vision-language model for state-of-the-art multi-reference editing. Understanding generation parameters, from CFG scale to sampling schedulers, gives precise control over output, while extensions like ControlNet, LoRA, and IP-Adapter enable advanced customization without retraining the full model.

What This Cheat Sheet Covers

This topic spans 21 focused tables and 170 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Model Versions
Table 2: Generation Modes
Table 3: Key Generation Parameters
Table 4: Sampling Methods (Schedulers)
Table 5: Prompt Engineering Techniques
Table 6: ControlNet Models
Table 7: Fine-Tuning Techniques
Table 8: Advanced Extensions
Table 9: VAE (Variational Autoencoder)
Table 10: UI Platforms
Table 11: Hardware Requirements
Table 12: Model File Formats
Table 13: Upscaling Methods
Table 14: Common Artifacts & Fixes
Table 15: Aspect Ratios & Resolutions
Table 16: Popular Community Models
Table 17: Prompt Modifiers by Category
Table 18: Negative Prompt Essentials
Table 19: CLIP Text Encoder
Table 20: Latent Diffusion Process
Table 21: Performance Optimization

Table 1: Core Model Versions

The diffusion model ecosystem evolved rapidly from SD 1.5 (about 983M parameters, of which the U-Net accounts for roughly 860M) to transformer giants like FLUX.2 [dev] at 32B parameters. Knowing each model's architecture, VRAM requirements, and license determines which is practical for your hardware and use case. Models are ordered from most widely deployed to most specialized.

Model | Example | Description
SDXL 1.0
stabilityai/stable-diffusion-xl-base-1.0
• 1024×1024 native resolution
• 3.5B parameters
• dual text encoders (CLIP ViT-L + OpenCLIP ViT-bigG)
• optional refiner model; largest extension ecosystem after SD 1.5
SD 1.5
runwayml/stable-diffusion-v1-5
• 512×512 base resolution
• ~983M parameters (860M U-Net plus 123M CLIP ViT-L text encoder)
• largest LoRA/embedding/extension ecosystem overall
• fastest on low VRAM
FLUX.1 [dev]
black-forest-labs/FLUX.1-dev
• 12B parameter flow-matching transformer (Black Forest Labs)
• guidance-distilled; excels at text rendering and photorealism
• 20–50 steps; non-commercial license
FLUX.1 [schnell]
black-forest-labs/FLUX.1-schnell
• Step-distilled FLUX.1 variant; 1–4 steps for rapid generation
• Apache 2.0 license
• slight quality tradeoff vs dev
FLUX.2 [dev]
black-forest-labs/FLUX.2-dev
• 32B parameter rectified flow transformer (Black Forest Labs, Nov 2025)
• multi-reference support (up to 10 images), image generation + editing in one model
• up to 4MP output; couples with Mistral-3 24B VLM for world knowledge
• FP8 quantization runs on RTX 4090 via weight streaming; non-commercial license
FLUX.1 Kontext [dev]
black-forest-labs/FLUX.1-Kontext-dev
• 12B parameter in-context image editing model (Black Forest Labs, May 2025)
• edits existing images via text instructions
• maintains character/style consistency across edits
HiDream-I1
HiDream-ai/HiDream-I1-Full
• 17B parameter sparse Diffusion Transformer with dynamic MoE architecture (HiDream.ai, April 2025)
• MIT license; SOTA on DPG-Bench and GenEval
• four text encoders: OpenCLIP ViT-bigG + CLIP ViT-L + T5-XXL + Llama-3.1-8B
• variants: Full (50 steps), Dev (28 steps), Fast (16 steps)
SD 3.5 Large
stabilityai/stable-diffusion-3.5-large
• 8B parameters; MMDiT with CLIP-L + OpenCLIP-G + T5-XXL
• requires 18GB+ VRAM (~12GB with GGUF quantization)
• best prompt adherence in Stability lineup
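
The parameter counts in Table 1 translate into a rough lower bound on VRAM: the weights alone take about (parameters × bytes per parameter). The sketch below is a back-of-the-envelope estimate under stated assumptions, not a measurement. It uses only the parameter counts listed above, ignores any text encoders, VAE, and activations not included in those counts, and the `headroom_gb` default is a hypothetical allowance chosen for illustration.

```python
# Parameter counts in billions, copied from Table 1.
PARAMS_B = {
    "SD 1.5": 0.983,
    "SDXL 1.0": 3.5,
    "SD 3.5 Large": 8.0,
    "FLUX.1 [dev]": 12.0,
    "HiDream-I1": 17.0,
    "FLUX.2 [dev]": 32.0,
}

# Bytes stored per parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(params_billions: float, precision: str = "fp16") -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits(params_billions: float, precision: str, vram_gb: float,
         headroom_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed headroom fit in vram_gb.

    headroom_gb is an illustrative guess for activations and buffers;
    real overhead depends on resolution, batch size, and runtime.
    """
    return weight_gb(params_billions, precision) + headroom_gb <= vram_gb


if __name__ == "__main__":
    # Print the estimated weight footprint of each model at three precisions.
    for name, params in PARAMS_B.items():
        sizes = ", ".join(
            f"{p}: {weight_gb(params, p):.1f} GB" for p in ("fp16", "fp8", "int4")
        )
        print(f"{name:<14} {sizes}")
```

By this estimate, FLUX.2 [dev] at FP8 needs about 32 GB for weights alone, more than the 24 GB on an RTX 4090, which is consistent with the table's note that it runs there via weight streaming. SD 3.5 Large at FP16 comes to about 16 GB, in line with the 18GB+ figure once runtime overhead is added.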
