Variational Autoencoders (VAEs) Cheat Sheet

Updated 2026-03-17


Variational Autoencoders (VAEs), introduced by Kingma and Welling in 2013, are probabilistic generative models that learn to encode data into a continuous latent space and reconstruct it through a decoder. Unlike traditional autoencoders, which map each input to a single latent point, VAEs map each input to a probability distribution (typically Gaussian) over the latent space and regularize it toward a fixed prior, so new, realistic samples can be generated by sampling from that prior and decoding. The key insight is the reparameterization trick, which expresses the random latent sample as a deterministic function of the encoder outputs plus independent noise, so gradients can flow through the sampling step and the model can be trained end-to-end via backpropagation. VAEs are trained by maximizing the Evidence Lower Bound (ELBO), which trades reconstruction quality against a KL-divergence term that keeps the encoder's distribution close to the prior; this tension makes them both powerful and delicate to train.
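
For reference, the two ideas named above are commonly written as follows. This is a sketch in standard notation rather than a derivation from this sheet: the symbols $\phi$ (encoder parameters), $\theta$ (decoder parameters), $\epsilon$ (noise), and the latent index $j$ are assumptions introduced here, and the closed-form KL term assumes a diagonal Gaussian encoder with a standard normal prior.

$$
\log p_\theta(x) \;\ge\; \mathrm{ELBO}(x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
$$

$$
z = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
$$

$$
D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\big) = -\tfrac{1}{2} \sum_{j} \big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big)
$$

The first term of the ELBO rewards accurate reconstruction, and the second penalizes encoder distributions that drift away from the prior. Training usually minimizes the negative ELBO.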

What This Cheat Sheet Covers

This topic spans 12 focused tables and 78 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Architecture Components
Table 2: Loss Function and Optimization
Table 3: Reparameterization and Sampling
Table 4: Training Techniques and Optimization
Table 5: Common Problems and Solutions
Table 6: VAE Variants and Extensions
Table 7: Disentangled Representations
Table 8: Encoder and Decoder Architectures
Table 9: Applications
Table 10: Sequential and Time-Series VAEs
Table 11: Advanced Topics
Table 12: Practical Implementation

Table 1: Core Architecture Components

Component	Example	Description
Encoder	`z_mean, z_log_var = encoder(x)` `# Maps input to latent params`	• Neural network that maps input $x$ to parameters of a probability distribution (mean $\mu$ and log-variance $\log \sigma^2$) in the latent space • typically uses CNN layers for images or fully connected layers for tabular data
Decoder	`x_reconstructed = decoder(z)` `# Maps latent code to output`	• Neural network that reconstructs input from latent code $z$ • mirrors encoder architecture in reverse, often using transposed convolutions for upsampling in image tasks
Latent space	`z ~ N(mu, sigma^2)` `# Gaussian distribution`	• Low-dimensional continuous representation where each dimension ideally captures a meaningful factor of variation • enables smooth interpolation and generation of new samples
Prior distribution	`p(z) = N(0, I)` `# Standard Gaussian prior`	• Assumed distribution over latent variables before observing data • typically standard normal $\mathcal{N}(0, I)$ to simplify KL divergence computation and enable random sampling
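
To show how the four components in the table fit together, here is a minimal, untrained sketch in plain Python (standard library only). It is a toy under stated assumptions, not a reference implementation: the linear encoder and decoder, the weight matrices `w_mean`, `w_log_var`, and `w_dec`, the sigmoid output, and the helper names `reparameterize` and `kl_to_standard_normal` are all hypothetical and chosen for illustration. Real VAEs use deep networks and a framework with automatic differentiation.

```python
import math
import random


def encoder(x, w_mean, w_log_var):
    """Toy linear encoder: maps input x to (z_mean, z_log_var)."""
    z_mean = [sum(w * xi for w, xi in zip(row, x)) for row in w_mean]
    z_log_var = [sum(w * xi for w, xi in zip(row, x)) for row in w_log_var]
    return z_mean, z_log_var


def reparameterize(z_mean, z_log_var, rng):
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1) per dimension."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(z_mean, z_log_var)
    ]


def decoder(z, w_dec):
    """Toy linear decoder with a sigmoid so outputs lie in (0, 1)."""
    logits = [sum(w * zi for w, zi in zip(row, z)) for row in w_dec]
    return [1.0 / (1.0 + math.exp(-a)) for a in logits]


def kl_to_standard_normal(z_mean, z_log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(z_mean, z_log_var)
    )


rng = random.Random(0)          # fixed seed so runs are repeatable
x = [0.2, 0.8, 0.5, 0.1]        # one made-up 4-dimensional input
latent_dim = 2

# Random, untrained weights (assumption: no training is performed here).
w_mean = [[rng.uniform(-0.5, 0.5) for _ in x] for _ in range(latent_dim)]
w_log_var = [[rng.uniform(-0.5, 0.5) for _ in x] for _ in range(latent_dim)]
w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for _ in x]

# Reconstruction path: encode, sample a latent code, decode.
z_mean, z_log_var = encoder(x, w_mean, w_log_var)
z = reparameterize(z_mean, z_log_var, rng)
x_reconstructed = decoder(z, w_dec)
kl = kl_to_standard_normal(z_mean, z_log_var)

# Generation path: sample from the prior p(z) = N(0, I) and decode.
z_prior = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
x_generated = decoder(z_prior, w_dec)
```

The reconstruction path uses the encoder's distribution, while the generation path bypasses the encoder and samples directly from the prior. This is why the KL term matters: it keeps the two distributions close enough that decoding prior samples yields plausible outputs.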
