Deep Learning Cheat Sheet

Updated 2026-04-20

Deep learning is a subset of machine learning that uses multi-layered artificial neural networks to learn hierarchical representations of data, so that patterns and features are discovered from examples rather than from hand-written rules or hand-engineered features. It powers modern AI applications from computer vision and natural language processing to speech recognition and autonomous systems. The field has evolved from basic feedforward networks to sophisticated architectures like transformers, state space models, and diffusion models, with gradient-based optimization and backpropagation remaining the fundamental learning mechanisms. Understanding the trade-offs between model capacity, computational efficiency, and generalization is critical for practitioners building production systems.

What This Cheat Sheet Covers

This topic spans 21 focused tables and 226 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Neural Network Architectures
Table 2: Layer Types
Table 3: Activation Functions
Table 4: Optimization Algorithms
Table 5: Loss Functions
Table 6: Regularization Techniques
Table 7: Attention Mechanisms
Table 8: Embedding Techniques
Table 9: Training Techniques
Table 10: Initialization Methods
Table 11: Model Compression Techniques
Table 12: Evaluation Metrics
Table 13: Common Pitfalls and Solutions
Table 14: Advanced Architectures
Table 15: Specialized Convolution Operations
Table 16: Residual Connections and Skip Connections
Table 17: Normalization Techniques
Table 18: Learning Rate Schedules
Table 19: Batch Processing Concepts
Table 20: Distributed Training Techniques
Table 21: Deep Learning Frameworks

Table 1: Neural Network Architectures

These are the blueprints: the high-level shapes a network can take, each suited to a different kind of data. Convolutional networks (including ResNet, VGG, and Inception) were built for images and recurrent designs (RNN, LSTM, GRU) for sequences, while transformers, with Vision Transformers on the image side, have largely taken over both. GANs, VAEs, and autoencoders specialize in generating and compressing data.

Architecture	Example	Description
Transformer	`layer = nn.TransformerEncoderLayer(` `d_model=512, nhead=8)` `encoder = nn.TransformerEncoder(` `layer, num_layers=6)`	• Attention-based architecture that processes all sequence positions in parallel • dominant in NLP, vision, and multimodal AI through self-attention.
Convolutional Neural Network (CNN)	`model = Sequential([` `Conv2D(32, 3, activation='relu'),` `MaxPooling2D(2),` `Flatten(), Dense(10)])`	• Grid-structured data processor using convolutional layers for spatial feature extraction • dominant in computer vision tasks.
Long Short-Term Memory (LSTM)	`model = Sequential([` `LSTM(128, return_sequences=True),` `Dense(output_dim)])`	• RNN variant with gates (input, forget, output) to control information flow • mitigates the vanishing gradient problem, making long-term dependencies easier to learn.
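ResNet (Residual Network)	`shortcut = x` `x = Conv2D(64, 3, padding='same')(x)` `x = Add()([x, shortcut])` `x = Activation('relu')(x)`	• Deep CNN with skip connections that add a block's input directly to its output (shapes must match, so the input here is assumed to have 64 channels) • enables training of 100+ layer networks by mitigating the degradation problem, where accuracy gets worse as depth grows.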
Vision Transformer (ViT)	`model = VisionTransformer(` `image_size=224, patch_size=16, num_classes=1000)`	• Applies the transformer architecture to image patches treated as tokens • matches or exceeds CNNs on vision tasks when pretrained on large datasets.
Generative Adversarial Network (GAN)	`generator = Sequential([Dense(256), ...])` `discriminator = Sequential([Dense(256), ...])`	• Dual-network system with generator creating samples and discriminator distinguishing real from fake • used for synthetic data generation.
Variational Autoencoder (VAE)	`encoder = Sequential([Dense(latent_dim*2)])` `decoder = Sequential([Dense(input_dim)])`	• Probabilistic generative model whose encoder outputs the mean and log-variance of a latent distribution (hence `latent_dim*2`) • the continuous latent space enables controlled generation.
Autoencoder	`encoder = Sequential([Dense(128), Dense(64)])` `decoder = Sequential([Dense(128), Dense(input_dim)])`	• Unsupervised network learning compressed representations • used for dimensionality reduction, denoising, and feature learning.
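The ResNet row describes a skip connection that adds a block's input to its output. A minimal sketch of that idea in plain Python follows, assuming a small matrix-vector product as a stand-in for the convolution; the names `residual_block`, `linear`, `zero_weights`, and `small_weights` are hypothetical and chosen only for illustration, not taken from any framework.

```python
from typing import Callable, List

Vector = List[float]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


def linear(weights: List[Vector], v: Vector) -> Vector:
    """Matrix-vector product; a stand-in (assumption) for the conv layer."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_block(x: Vector, transform: Callable[[Vector], Vector]) -> Vector:
    """Compute relu(transform(x) + x): the skip connection adds the input back."""
    fx = transform(x)
    if len(fx) != len(x):
        # An identity shortcut only works when the shapes match.
        raise ValueError("transform must preserve the input size")
    return relu([f + a for f, a in zip(fx, x)])


# Hypothetical weights, chosen only for illustration.
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
small_weights = [[0.5, 0.0], [0.0, -0.5]]

x = [1.0, 2.0]

# With an all-zero transform the block passes the (non-negative) input through.
identity_out = residual_block(x, lambda v: linear(zero_weights, v))

# With non-zero weights the block learns a correction on top of the input.
mixed_out = residual_block(x, lambda v: linear(small_weights, v))

print(identity_out)
print(mixed_out)
```

Because the transform only has to model a correction to the input, a block whose weights are near zero behaves close to an identity mapping, which is one common explanation for why very deep stacks of such blocks remain trainable.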
