Neural Networks Core Cheat Sheet

Updated 2026-04-27

Neural networks are computational models inspired by biological neurons. They consist of interconnected layers of nodes (neurons) whose weights are learned through backpropagation and gradient descent. They form the foundation of modern deep learning, enabling breakthroughs in computer vision, natural language processing, sequential data modeling, and generative AI. Three things matter most: depth lets a network build a hierarchy of features, proper initialization and normalization keep gradients from vanishing or exploding, and the architecture family (feedforward, convolutional, recurrent, or attention-based) must match the structure of the data. Modern training practice combines adaptive optimizers like AdamW, mixed-precision arithmetic, and learning-rate schedules to train efficiently at scale.

What This Cheat Sheet Covers

This topic spans 14 focused tables and 125 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Network Architectures
Table 2: Layer Types
Table 3: Activation Functions
Table 4: Loss Functions
Table 5: Optimization Algorithms
Table 6: Regularization Techniques
Table 7: Training Mechanics
Table 8: Weight Initialization
Table 9: Common Problems and Solutions
Table 10: Hyperparameters
Table 11: Evaluation Metrics
Table 12: Pooling Operations
Table 13: Advanced Techniques
Table 14: Model Compression

Table 1: Network Architectures

The architecture you choose is one of the biggest decisions in a deep learning project, because it has to match the shape of your data: feedforward nets for tabular inputs, CNNs for grids of pixels, RNNs and their gated variants for sequences, and transformers when long-range relationships matter most. Knowing what each family was built to solve, and the problem that motivated the next one (skip connections rescuing very deep nets, attention replacing recurrence), makes the rest of the cheat sheet click into place.
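
To make the feedforward case concrete, here is a minimal NumPy sketch of a forward pass through a three-layer MLP on a tabular batch. The layer sizes, the 20-feature input, the He-style weight scaling, and the helper names (`init_mlp`, `mlp_forward`) are all assumptions chosen for illustration, not something this cheat sheet prescribes.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_mlp(sizes, seed=0):
    """Return one (W, b) pair per dense layer (He-style scaling, an assumption)."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def mlp_forward(x, params):
    """Apply ReLU on hidden layers and softmax on the output layer."""
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return softmax(h @ W + b)

# Hypothetical tabular batch: 4 rows, 20 features, 10 output classes.
x = np.random.default_rng(1).normal(size=(4, 20))
params = init_mlp([20, 128, 64, 10])
probs = mlp_forward(x, params)
print(probs.shape)  # (4, 10)
```

Each row of `probs` is a probability distribution over the 10 classes. Information only moves forward through the list of layers, which is what distinguishes this family from the recurrent ones.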

Type	Example	Description
Feedforward Neural Network (FNN)	`input → hidden1 → hidden2 → output`	• Information flows in one direction only from input to output • no cycles • simplest architecture for supervised learning.
Multilayer Perceptron (MLP)	`Dense(128, relu) → Dense(64, relu) → Dense(10, softmax)`	• Feedforward network with one or more fully connected hidden layers • standard for tabular data and classification.
Convolutional Neural Network (CNN)	`Conv2D → ReLU → MaxPool → Flatten → Dense`	• Specialized for spatial data (images) • uses convolutional filters to detect local patterns • dominant architecture for computer vision.
Recurrent Neural Network (RNN)	`h_t = tanh(W_h * h_{t-1} + W_x * x_t)`	• Maintains hidden state across time steps • processes sequential data (text, time series) • suffers from vanishing gradients on long sequences.
LSTM (Long Short-Term Memory)	`LSTM(units=128, return_sequences=True)`	• RNN variant with forget, input, and output gates controlling cell state • mitigates vanishing gradients, so it can learn longer-range dependencies than a plain RNN.
GRU (Gated Recurrent Unit)	`GRU(units=128)`	• Simplified LSTM with reset and update gates but no separate cell state • fewer parameters than LSTM • competitive performance on many tasks.
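
Two rows of the table can be checked with a few lines of NumPy: the RNN recurrence `h_t = tanh(W_h * h_{t-1} + W_x * x_t)`, and the claim that a GRU has fewer parameters than an LSTM. The sketch below is illustrative only. The dimensions and the function names (`rnn_forward`, `gated_param_count`) are hypothetical, and the parameter count assumes the textbook formulation with one bias vector per gate block; library implementations can count biases slightly differently.

```python
import numpy as np

def rnn_forward(xs, W_h, W_x, h0):
    """Unroll h_t = tanh(W_h @ h_{t-1} + W_x @ x_t) over a sequence.

    xs: (T, input_dim), W_h: (hidden, hidden), W_x: (hidden, input_dim).
    Returns every hidden state, shape (T, hidden).
    """
    h = h0
    states = []
    for x_t in xs:
        # The same weights are reused at every time step.
        h = np.tanh(W_h @ h + W_x @ x_t)
        states.append(h)
    return np.stack(states)

def gated_param_count(n_blocks, input_dim, hidden):
    """Parameters for n_blocks gate blocks, each with an input matrix,
    a recurrent matrix, and one bias vector (textbook assumption)."""
    return n_blocks * (hidden * input_dim + hidden * hidden + hidden)

rng = np.random.default_rng(0)
T, input_dim, hidden = 5, 3, 4
xs = rng.normal(size=(T, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden, hidden))
W_x = rng.normal(scale=0.5, size=(hidden, input_dim))
hs = rnn_forward(xs, W_h, W_x, np.zeros(hidden))
print(hs.shape)  # (5, 4)

# LSTM: forget, input, output gates plus the candidate cell update (4 blocks).
# GRU: reset and update gates plus the candidate state (3 blocks).
lstm_params = gated_param_count(4, input_dim=64, hidden=128)
gru_params = gated_param_count(3, input_dim=64, hidden=128)
print(lstm_params, gru_params)  # 98816 74112
```

Under these assumptions the GRU needs three quarters of the LSTM's parameters for the same input and hidden sizes. The loop in `rnn_forward` also shows where long sequences cause trouble: the gradient with respect to an early hidden state passes through `W_h` and a `tanh` derivative once per step.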
