Neural networks are computational models inspired by biological neurons. They consist of interconnected layers of nodes (neurons) whose weights are learned through backpropagation and gradient descent. They form the foundation of modern deep learning, enabling breakthroughs in computer vision, natural language processing, sequential data modeling, and generative AI. Three factors drive success: depth lets the network build a hierarchy of features, proper initialization and normalization keep gradients from vanishing or exploding, and the architecture family (feedforward, recurrent, or attention-based) must match the structure of the data. Modern training practice combines adaptive optimizers such as AdamW, mixed-precision arithmetic, and learning-rate schedules to train efficiently at scale.
What This Cheat Sheet Covers
This topic spans 14 focused tables and 125 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: Network Architectures
| Type | Example | Description |
|---|---|---|
| Feedforward Neural Network | `input → hidden1 → hidden2 → output` | • Information flows in one direction only, from input to output • No cycles • Simplest architecture for supervised learning. |
| Multilayer Perceptron (MLP) | `Dense(128, relu) → Dense(64, relu) → Dense(10, softmax)` | • Feedforward network with one or more fully connected hidden layers • Standard for tabular data and classification. |
| Convolutional Neural Network (CNN) | `Conv2D → ReLU → MaxPool → Flatten → Dense` | • Specialized for spatial data (images) • Uses convolutional filters to detect local patterns • Dominant architecture for computer vision. |
| Recurrent Neural Network (RNN) | `h_t = tanh(W_h * h_{t-1} + W_x * x_t)` | • Maintains a hidden state across time steps • Processes sequential data (text, time series) • Suffers from vanishing gradients on long sequences. |
| Long Short-Term Memory (LSTM) | `LSTM(units=128, return_sequences=True)` | • RNN variant with forget, input, and output gates controlling the cell state • Mitigates vanishing gradients, so long-range dependencies are easier to learn. |
| Gated Recurrent Unit (GRU) | `GRU(units=128)` | • Simplified LSTM with reset and update gates but no separate cell state • Fewer parameters than LSTM • Competitive performance on many tasks. |
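
The recurrence `h_t = tanh(W_h * h_{t-1} + W_x * x_t)` listed in the table can be unrolled by hand to see how the hidden state carries information across time steps. The following is a minimal sketch in plain Python, not a reference implementation: the function name `rnn_forward`, the tiny layer sizes, and the random weights are all assumptions made for illustration, and the bias term is omitted to match the formula as written.

```python
import math
import random


def rnn_forward(xs, W_h, W_x, h0):
    """Unroll h_t = tanh(W_h @ h_{t-1} + W_x @ x_t) over a sequence.

    xs  : list of input vectors, one per time step
    W_h : hidden-to-hidden weights, shape (hidden, hidden)
    W_x : input-to-hidden weights, shape (hidden, input)
    h0  : initial hidden state, length hidden
    Returns the list of hidden states, one per time step.
    """
    h = list(h0)
    states = []
    for x in xs:
        # Each new unit mixes the previous hidden state with the current input.
        h = [
            math.tanh(
                sum(W_h[i][j] * h[j] for j in range(len(h)))
                + sum(W_x[i][k] * x[k] for k in range(len(x)))
            )
            for i in range(len(W_h))
        ]
        states.append(h)
    return states


# Illustrative sizes and weights (hypothetical, chosen only for the demo).
random.seed(0)
hidden_size, input_size, seq_len = 3, 2, 4
W_h = [[random.uniform(-0.5, 0.5) for _ in range(hidden_size)]
       for _ in range(hidden_size)]
W_x = [[random.uniform(-0.5, 0.5) for _ in range(input_size)]
       for _ in range(hidden_size)]
xs = [[random.uniform(-1, 1) for _ in range(input_size)]
      for _ in range(seq_len)]

states = rnn_forward(xs, W_h, W_x, [0.0] * hidden_size)
print(len(states), len(states[0]))  # one hidden vector per time step
```

Because the same `W_h` is applied at every step, gradients flowing backward through a long sequence are multiplied by it repeatedly, which is one way to see why plain RNNs struggle with long-range dependencies and why gated variants such as LSTM and GRU are often preferred.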