Deep learning is a subset of machine learning that uses multi-layered artificial neural networks to learn hierarchical representations of data, so that models discover patterns and features automatically rather than relying on hand-engineered rules. It powers modern AI applications from computer vision and natural language processing to speech recognition and autonomous systems. The field has evolved from basic feedforward networks to sophisticated architectures such as transformers, state space models, and diffusion models, while gradient-based optimization and backpropagation remain the fundamental learning mechanisms. Understanding the trade-offs between model capacity, computational efficiency, and generalization is critical for practitioners building production systems.
What This Cheat Sheet Covers
This topic spans 21 focused tables and 226 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Neural Network Architectures
| Architecture | Example | Description |
|---|---|---|
| Transformer | `encoder = TransformerEncoder(TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6)` | • Attention-based architecture that processes sequences in parallel • Dominant in NLP, vision, and multimodal AI through self-attention. |
| Convolutional Neural Network (CNN) | `model = Sequential([Conv2D(32, 3, activation='relu'), MaxPooling2D(2), Flatten(), Dense(10)])` | • Processes grid-structured data, using convolutional layers for spatial feature extraction • Dominant in computer vision tasks. |
| Long Short-Term Memory (LSTM) | `model = Sequential([LSTM(128, return_sequences=True), Dense(output_dim)])` | • RNN variant with input, forget, and output gates that control information flow • Mitigates the vanishing gradient problem for long-term dependencies. |
| Residual Network (ResNet) | `x = Conv2D(64, 3, padding='same')(x); x = Add()([x, shortcut]); x = Activation('relu')(x)` | • Deep CNN with skip connections that add a block's input directly to its output • Enables training of 100+ layer networks by mitigating the degradation problem. |
| Vision Transformer (ViT) | `model = VisionTransformer(image_size=224, patch_size=16, num_classes=1000)` | • Applies the transformer architecture to image patches treated as tokens • Reaches state-of-the-art accuracy on vision tasks when given sufficient training data. |
| Generative Adversarial Network (GAN) | `generator = Sequential([Dense(256), ...]); discriminator = Sequential([Dense(256), ...])` | • Dual-network system in which a generator creates samples and a discriminator distinguishes real from fake • Used for synthetic data generation. |
| Variational Autoencoder (VAE) | `encoder = Sequential([Dense(latent_dim*2)]); decoder = Sequential([Dense(input_dim)])` | • Probabilistic generative model that learns a latent distribution • Enables controlled generation through a continuous latent space. |
| Autoencoder | `encoder = Sequential([Dense(128), Dense(64)]); decoder = Sequential([Dense(128), Dense(input_dim)])` | • Unsupervised network that learns compressed representations • Used for dimensionality reduction, denoising, and feature learning. |
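
Table 1 credits the transformer's parallel sequence processing to self-attention without showing the computation. The following is a minimal NumPy sketch of single-head scaled dot-product self-attention, not a production implementation. The function names (`softmax`, `self_attention`), the random projection matrices, and the toy dimensions are illustrative assumptions; a real transformer layer would add multiple heads, learned weights, masking, residual connections, and normalization.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative helper).

    x:             (seq_len, d_model) input token embeddings
    w_q, w_k, w_v: (d_model, d_head) projection matrices
    Returns (output, weights) with shapes
    (seq_len, d_head) and (seq_len, seq_len).
    """
    q = x @ w_q  # queries
    k = x @ w_k  # keys
    v = x @ w_v  # values
    # Every token scores every other token in one matrix product,
    # which is why the whole sequence is processed in parallel.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy dimensions and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)
print(output.shape)   # (4, 8)
print(weights.shape)  # (4, 4)
```

Each row of `weights` is a probability distribution over the sequence positions, so every output token is a weighted average of the value vectors of all tokens.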