Neural Networks Architecture Cheat Sheet


Updated 2026-04-28

Neural network architecture is the structural design of artificial neural systems, from foundational feedforward networks to specialized architectures such as CNNs (convolutional, for images), RNNs/LSTMs (recurrent, for sequences), Transformers (attention-based, for parallelizable sequence processing), diffusion models (iterative denoising for generation), and state space models such as Mamba (linear-time sequence modeling). Modern architectures emerged to address specific challenges: CNNs handle spatial pattern recognition via convolution, Transformers replace recurrence with self-attention and underpin state-of-the-art NLP and vision models, and Diffusion Transformers (DiTs) are displacing U-Nets as the backbone of generative models. A key insight is that architecture choice shapes what a network can learn in practice: residual connections in ResNet make it feasible to train networks of 152 or more layers, attention captures long-range dependencies, and selective state updates in Mamba aim for Transformer-quality modeling with compute that scales linearly rather than quadratically in sequence length.
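To make the "quadratic" part of that comparison concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not a production implementation: queries, keys, and values are all set to the input `X` (real Transformers apply learned projections first), and the function names and toy input are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X.

    Assumption: no learned projection matrices, single head.
    X is a list of n token vectors, each of dimension d.
    """
    d = len(X[0])
    # n x n score matrix: every token is compared with every token,
    # which is where the quadratic cost in sequence length comes from.
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        for q in X
    ]
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of all value vectors.
    out = [
        [sum(w * v[j] for w, v in zip(w_row, X)) for j in range(d)]
        for w_row in weights
    ]
    return out, weights

# Hypothetical toy sequence: 3 tokens, dimension 2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(X)
```

For a sequence of `n` tokens, `weights` has `n` rows of `n` entries each, so doubling the sequence length quadruples the number of scores that have to be computed.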


What This Cheat Sheet Covers

This topic spans 25 focused tables and 195 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Foundational Network Types
Table 2: CNN Core Components
Table 3: Popular CNN Architectures
Table 4: RNN and Sequential Architectures
Table 5: Transformer Components
Table 6: Transformer and LLM Architecture Variants
Table 7: Efficient Attention Mechanisms
Table 8: Activation Functions
Table 9: Regularization Techniques
Table 10: Loss Functions
Table 11: Optimization Algorithms
Table 12: Learning Rate Scheduling
Table 13: Weight Initialization
Table 14: Advanced Architecture Patterns
Table 15: Training Challenges and Solutions
Table 16: GAN Architectures and Techniques
Table 17: Autoencoder Variants
Table 18: Diffusion Model Architectures
Table 19: Graph Neural Network Components
Table 20: Specialized Architectures
Table 21: Object Detection Architectures
Table 22: Vision Transformer Components
Table 23: Normalization Techniques
Table 24: Transfer Learning Strategies
Table 25: State Space Models and Hybrid Architectures

Table 1: Foundational Network Types

Type | Example | Description
Feedforward Neural Network (FNN) | input → hidden → output (no cycles) | Unidirectional information flow: neurons in one layer connect only to the next. Used for tabular data and classification.
Multilayer Perceptron (MLP) | input → fc1(256) → fc2(128) → output | FNN with multiple hidden layers and nonlinear activations. A universal function approximator and the basis of fully connected layers.
Convolutional Neural Network (CNN) | conv → pool → conv → pool → flatten → fc | Specialized for spatial data (images). Learns hierarchical features via convolution filters that exploit spatial locality and parameter sharing.
Transformer | multi-head attention + feedforward (no recurrence) | Attention-based architecture that processes sequences in parallel. Largely replaced RNNs in NLP and is widely used in vision. Self-attention is the basis of GPT, BERT, and ViT.
Recurrent Neural Network (RNN) | h_t = \tanh(W_h h_{t-1} + W_x x_t) | Designed for sequential data with temporal dependencies. Maintains a hidden state across time steps. Prone to vanishing gradients on long sequences.
Long Short-Term Memory (LSTM) | forget gate → input gate → output gate | RNN variant with gating mechanisms that preserve long-term dependencies. The cell state acts as a memory highway. Was the state of the art for sequence modeling before Transformers.
Gated Recurrent Unit (GRU) | update gate + reset gate (simpler than LSTM) | Simplified LSTM with fewer parameters. Combines the forget and input gates into a single update gate. Often matches LSTM performance with faster training.
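The RNN recurrence in Table 1, h_t = \tanh(W_h h_{t-1} + W_x x_t), can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the helper names (`matvec`, `rnn_step`, `rnn_forward`) and the toy weights are hypothetical, and bias terms are omitted to match the formula as written.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(W_h, W_x, h_prev, x_t):
    """One recurrent update: h_t = tanh(W_h h_{t-1} + W_x x_t)."""
    rec = matvec(W_h, h_prev)   # contribution of the previous hidden state
    inp = matvec(W_x, x_t)      # contribution of the current input
    return [math.tanh(r + i) for r, i in zip(rec, inp)]

def rnn_forward(W_h, W_x, xs, h0):
    """Run the cell over a sequence and return every hidden state."""
    h = h0
    states = []
    for x_t in xs:
        h = rnn_step(W_h, W_x, h, x_t)
        states.append(h)
    return states

# Hypothetical toy weights: hidden size 2, input size 1.
W_h = [[0.5, 0.0], [0.0, 0.5]]
W_x = [[1.0], [-1.0]]

# A single nonzero input at the first step, followed by zeros.
states = rnn_forward(W_h, W_x, xs=[[1.0], [0.0], [0.0]], h0=[0.0, 0.0])
```

With these toy weights the hidden state carries the first input forward, but its magnitude shrinks at every later step. That fading is a small-scale view of why plain RNNs struggle to retain information over long sequences, which the gated LSTM and GRU variants are designed to mitigate.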
