Graph Neural Networks (GNNs) are a fundamental advance in deep learning. They let neural networks process graph-structured data directly, where entities (nodes) and their relationships (edges) both carry meaning. Traditional neural networks operate on grid-like structures such as image pixels or sequence positions. GNNs instead exploit a graph's topology through message passing: each layer aggregates information from a node's neighbors to update that node's representation, so after k layers a node's representation reflects its k-hop neighborhood. GNNs have become the standard architecture for learning on graphs in molecular chemistry, social networks, knowledge graphs, and any domain where the relationships between entities matter as much as the entities themselves. A critical insight: message-passing GNNs are permutation equivariant (relabeling the nodes permutes the output in the same way, so node order doesn't matter). However, the 1-dimensional Weisfeiler-Leman (1-WL) graph isomorphism test sets an upper bound on their ability to tell non-isomorphic graphs apart.
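Both properties in the last sentence can be checked numerically. The sketch below assumes a bare-bones message-passing layer with sum aggregation and a ReLU. NumPy stands in for a GNN library, and the names `message_passing_step`, `cycle_adjacency`, and `graph_embedding` are hypothetical helpers. The first check confirms that relabeling the nodes with a permutation matrix `P` permutes the output rows in the same way. The second check reproduces the classic 1-WL failure case: a 6-cycle and two disjoint triangles. Both graphs are 2-regular, so when every node starts with the same features, every node receives identical messages and the sum readouts coincide for any choice of weights.

```python
import numpy as np


def message_passing_step(A, H, W):
    """One message-passing step with sum aggregation: h_i' = ReLU(sum_{j in N(i)} h_j W)."""
    return np.maximum(A @ H @ W, 0.0)


# --- Permutation equivariance: relabeling nodes permutes the output rows identically ---
rng = np.random.default_rng(0)
n, d, k = 5, 3, 4
A = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
A = A + A.T                                    # symmetric adjacency, no self-loops
H = rng.normal(size=(n, d))                    # node features
W = rng.normal(size=(d, k))                    # shared layer weights
P = np.eye(n)[rng.permutation(n)]              # random permutation matrix

out = message_passing_step(A, H, W)
out_relabelled = message_passing_step(P @ A @ P.T, P @ H, W)
print(np.allclose(out_relabelled, P @ out))    # True


# --- The 1-WL limit: a 6-cycle vs. two disjoint triangles ---
def cycle_adjacency(m):
    """Adjacency matrix of the cycle graph on m nodes."""
    C = np.zeros((m, m))
    for i in range(m):
        C[i, (i + 1) % m] = C[(i + 1) % m, i] = 1.0
    return C


hexagon = cycle_adjacency(6)
two_triangles = np.zeros((6, 6))
two_triangles[:3, :3] = cycle_adjacency(3)
two_triangles[3:, 3:] = cycle_adjacency(3)


def graph_embedding(A, weights):
    """Stack message-passing layers, then sum-pool node states into one graph vector."""
    H = np.ones((A.shape[0], weights[0].shape[0]))   # identical initial node features
    for layer_W in weights:
        H = message_passing_step(A, H, layer_W)
    return H.sum(axis=0)                               # sum readout over nodes


weights = [rng.normal(size=(4, 4)) for _ in range(3)]
print(np.allclose(graph_embedding(hexagon, weights),
                  graph_embedding(two_triangles, weights)))  # True: indistinguishable
```

Replacing the sum with a mean or max does not help here, because every node in both graphs has the same degree and the same feature. Breaking the tie requires extra input such as node identifiers or substructure counts, or a model that goes beyond 1-WL.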
What This Cheat Sheet Covers
This topic spans 19 focused tables and 146 indexed concepts. Below is a complete table-by-table outline, running from foundational concepts to advanced details.
Table 1: Core GNN Layer Architectures
These are the workhorse layers you reach for first. They differ mainly in how a node weighs and combines its neighbors' messages. GCN weights neighbors by degree, and GAT learns an attention weight for each edge. GraphSAGE learns aggregators over a sampled, fixed-size neighborhood, so it scales to large graphs and generalizes to unseen nodes. GIN uses injective sum aggregation to squeeze out the most expressive power within the 1-WL limit.
| Architecture | Example | Description |
|---|---|---|
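| GCN | `GCNConv(in_channels, out_channels)`<br>$h_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{1}{\sqrt{\hat{d}_i \hat{d}_j}} W h_j\Big)$ | • Spectral-inspired layer using symmetric normalization of the adjacency matrix with self-loops added ($\hat{d}_i$ is the degree of node $i$ counting its self-loop) • aggregates neighbor features weighted by node degree • foundational but prone to oversmoothing when many layers are stacked |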
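| GAT | `GATConv(in_channels, out_channels, heads=8)`<br>$\alpha_{ij} = \mathrm{softmax}_j\big(\mathrm{LeakyReLU}(a^\top [W h_i \Vert W h_j])\big)$ | • Learns an attention weight $\alpha_{ij}$ for each edge, normalized over $j \in \mathcal{N}(i)$, to decide how much each neighbor contributes • supports multi-head attention for richer representations • can down-weight uninformative neighbors, which GCN's fixed degree-based weights cannot |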
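| GraphSAGE | `SAGEConv(in_channels, out_channels, aggr='mean')`<br>$h_i' = \sigma\big(W \cdot \mathrm{CONCAT}(h_i, \mathrm{AGG}(\{h_j : j \in \mathcal{N}(i)\}))\big)$ | • Inductive learning via neighborhood sampling (a fixed-size subset of neighbors per node) • aggregates via mean, max-pooling, or LSTM • generalizes to nodes unseen during training • critical for large graphs |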
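| GIN | `GINConv(mlp)`<br>$h_i' = \mathrm{MLP}\big((1+\epsilon) \cdot h_i + \sum_{j \in \mathcal{N}(i)} h_j\big)$ | • Provably as expressive as the 1-WL test, the upper bound for message-passing GNNs, when aggregation and MLP are injective • uses injective sum aggregation with $\epsilon$ either learned or fixed (GIN-0 sets $\epsilon = 0$) • distinguishes more graph structures than GCN/GAT, whose normalized, mean-like aggregation cannot count repeated neighbor features |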
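The four update rules in the table can be made concrete with a minimal NumPy sketch of one forward pass per layer on a dense adjacency matrix. This is not PyTorch Geometric's implementation, and it rests on several assumptions. The helper names (`gcn_layer`, `gat_attention`, `gat_layer`, `sage_layer`, `gin_layer`, `center_mean_and_sum`) are hypothetical, and σ is taken to be ReLU. GCN and GAT add self-loops, and GAT uses a single head. GraphSAGE averages over the full neighborhood rather than a sampled one. GIN's MLP is assumed to be two linear layers with a ReLU between them.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def gcn_layer(A, H, W):
    """GCN: ReLU(D^-1/2 (A + I) D^-1/2 H W), with degrees counted after adding self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return relu(A_norm @ H @ W)


def gat_attention(A, Z, a, slope=0.2):
    """alpha_ij = softmax over j in N(i) plus i itself of LeakyReLU(a^T [z_i || z_j]), z = W h."""
    k = Z.shape[1]
    e = (Z @ a[:k])[:, None] + (Z @ a[k:])[None, :]   # e_ij = a^T [z_i || z_j]
    e = np.where(e > 0, e, slope * e)                   # LeakyReLU
    mask = (A + np.eye(A.shape[0])) > 0                 # attend only to neighbors and self
    e = np.where(mask, e, -np.inf)
    e = np.exp(e - e.max(axis=1, keepdims=True))        # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)


def gat_layer(A, H, W, a):
    """Single-head GAT: h_i' = ReLU(sum_j alpha_ij W h_j)."""
    Z = H @ W
    return relu(gat_attention(A, Z, a) @ Z)


def sage_layer(A, H, W):
    """GraphSAGE-mean (full neighborhood): ReLU(CONCAT(h_i, mean_j h_j) W), W of shape (2d, k)."""
    deg = A.sum(axis=1, keepdims=True)
    neigh_mean = (A @ H) / np.maximum(deg, 1.0)         # isolated nodes get a zero mean
    return relu(np.concatenate([H, neigh_mean], axis=1) @ W)


def gin_layer(A, H, W1, W2, eps=0.0):
    """GIN: MLP((1 + eps) h_i + sum_j h_j) with a two-layer ReLU MLP."""
    return relu(((1.0 + eps) * H + A @ H) @ W1) @ W2


# Path graph 0-1-2-3 with random 3-dimensional node features.
rng = np.random.default_rng(0)
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
H = rng.normal(size=(4, 3))

out_gcn = gcn_layer(A, H, rng.normal(size=(3, 8)))
out_gat = gat_layer(A, H, rng.normal(size=(3, 8)), rng.normal(size=16))
out_sage = sage_layer(A, H, rng.normal(size=(6, 8)))
out_gin = gin_layer(A, H, rng.normal(size=(3, 8)), rng.normal(size=(8, 8)))
print(out_gcn.shape, out_gat.shape, out_sage.shape, out_gin.shape)  # (4, 8) for each


def center_mean_and_sum(A):
    """Mean and sum of neighbor features at node 0 when every node has feature 1."""
    neighbor_sum = (A @ np.ones(A.shape[0]))[0]
    return float(neighbor_sum / A[0].sum()), float(neighbor_sum)


star_two = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # center + 2 leaves
star_one = np.array([[0, 1], [1, 0]], dtype=float)                   # center + 1 leaf
print(center_mean_and_sum(star_two))  # (1.0, 2.0)
print(center_mean_and_sum(star_one))  # (1.0, 1.0)
```

The last two prints illustrate the GIN row. With identical neighbor features, mean aggregation maps one neighbor and two neighbors to the same value, while sum aggregation keeps the count. This multiplicity information is what GIN's injective sum preserves.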