GPT (Generative Pre-trained Transformer) models are large language models developed by OpenAI that use the transformer architecture to generate text, analyze images, process audio, and perform complex reasoning. The lineup has expanded considerably: the current flagship, GPT-5.5 (released April 2026), delivers a 1M-token context window and native computer-use capabilities; the GPT-4.1 family offers a 1M-token non-reasoning alternative; and open-weight models (gpt-oss) are now available under the Apache 2.0 license. OpenAI's new Responses API has replaced Chat Completions as the recommended primitive for agent-style workloads, bringing stateful context, built-in tools, and MCP server support into a single call.
What This Cheat Sheet Covers
This topic spans 14 focused tables and 140 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.
Table 1: Core Architecture
Underneath every GPT model is the same transformer machinery, and these are its load-bearing pieces — self-attention that lets each token weigh every other, autoregressive decoding that writes one token at a time, BPE tokenization, and the training stages that turn raw web text into an aligned assistant. Understanding how they fit together explains both why these models are so capable and where their quirks come from.
| Component | Example | Description |
|---|---|---|
| Transformer architecture | Multi-head self-attention + feed-forward layers | Foundation of all GPT models; processes token sequences in parallel, enabling efficient long-range dependency modeling. |
| Self-attention | Query-Key-Value matrices compute token relationships | Each token attends to the other tokens in its context (in GPT, only itself and earlier tokens, via causal masking), assigning importance weights based on semantic relevance. This enables context-aware representations. |
| Autoregressive generation | Predicts token t_{n+1} given t_1, t_2, ..., t_n | Generates text one token at a time, conditioning each prediction on all previously generated tokens for coherent outputs. |
| Multi-head attention | 8–96 attention heads process different aspects simultaneously | Parallel attention mechanisms allow the model to focus on multiple representation subspaces (syntax, semantics, entities) at once. |
| Pre-training + RLHF | Unsupervised learning on web text → RLHF alignment | Two-stage process: the model learns language patterns from massive datasets, then human feedback refines outputs for helpfulness and safety. |
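
To make the self-attention and autoregressive rows concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not how any GPT model is actually implemented: a single attention head, no learned Query/Key/Value projections (the caller passes the vectors directly), and a made-up `toy_logits` function standing in for a real model. All function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention where position i sees only positions 0..i.

    Assumption: one head, and Q/K/V are supplied directly rather than
    produced by learned projection matrices as in a real transformer.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Causal mask: score only against the current and earlier positions.
        scores = [dot(q, keys[j]) / math.sqrt(d_k) for j in range(i + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out = [sum(w * values[j][c] for j, w in enumerate(weights))
               for c in range(len(values[0]))]
        outputs.append(out)
        # Pad with zeros so each row has one weight per position.
        all_weights.append(weights + [0.0] * (len(queries) - i - 1))
    return outputs, all_weights

def greedy_decode(next_token_logits, prompt, max_new_tokens, eos_id=None):
    """Autoregressive loop: append the highest-scoring token, then repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)  # conditioned on all tokens so far
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

def toy_logits(tokens, vocab_size=5):
    """Hypothetical stand-in for a model: favors (last token + 1) mod vocab_size."""
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]

# Three 2-dimensional token vectors, used as queries, keys, and values alike.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(x, x, x)
print(weights[0])                          # [1.0, 0.0, 0.0]
print(greedy_decode(toy_logits, [0], 3))   # [0, 1, 2, 3]
```

The first position can only attend to itself, so its weight row is entirely on position 0. Multi-head attention would run several such heads over different learned projections and concatenate their outputs; real decoders also usually sample from the logits rather than always taking the maximum.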