GPT (Generative Pre-trained Transformer) models are large language models developed by OpenAI that use the transformer architecture to generate text, analyze images, process audio, and perform complex reasoning. The lineup has expanded considerably: the current flagship, GPT-5.5 (released April 2026), delivers a 1M-token context window and native computer-use capabilities, the GPT-4.1 family offers a 1M-token non-reasoning alternative, and open-weight models (gpt-oss) are now available under Apache 2.0. OpenAI's Responses API has replaced Chat Completions as the recommended primitive for agent-style workloads, bringing stateful context, built-in tools, and MCP server support into a single call.
What This Cheat Sheet Covers
This topic spans 14 focused tables and 140 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Core Architecture
| Component | Example | Description |
|---|---|---|
| Transformer architecture | Multi-head self-attention + feed-forward layers | Foundation of all GPT models; processes token sequences in parallel, enabling efficient long-range dependency modeling. |
| Self-attention | Query-Key-Value matrices compute token relationships | Each token attends to the tokens before it (GPT applies a causal mask), assigning importance weights based on semantic relevance; this yields context-aware representations. |
| Autoregressive generation | Predicts token t_{n+1} given t_1, t_2, ..., t_n | Generates text one token at a time, conditioning each prediction on all previously generated tokens for coherent outputs. |
| Multi-head attention | 8–96 attention heads process different aspects simultaneously | Parallel attention mechanisms allow the model to focus on multiple representation subspaces (syntax, semantics, entities) at once. |
| Pre-training and alignment | Unsupervised learning on web text → RLHF alignment | Two-stage process: the model learns language patterns from massive datasets, then human feedback refines outputs for helpfulness and safety. |
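
The self-attention and autoregressive rows above can be illustrated with a small, dependency-free sketch. It is a toy under stated assumptions, not how any GPT model is actually implemented: the learned query, key, and value projection matrices are omitted (callers pass the vectors directly), only a single head is shown, and the names `causal_self_attention`, `greedy_generate`, `TOY_BIGRAMS`, and `toy_scores` are hypothetical. The toy scorer also looks only at the last token, whereas a real model conditions on the whole prefix.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Position i may only attend to positions 0..i, which is what lets a
    decoder-only model predict the next token without seeing the future.
    Returns (outputs, weights), one entry per position.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Score the query against every key up to and including position i.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # The output is the weighted average of the visible value vectors.
        out = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


def greedy_generate(next_token_scores, prompt, max_new_tokens):
    """Autoregressive loop: append the highest-scoring next token each step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)  # mapping: candidate token -> score
        tokens.append(max(scores, key=scores.get))
    return tokens


# Hypothetical stand-in for a trained model: scores depend on the last token only.
TOY_BIGRAMS = {
    "the": {"cat": 2.0, "dog": 1.0},
    "cat": {"sat": 3.0, "the": 0.5},
    "sat": {"down": 1.5, "the": 1.0},
    "down": {"the": 1.0},
}


def toy_scores(tokens):
    return TOY_BIGRAMS[tokens[-1]]
```

Calling `greedy_generate(toy_scores, ["the"], 3)` returns `["the", "cat", "sat", "down"]`. Calling `causal_self_attention(x, x, x)` on a list of three vectors returns weight rows of length 1, 2, and 3, which makes the causal mask visible. Multi-head attention would run several such attention computations on different learned projections and concatenate the results.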