GPT Models Cheat Sheet

Updated 2026-04-28

GPT (Generative Pre-trained Transformer) models are large language models developed by OpenAI that use the transformer architecture to generate text, analyze images, process audio, and perform complex reasoning. The lineup now spans several families: the current flagship, GPT-5.5 (released April 2026), delivers 1M-token context and native computer-use capabilities, the GPT-4.1 family offers a 1M-token non-reasoning alternative, and open-weight models (gpt-oss) are available under Apache 2.0. OpenAI's Responses API has replaced Chat Completions as the recommended primitive for agent-style workloads, bringing stateful context, built-in tools, and MCP server support into a single call.

What This Cheat Sheet Covers

This topic spans 14 focused tables and 140 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced details.

Table 1: Core Architecture
Table 2: Model Variants and Specifications
Table 3: Tokenization and Context
Table 4: API Parameters and Configuration
Table 5: API Endpoints and Integration
Table 6: Agentic Tools and Built-in Features
Table 7: Multimodal Capabilities
Table 8: Fine-tuning and Customization
Table 9: Decoding and Generation Strategies
Table 10: Model Selection and Use Cases
Table 11: Performance and Optimization
Table 12: Pricing and Cost Management
Table 13: Open-Weight Models
Table 14: Limitations and Considerations

Table 1: Core Architecture

Underneath every GPT model is the same transformer machinery, and these are its load-bearing pieces: self-attention that lets each token weigh the tokens that precede it, autoregressive decoding that writes one token at a time, BPE tokenization, and the training stages that turn raw web text into an aligned assistant. Understanding how they fit together explains both why these models are so capable and where their quirks come from.
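To make "writes one token at a time" concrete, here is a minimal sketch of a greedy autoregressive decoding loop. The `BIGRAM_SCORES` table, `next_token_scores`, and `generate` are hypothetical stand-ins, not part of any GPT API: a real model scores the next token with a transformer conditioned on the whole prefix, whereas this toy looks only at the last token to stay short.

```python
# Hypothetical toy "model": scores for the next token given the last one.
# A real GPT conditions on the entire prefix, not just the final token.
BIGRAM_SCORES = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"model": 0.7, "token": 0.3},
    "model": {"writes": 0.8, "</s>": 0.2},
    "writes": {"text": 0.6, "the": 0.4},
    "text": {"</s>": 1.0},
}


def next_token_scores(tokens):
    """Return candidate next tokens and their scores for the given prefix."""
    return BIGRAM_SCORES.get(tokens[-1], {"</s>": 1.0})


def generate(prompt, max_new_tokens=10):
    """Greedy autoregressive decoding: append the best token, then repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)      # condition on what exists so far
        best = max(scores, key=scores.get)      # greedy choice (no sampling)
        if best == "</s>":                      # stop at the end-of-sequence marker
            break
        tokens.append(best)                     # the new token becomes part of the prefix
    return tokens


print(generate(["<s>"]))
```

Each pass through the loop feeds the growing sequence back in, which is why generation cost scales with output length. Swapping the `max` call for sampling from the scores would be the usual way to get non-deterministic output.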

Component	Example	Description
Transformer architecture	Multi-head self-attention + feed-forward layers	Foundation of all GPT models — processes token sequences in parallel, enabling efficient long-range dependency modeling.
Self-attention mechanism	Query-Key-Value matrices compute token relationships	Each token attends to itself and all preceding tokens (GPT uses a causal mask), assigning importance weights based on relevance; this yields context-aware representations.
Autoregressive generation	Predicts token $t_{n+1}$ given $t_1, t_2, ..., t_n$	Generates text one token at a time, conditioning each prediction on all previously generated tokens for coherent outputs.
Multi-head attention	8–96 attention heads process different aspects simultaneously	Parallel attention mechanisms allow the model to focus on multiple representation subspaces (syntax, semantics, entities) at once.
Pre-training + fine-tuning	Unsupervised learning on web text → RLHF alignment	Two-stage process: model learns language patterns from massive datasets, then human feedback refines outputs for helpfulness and safety.
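The self-attention row above can be illustrated with a short sketch of single-head scaled dot-product attention with a causal mask, written in plain Python. The function names are illustrative, and passing the same vectors as Q, K, and V is a simplifying assumption: in a real model each comes from a separate learned projection of the token embeddings, and many heads run in parallel.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(Q, K, V):
    """Single-head scaled dot-product attention with a causal mask.

    Q, K, V are lists of vectors, one per token. Position i only
    attends to positions 0..i, matching decoder-only (GPT-style) models.
    Returns (outputs, attention_weights).
    """
    d_k = len(K[0])
    outputs, all_weights = [], []
    for i, q in enumerate(Q):
        # Scaled dot products against keys at positions 0..i only.
        scores = [
            sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out = [
            sum(w * V[j][c] for j, w in enumerate(weights))
            for c in range(len(V[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy embeddings for three tokens (assumed values, for illustration only).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(X, X, X)
for i, w in enumerate(weights):
    print(i, [round(v, 3) for v in w])
```

The first token can only attend to itself, so its weight list has a single entry; each later token gets one more weight than the one before, and every row sums to 1.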
