GPT Models (GPT-4, GPT-5) Cheat Sheet

Updated 2026-04-28

GPT (Generative Pre-trained Transformer) models are large language models developed by OpenAI that use the transformer architecture to generate text, analyze images, process audio, and perform complex reasoning. The lineup has expanded dramatically: the current flagship, GPT-5.5 (released April 2026), delivers 1M-token context and native computer-use capabilities, the GPT-4.1 family offers a 1M-token non-reasoning alternative, and open-weight models (gpt-oss) are now available under Apache 2.0. OpenAI's new Responses API has replaced Chat Completions as the recommended primitive for agent-style workloads, bringing stateful context, built-in tools, and MCP server support into a single call.

What This Cheat Sheet Covers

This topic spans 14 focused tables and 140 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

  • Table 1: Core Architecture
  • Table 2: Model Variants and Specifications
  • Table 3: Tokenization and Context
  • Table 4: API Parameters and Configuration
  • Table 5: API Endpoints and Integration
  • Table 6: Agentic Tools and Built-in Features
  • Table 7: Multimodal Capabilities
  • Table 8: Fine-tuning and Customization
  • Table 9: Decoding and Generation Strategies
  • Table 10: Model Selection and Use Cases
  • Table 11: Performance and Optimization
  • Table 12: Pricing and Cost Management
  • Table 13: Open-Weight Models
  • Table 14: Limitations and Considerations

Table 1: Core Architecture

Underneath every GPT model is the same transformer machinery, and these are its load-bearing pieces: causal self-attention that lets each token weigh itself and every token before it, autoregressive decoding that writes one token at a time, BPE tokenization, and the training stages that turn raw web text into an aligned assistant. Understanding how they fit together explains both why these models are so capable and where their quirks come from.
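The one-token-at-a-time decoding loop described above can be sketched in a few lines. This is a minimal illustration, not how any GPT model is implemented: `NEXT_TOKEN_PROBS`, `next_token_distribution`, and `generate_greedy` are hypothetical names, and the hard-coded bigram table stands in for a trained network that would condition on the whole context rather than just the last token.

```python
# Toy next-token table: a hypothetical stand-in for a trained model.
# A real GPT conditions on the full context; this table only looks at the last token.
NEXT_TOKEN_PROBS = {
    "<bos>": {"the": 0.9, "a": 0.1},
    "the": {"model": 0.6, "token": 0.4},
    "model": {"predicts": 0.7, "<eos>": 0.3},
    "predicts": {"the": 0.2, "tokens": 0.8},
    "tokens": {"<eos>": 1.0},
}


def next_token_distribution(context):
    """Return P(next token | context) for the toy model."""
    return NEXT_TOKEN_PROBS.get(context[-1], {"<eos>": 1.0})


def generate_greedy(prompt, max_new_tokens=10):
    """Autoregressive loop: append the most probable next token until <eos>."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_distribution(tokens)
        best = max(probs, key=probs.get)  # greedy choice; sampling would draw from probs
        if best == "<eos>":
            break
        tokens.append(best)  # the new token becomes part of the next step's context
    return tokens


generated = generate_greedy(["<bos>"])
print(" ".join(generated[1:]))
```

Greedy selection is used here only because it is deterministic; production systems typically sample from the distribution using parameters such as temperature.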

| Component | Example | Description |
| Transformer architecture | Multi-head self-attention + feed-forward layers | Foundation of all GPT models. Token sequences are processed in parallel during training and prompt processing, enabling efficient long-range dependency modeling. |
| Self-attention mechanism | Query-Key-Value matrices compute token relationships | Each token attends to itself and the tokens before it (GPT uses causal masking), weighting them by query-key similarity. This produces context-aware representations. |
| Autoregressive generation | Predicts token t_{n+1} given t_1, t_2, ..., t_n | Generates text one token at a time, conditioning each prediction on all previously generated tokens for coherent outputs. |
| Multi-head attention | 8–96 attention heads process different aspects simultaneously | Parallel attention heads let the model focus on multiple representation subspaces (syntax, semantics, entities) at once. |
| Pre-training + fine-tuning | Unsupervised learning on web text → RLHF alignment | Two-stage process: the model learns language patterns from massive datasets, then human feedback refines outputs for helpfulness and safety. |
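To make the Query-Key-Value row concrete, here is a minimal single-head sketch of scaled dot-product self-attention with a causal mask, written in plain Python for readability. The function names, the tiny 2-dimensional embeddings, and the identity projection matrices are all illustrative assumptions; real models use learned weights, many heads, and optimized tensor libraries.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(matrix, vec):
    """Multiply a row vector (length d_in) by a d_in x d_out matrix."""
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec)))
            for j in range(len(matrix[0]))]


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head attention where position i only sees positions 0..i."""
    queries = [project(w_q, tok) for tok in x]
    keys = [project(w_k, tok) for tok in x]
    values = [project(w_v, tok) for tok in x]
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Scaled dot-product scores against the current and earlier tokens only.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible value vectors.
        out = [sum(w * values[j][c] for j, w in enumerate(weights))
               for c in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy inputs: three tokens with 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = causal_self_attention(embeddings, identity, identity, identity)
for position, row in enumerate(weights):
    print(position, [round(w, 3) for w in row])
```

Multi-head attention runs several such heads side by side, each with its own projection matrices, and concatenates their outputs before a final linear layer.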
