OpenAI API Cheat Sheet

Updated 2026-05-28

Next Topic: pgvector for Postgres Vector Search Cheat Sheet

The OpenAI API provides programmatic access to state-of-the-art language models (GPT-5.5, GPT-5.4, GPT-5.2, o3, o4-mini), image generation (gpt-image-2, gpt-image-1.5), speech-to-text (gpt-4o-transcribe), text-to-speech, embeddings, and video generation. The 2025–2026 landscape is dominated by the Responses API as the primary interface for agentic workflows, with built-in tools for web search, code interpreter, file search, computer use, hosted shell, skills, and Model Context Protocol (MCP) integrations. Key 2026 additions include GPT-5.5 (flagship, April 2026), the GPT-5.4 family (March 2026), gpt-image-2 (April 2026), Realtime 2 with speech translation (May 2026), and Compaction for long-running agent contexts. ⚠️ Fine-tuning is being wound down (new users blocked May 2026; existing users until Jan 2027), and the Assistants API sunsets August 26, 2026 — migrate to the Responses API and Conversations API before those dates.

What This Cheat Sheet Covers

This topic spans 18 focused tables and 183 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core API EndpointsTable 2: Model Families and SelectionTable 3: Authentication and API KeysTable 4: Request Parameters and SamplingTable 5: Structured Outputs and JSON ModeTable 6: Function Calling and Tool UseTable 7: Prompt EngineeringTable 8: Embeddings and Vector SearchTable 9: Error Handling and Rate LimitsTable 10: Streaming and Real-time ResponsesTable 11: Multimodal and VisionTable 12: Assistants, Threads, and MigrationTable 13: Fine-tuning and Model CustomizationTable 14: Batch Processing and Async JobsTable 15: Audio and SpeechTable 16: Usage Monitoring and BillingTable 17: Security and Best PracticesTable 18: Advanced Features and Agentic Primitives

Table 1: Core API Endpoints

The Responses API is OpenAI's primary interface for building agents and complex workflows; Chat Completions remains recommended for simple stateless apps. Knowing which endpoint to use — and the capabilities each one unlocks — is the first decision in any integration.

Endpoint	Example	Description
Responses API	`client.responses.create(model="gpt-5.5", input=[...])`	• Primary interface for stateful agentic conversations • built-in tools: web_search, file_search, code_interpreter, computer_use, hosted_shell, skills, apply_patch • supports MCP, Compaction, Conversations API, WebSocket mode • supersedes Assistants API (sunset Aug 26, 2026).
Chat Completions API	`client.chat.completions.create(model="gpt-5.5", messages=[...])`	• Stateless endpoint for conversational AI • pass message history explicitly each request • supports JSON mode, vision, function calling, streaming • still recommended for simple stateless apps; reasoning models get better performance via Responses API.
Conversations API	`client.conversations.create(model="gpt-5.5"); client.conversations.turns.create(...)`	• Manages long-running stateful conversations within the Responses API • OpenAI handles thread state server-side • replaces Assistants Threads; provides side-by-side migration from Assistants API • released August 2025.
Embeddings API	`client.embeddings.create(model="text-embedding-3-small", input="text")`	• Converts text to dense vector representations for semantic search, clustering, RAG • returns 1536-dim (small) or 3072-dim (large) vectors • supports `dimensions` param for vector compression.
Speech-to-Text	`client.audio.transcriptions.create(model="gpt-4o-transcribe", file=audio)`	• Transcribes audio to text • gpt-4o-transcribe (higher accuracy) or gpt-4o-mini-transcribe (fast, low cost) • supports 98 languages, background noise, diverse accents • `translations` endpoint converts non-English audio to English.
Text-to-Speech	`client.audio.speech.create(model="gpt-4o-mini-tts", voice="alloy", input="text")`	• Generates spoken audio from text • gpt-4o-mini-tts (steerable style/emotion) or tts-1/tts-1-hd (legacy) • 6 voices: alloy, echo, fable, onyx, nova, shimmer • formats: MP3, Opus, AAC, FLAC.

Table 1: Core API Endpoints

Endpoint	Example	Description
Responses API	`client.responses.create(model="gpt-5.5", input=[...])`	• Primary interface for stateful agentic conversations • built-in tools: web_search, file_search, code_interpreter, computer_use, hosted_shell, skills, apply_patch • supports MCP, Compaction, Conversations API, WebSocket mode • supersedes Assistants API (sunset Aug 26, 2026).
Chat Completions API	`client.chat.completions.create(model="gpt-5.5", messages=[...])`	• Stateless endpoint for conversational AI • pass message history explicitly each request • supports JSON mode, vision, function calling, streaming • still recommended for simple stateless apps; reasoning models get better performance via Responses API.
Conversations API	`client.conversations.create(model="gpt-5.5"); client.conversations.turns.create(...)`	• Manages long-running stateful conversations within the Responses API • OpenAI handles thread state server-side • replaces Assistants Threads; provides side-by-side migration from Assistants API • released August 2025.
Embeddings API	`client.embeddings.create(model="text-embedding-3-small", input="text")`	• Converts text to dense vector representations for semantic search, clustering, RAG • returns 1536-dim (small) or 3072-dim (large) vectors • supports `dimensions` param for vector compression.
Speech-to-Text	`client.audio.transcriptions.create(model="gpt-4o-transcribe", file=audio)`	• Transcribes audio to text • gpt-4o-transcribe (higher accuracy) or gpt-4o-mini-transcribe (fast, low cost) • supports 98 languages, background noise, diverse accents • `translations` endpoint converts non-English audio to English.
Text-to-Speech	`client.audio.speech.create(model="gpt-4o-mini-tts", voice="alloy", input="text")`	• Generates spoken audio from text • gpt-4o-mini-tts (steerable style/emotion) or tts-1/tts-1-hd (legacy) • 6 voices: alloy, echo, fable, onyx, nova, shimmer • formats: MP3, Opus, AAC, FLAC.