LLM observability is the practice of monitoring, measuring, and understanding the behavior of large language models in production environments, enabling teams to track quality, performance, cost, and security across AI applications. Unlike traditional software observability, LLM observability must capture the non-deterministic nature of generative AI by tracking prompt inputs, model outputs, token usage, latency, hallucinations, and user feedback across complex multi-step workflows. As LLMs power increasingly critical business applications in 2026, observability has shifted from a nice-to-have debugging tool to production infrastructure essential for reliability, compliance, and cost control. The key mental model is to treat LLM observability as distributed tracing for AI: every request becomes a trace with nested spans capturing retrieval, reasoning, generation, and tool calls, with quality metrics evaluated at each step before responses reach users.
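
The trace-and-span mental model can be sketched with a small in-memory tracer. This is a minimal illustration using only the Python standard library, not a real observability SDK: the `Tracer` and `Span` classes, the `render` helper, and the attribute names (`top_k`, `model`, `total_tokens`, `tool`) are all hypothetical. A production setup would more likely emit spans through an instrumentation library such as OpenTelemetry.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical structure)."""
    name: str
    trace_id: str
    parent: Optional["Span"] = field(default=None, repr=False)
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    start: float = 0.0
    duration_ms: float = 0.0


class Tracer:
    """Hypothetical in-memory tracer that records one trace of nested spans."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.root = None       # outermost span of the trace
        self._current = None   # span that is currently open

    @contextmanager
    def span(self, name, **attributes):
        parent = self._current
        s = Span(name, self.trace_id, parent, dict(attributes))
        if parent is None:
            self.root = s              # first span opened becomes the root
        else:
            parent.children.append(s)  # nest under the span that is open
        self._current = s
        s.start = time.perf_counter()
        try:
            yield s
        finally:
            s.duration_ms = (time.perf_counter() - s.start) * 1000
            self._current = parent     # restore the enclosing span


def render(span, depth=0):
    """Return the span tree as indented lines, one line per span."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# One request becomes one trace; each step is a nested span.
tracer = Tracer()
with tracer.span("handle_request", session_id="user_123_conv_45"):
    with tracer.span("retrieval", top_k=3):
        pass  # a vector search would run here
    with tracer.span("generation", model="example-model") as gen:
        gen.attributes["total_tokens"] = 128  # placeholder value
        with tracer.span("tool_call", tool="calculator"):
            pass  # a tool execution would run here

tree = render(tracer.root)
print("\n".join(tree))
```

Using a context manager keeps the nesting implicit: whichever span is open when a new one starts becomes its parent, which mirrors how tracing libraries typically propagate the current span.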
What This Cheat Sheet Covers
This topic spans 19 focused tables and 204 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core Observability Concepts
| Concept | Example | Description |
|---|---|---|
| Trace | Complete execution path from user query through LLM calls to final response | End-to-end record of a request's journey through the system, capturing all operations as nested spans with timing and metadata. |
| Span | Single LLM call, vector search, or tool execution within a trace | Individual unit of work within a trace; each span has a start time, duration, and attributes such as model name or token count. |
| Session | `session_id: "user_123_conv_45"` groups multiple traces for one conversation | Collection of traces tied to a single user journey or conversation thread, enabling analysis of multi-turn interactions. |
| Metric | Token usage per request, p95 latency, cost per query | Quantitative measurement aggregated over time, such as throughput, latency percentiles, error rates, or token counts. |
| Log | `[INFO] User prompt: "Summarize quarterly earnings"` | Textual record of events with structured or unstructured data, including prompts, completions, and system messages. |
| Instrumentation | Adding OpenTelemetry SDK to capture LLM calls automatically | Code or framework integration that emits telemetry data from application code without manual logging for every operation. |
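
The metric examples in the table (token usage per request, p95 latency, cost per query) can be computed from per-request records along these lines. This is a rough sketch under stated assumptions: the record fields, the sample values, and the per-1K-token prices are invented for illustration and do not reflect any real provider's pricing. The percentile uses the nearest-rank method, which is one of several common definitions.

```python
import math
from collections import defaultdict

# Hypothetical per-request records, e.g. one per trace root span.
records = [
    {"session_id": "user_123_conv_45", "latency_ms": 420, "prompt_tokens": 300, "completion_tokens": 120},
    {"session_id": "user_123_conv_45", "latency_ms": 380, "prompt_tokens": 250, "completion_tokens": 90},
    {"session_id": "user_123_conv_45", "latency_ms": 950, "prompt_tokens": 800, "completion_tokens": 400},
    {"session_id": "user_456_conv_07", "latency_ms": 510, "prompt_tokens": 150, "completion_tokens": 60},
    {"session_id": "user_456_conv_07", "latency_ms": 2100, "prompt_tokens": 1200, "completion_tokens": 700},
]

# Assumed prices per 1,000 tokens; placeholders, not real pricing.
PRICE_PER_1K_PROMPT = 0.001
PRICE_PER_1K_COMPLETION = 0.002


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def request_cost(record):
    """Cost of one request under the assumed per-token prices."""
    return (record["prompt_tokens"] / 1000 * PRICE_PER_1K_PROMPT
            + record["completion_tokens"] / 1000 * PRICE_PER_1K_COMPLETION)


# Latency percentile across all requests.
p95_latency = percentile([r["latency_ms"] for r in records], 95)

# Average cost per query.
cost_per_query = sum(request_cost(r) for r in records) / len(records)

# Token usage grouped by session, so multi-turn conversations can be compared.
tokens_by_session = defaultdict(int)
for r in records:
    tokens_by_session[r["session_id"]] += r["prompt_tokens"] + r["completion_tokens"]

print(f"p95 latency: {p95_latency} ms")
print(f"mean cost per query: {cost_per_query:.6f}")
print(dict(tokens_by_session))
```

Percentiles are reported here rather than a mean because a single slow request (2100 ms in the sample data) dominates tail latency while barely moving the average.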