Service Level Objectives (SLOs) are quantifiable reliability targets that define expected service behavior from a user's perspective, and they are a core part of Site Reliability Engineering (SRE) practice. They bridge the gap between engineering capabilities and business expectations by setting precise, measurable goals for service availability, latency, throughput, and correctness. SLOs enable data-driven prioritization through error budgets (the amount of unreliability the SLO permits), allowing teams to balance feature velocity with system stability while maintaining customer satisfaction. Originating from Google's SRE methodology, SLOs have become a widely used framework for managing distributed systems, microservices, data pipelines, and APIs, complemented by alerting strategies such as multi-window, multi-burn-rate alerts that fire only when the error budget is being consumed fast enough to put user experience at risk.
What This Cheat Sheet Covers
This topic spans 21 focused tables and 155 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Core Concepts
| Concept | Example | Description |
|---|---|---|
| SLI (Service Level Indicator) | `availability = successful_requests / total_requests` | • Quantitative measure of service behavior • raw metric or ratio collected from real user traffic. |
| SLO (Service Level Objective) | 99.9% of requests succeed in 30 days | • Target value or range for an SLI • defines the acceptable reliability threshold without financial penalties. |
| SLA (Service Level Agreement) | Refund 10% if availability < 99.5% | • Contractual promise backed by financial or business consequences • the SLO should be stricter than the SLA. |
| Error Budget | 0.1% for 99.9% SLO = 43.2 min/month (30-day month) | • Acceptable unreliability (100% - SLO) • exhaustion triggers a feature freeze or priority shift. |
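
The arithmetic behind these rows can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`availability_sli`, `error_budget_minutes`, `budget_remaining`) are hypothetical, the window is assumed to be 30 days, and treating a window with no traffic as fully available is an assumption rather than a universal convention.

```python
def availability_sli(successful_requests: int, total_requests: int) -> float:
    """Availability SLI: the fraction of requests that succeeded."""
    if total_requests == 0:
        # Assumption: a window with no traffic counts as fully available.
        return 1.0
    return successful_requests / total_requests


def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an SLO target over the window."""
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, successful_requests: int,
                     total_requests: int) -> float:
    """Fraction of the request-based error budget still unspent.

    1.0 means untouched, 0.0 means exhausted, negative means overspent.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        # A 100% target (or no traffic) leaves no budget to spend.
        return 0.0
    actual_failures = total_requests - successful_requests
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    # 99.9% SLO over 30 days allows about 43.2 minutes of unavailability.
    print(round(error_budget_minutes(0.999), 1))
    # 500 failures out of 1,000,000 requests.
    print(availability_sli(999_500, 1_000_000))
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 3))
```

With a 99.9% target, 1,000,000 requests allow roughly 1,000 failures, so 500 failures leave about half of the budget. A negative result from `budget_remaining` indicates the budget is overspent, which is the point at which the feature freeze or priority shift described above would apply.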