Service Level Objectives Cheat Sheet

Updated 2026-03-19


Service Level Objectives (SLOs) are quantifiable reliability targets that define expected service behavior from the user's perspective, and they are a cornerstone of Site Reliability Engineering (SRE) practice. They connect engineering work to business expectations by setting precise, measurable goals for availability, latency, throughput, and correctness. Every SLO implies an error budget, the amount of unreliability the service may accumulate within a window (100% minus the SLO), which gives teams a data-driven way to decide when to ship features and when to prioritize stability. Popularized by Google's SRE practice, SLOs are now a widely adopted framework for managing distributed systems, microservices, data pipelines, and APIs. They are usually paired with alerting strategies such as multi-window, multi-burn-rate alerts, which fire only when the error budget is being consumed fast enough to threaten the objective.
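As a rough illustration of multi-window, multi-burn-rate alerting, the Python sketch below computes a burn rate (observed error ratio divided by the budgeted error ratio) and alerts only when both a long and a short window exceed a threshold. The 99.9% target, the 30-day window, and the 14.4 threshold (2% of the budget spent in 1 hour, confirmed by a 5-minute short window) are assumptions based on the commonly cited values in Google's SRE Workbook. `WindowStats`, `burn_rate`, `threshold_for`, and `should_alert` are hypothetical names; real deployments evaluate this logic in the monitoring system, not in application code.

```python
from dataclasses import dataclass

SLO = 0.999                 # assumed target: 99.9% of requests succeed
ERROR_BUDGET = 1 - SLO      # budgeted error ratio, roughly 0.001
SLO_WINDOW_HOURS = 30 * 24  # assumed 30-day SLO window (720 h)


@dataclass
class WindowStats:
    """Hypothetical request counts observed over one lookback window."""
    errors: int
    total: int

    def error_ratio(self) -> float:
        return self.errors / self.total if self.total else 0.0


def burn_rate(stats: WindowStats) -> float:
    """Observed error ratio divided by the budgeted error ratio.
    A burn rate of 1.0 spends exactly the whole budget over the SLO window."""
    return stats.error_ratio() / ERROR_BUDGET


def threshold_for(budget_fraction: float, alert_window_hours: float) -> float:
    """Burn rate that spends `budget_fraction` of the budget within the alert window."""
    return budget_fraction * SLO_WINDOW_HOURS / alert_window_hours


def should_alert(long_w: WindowStats, short_w: WindowStats, threshold: float) -> bool:
    """Fire only when BOTH windows exceed the threshold: the long window shows the
    problem is significant, the short window shows it is still happening."""
    return burn_rate(long_w) > threshold and burn_rate(short_w) > threshold


page_threshold = threshold_for(0.02, 1)              # 2% of budget in 1 h, about 14.4
one_hour = WindowStats(errors=1_800, total=100_000)  # 1.8% errors, burn rate about 18
five_min = WindowStats(errors=150, total=8_000)      # 1.875% errors, burn rate about 18.75
recovered = WindowStats(errors=2, total=8_000)       # 0.025% errors, burn rate about 0.25
print(should_alert(one_hour, five_min, page_threshold))   # True: page
print(should_alert(one_hour, recovered, page_threshold))  # False: the spike has ended
```

The short window is what keeps the alert from continuing to fire long after an incident ends: the 1-hour window still shows a high burn rate, but the 5-minute window has recovered, so no page is sent.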

What This Cheat Sheet Covers

This topic spans 21 focused tables and 155 indexed concepts. The outline below lists every table, from foundational concepts through advanced topics.

• Core Concepts
• SLI Selection and Measurement
• Latency Percentiles for SLIs
• Error Budget Fundamentals
• Burn Rate Mechanics and Alerting
• Multi-Window Multi-Burn-Rate Alerts
• Time-Based vs Request-Based SLIs
• Availability Targets and Nines of Availability
• Error Rate Thresholds and Detection
• Measurement Windows and Alignment
• SLO Compliance Reporting
• Monitoring Tools and Implementation
• SLO-Based Alerting Strategies
• Organizational Practices
• User-Centric SLO Design
• SLO Review and Adjustment
• Dependency SLOs and Cascading Effects
• Advanced Topics
• PromQL and Query Patterns
• Golden Signals and RED Method
• Specialized Use Cases

Core Concepts

Concept	Example	Description
Service Level Indicator (SLI)	`availability = successful_requests / total_requests`	• Quantitative measure of service behavior • usually a ratio of good events to valid events, ideally measured from real user traffic.
Service Level Objective (SLO)	99.9% of requests succeed over a 30-day window	• Target value or range for an SLI • defines the acceptable reliability threshold; an internal goal with no financial penalties.
Service Level Agreement (SLA)	Refund 10% if availability < 99.5%	• Contractual promise backed by financial or business consequences • the SLO should be stricter than the SLA so problems surface before the contract is breached.
Error Budget	0.1% for a 99.9% SLO = 43.2 min per 30 days	• Acceptable unreliability (100% - SLO) • exhaustion typically triggers a feature freeze or a shift of priority to reliability work.
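The arithmetic behind these rows can be checked in a few lines. The Python sketch below assumes a request-based availability SLI, the 99.9% SLO and 99.5% SLA from the examples above, and a 30-day window; `availability_sli`, `error_budget_time`, and `budget_remaining` are hypothetical helpers for illustration, not part of any monitoring tool.

```python
from datetime import timedelta

SLO = 0.999         # internal objective from the SLO row
SLA_TARGET = 0.995  # contractual threshold from the SLA row
assert SLO > SLA_TARGET, "the SLO should be stricter than the SLA"


def availability_sli(successful_requests: int, total_requests: int) -> float:
    """Request-based availability SLI: successful requests / total requests."""
    if total_requests == 0:
        return 1.0  # assumption: a window with no traffic counts as fully available
    return successful_requests / total_requests


def error_budget_time(slo: float, window: timedelta) -> timedelta:
    """Error budget expressed as time: (100% - SLO) of the window length."""
    return window * (1 - slo)


def budget_remaining(slo: float, successful_requests: int, total_requests: int) -> float:
    """Share of the request-based error budget still unspent (negative once exhausted).
    Assumes slo < 1, since a 100% target leaves no budget at all."""
    if total_requests == 0:
        return 1.0  # no traffic yet, so none of the budget has been spent
    allowed_failures = (1 - slo) * total_requests
    failures = total_requests - successful_requests
    return 1 - failures / allowed_failures


budget = error_budget_time(SLO, timedelta(days=30))
print(f"{budget.total_seconds() / 60:.1f} min")  # 43.2 min

sli = availability_sli(999_100, 1_000_000)
print(f"SLI {sli:.4f}, meets SLO: {sli >= SLO}")  # SLI 0.9991, meets SLO: True
print(f"budget left: {budget_remaining(SLO, 999_100, 1_000_000):.0%}")  # budget left: 10%
```

With 900 failures out of 1,000,000 requests, the service meets its 99.9% SLO but has already spent 90% of the 1,000 failures its budget allows. Treating an empty window as fully available is a design choice; some teams exclude such windows from the calculation instead.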
