Monitoring & Logging Cheat Sheet

Updated 2026-04-28


Monitoring and logging form the visibility layer of modern infrastructure, enabling teams to understand system behavior, diagnose failures, and optimize performance. Observability, the ability to infer internal system state from external outputs, organizes four signals into a coherent framework: metrics (quantitative time series), logs (timestamped event records), traces (distributed request paths), and profiles (continuous CPU/memory sampling). In 2026, eBPF-based zero-code instrumentation and OpenTelemetry standardization have changed how telemetry is collected, reducing the need to instrument every service manually. Effective monitoring still requires balancing signal against noise: collecting enough data to answer questions nobody anticipated, but not so much that costs explode or engineers drown in alerts.

What This Cheat Sheet Covers

This cheat sheet spans 34 focused tables and 175 indexed concepts. The table-by-table outline below runs from foundational concepts through advanced topics.

Table 1: The Four Signals of Observability
Table 2: Metric Types and Use Cases
Table 3: Log Levels and When to Use Them
Table 4: Structured Logging Best Practices
Table 5: Distributed Tracing Concepts
Table 6: Trace Sampling Strategies
Table 7: Context Propagation Standards
Table 8: Monitoring Methodologies
Table 9: Service Level Objectives (SLOs)
Table 10: Metrics Collection Models
Table 11: Percentiles for Latency Analysis
Table 12: Metric Cardinality Management
Table 13: Alerting Strategies
Table 14: Alert Notification and Routing
Table 15: Time Series Databases
Table 16: Log Aggregation Platforms
Table 17: Distributed Tracing Backends
Table 18: Log Shippers and Forwarders
Table 19: OpenTelemetry Components
Table 20: Application Performance Monitoring (APM)
Table 21: Alerting and Incident Management Platforms
Table 22: Monitoring as Code and Configuration
Table 23: Synthetic Monitoring
Table 24: Blackbox vs Whitebox Monitoring
Table 25: Log Retention and Storage Lifecycle
Table 26: Dashboard Best Practices
Table 27: Compliance and Audit Logging
Table 28: Observability Anti-Patterns to Avoid
Table 29: Cost Optimization Strategies
Table 30: Continuous Profiling
Table 31: eBPF Observability
Table 32: Real User Monitoring (RUM)
Table 33: DORA Metrics
Table 34: AIOps and AI-Powered Observability

Table 1: The Four Signals of Observability

Everything else in this cheat sheet sits on top of four telemetry signals. Metrics tell you what is happening at a glance, logs tell you what happened in detail, traces tell you where a request spent its time, and profiles — the newest addition — tell you why the code itself is slow. Knowing which signal answers which question is the foundation of debugging distributed systems.

Signal	Example	Description
Metrics	`http_requests_total{method="GET", status="200"} 1547`	Numerical measurements aggregated over time; low-cardinality time series ideal for dashboards, alerts, and trend analysis.
Logs	`2026-03-03T15:04:05Z ERROR user=12345 msg="payment failed" amount=99.99`	Timestamped text records of discrete events; provide detailed context for debugging specific failures and understanding application behavior.
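
As a rough illustration of the two rows above, the following Python sketch produces a metric sample and a log line in the same shapes as the table's examples. The helper names (`inc_counter`, `render_metrics`, `format_log`) and the in-memory `counters` store are hypothetical and use only the standard library; a real service would typically rely on a metrics client library and a logging framework instead.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical in-memory store: one counter per (metric name, label set).
counters = Counter()

def inc_counter(name, **labels):
    """Metric: aggregate many events into one number per label combination."""
    counters[(name, tuple(sorted(labels.items())))] += 1

def render_metrics():
    """Render counters as text lines shaped like the table's metric example."""
    lines = []
    for (name, labels), value in sorted(counters.items()):
        label_str = ", ".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return lines

def format_log(level, when, **fields):
    """Log: one timestamped record describing a single discrete event."""
    parts = []
    for key, value in fields.items():
        text = str(value)
        if " " in text:
            text = f'"{text}"'  # quote values that contain spaces
        parts.append(f"{key}={text}")
    timestamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{timestamp} {level} " + " ".join(parts)

if __name__ == "__main__":
    # Three successful requests collapse into a single metric sample...
    for _ in range(3):
        inc_counter("http_requests_total", method="GET", status="200")
    print("\n".join(render_metrics()))

    # ...while a failure keeps its full per-event context in a log record.
    now = datetime.now(timezone.utc)
    print(format_log("ERROR", now, user=12345, msg="payment failed", amount=99.99))
```

The contrast is the point of the sketch: the counter discards per-request detail in exchange for a cheap, low-cardinality number, whereas the log line keeps identifiers such as `user` that would be too high-cardinality to use as metric labels.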
