Cloud Auto-Scaling Cheat Sheet

Updated 2026-03-17


Cloud auto-scaling dynamically adjusts compute resources based on demand, allowing applications to maintain performance during traffic spikes while minimizing costs during low-utilization periods. This capability has evolved from simple threshold-based reactions into sophisticated predictive systems using machine learning that anticipate load changes before they occur. Understanding the distinction between horizontal scaling (adding instances) and vertical scaling (increasing instance size), along with when to apply reactive versus proactive strategies, determines whether your infrastructure scales efficiently or burns budget fighting fires.

What This Cheat Sheet Covers

This topic spans 14 focused tables and 104 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Scaling Approaches
Table 2: Dynamic Scaling Policies
Table 3: Kubernetes Auto-Scaling
Table 4: Platform-Specific Auto-Scaling
Table 5: Scaling Policies and Configuration
Table 6: Metrics and Monitoring
Table 7: Advanced Scaling Techniques
Table 8: Database and Serverless Auto-Scaling
Table 9: Cost Optimization and Rightsizing
Table 10: Testing and Validation
Table 11: Health Checks and Reliability
Table 12: Notifications and Observability
Table 13: Common Anti-Patterns and Pitfalls
Table 14: Implementation and Infrastructure as Code

Table 1: Core Scaling Approaches

Strategy	Example	Description
Horizontal Scaling (Scale Out)	Add 3 web servers; `min=2, max=10`	• Increases capacity by adding instances so the workload is spread across multiple nodes • Provides fault tolerance and, in principle, near-unlimited scalability, but requires a stateless application design or externalized session management.
Vertical Scaling (Scale Up)	t3.medium → t3.xlarge (2 vCPU → 4 vCPU)	• Increases capacity by moving to a larger instance type with more CPU/RAM • Simpler to implement because no architectural changes are needed, but is capped by the largest available instance size and typically requires a restart or brief downtime to resize.
Diagonal Scaling	Start with 4×large; scale out to 8×large	• Combines both approaches: scale up to larger instances first, then scale out once vertical limits are reached • Improves resource density per instance while keeping the ability to expand horizontally.
Reactive Scaling	CPU > 70% for 5 min → add instance	• Responds to observed metrics after load has already increased • Simple to configure and avoids over-provisioning, but introduces lag between a demand surge and new capacity becoming available.
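
The reactive rule in the table (CPU > 70% for 5 min → add instance, bounded by `min=2, max=10`) can be sketched as a small decision function. This is a minimal illustration, not any cloud provider's API: `ReactivePolicy`, `desired_capacity`, the one-sample-per-minute assumption, and the one-instance step size are all hypothetical names and choices made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ReactivePolicy:
    """Hypothetical policy mirroring the table's example values."""
    cpu_threshold: float = 70.0  # percent ("CPU > 70%")
    breach_minutes: int = 5      # sustained window ("for 5 min")
    min_instances: int = 2       # lower bound (min=2)
    max_instances: int = 10      # upper bound (max=10)
    step: int = 1                # assumed: instances added per trigger


def desired_capacity(policy: ReactivePolicy,
                     current: int,
                     cpu_samples: Sequence[float]) -> int:
    """Return the new instance count.

    cpu_samples is assumed to hold one average-CPU reading per minute,
    oldest first. Scale-out fires only when every reading in the most
    recent window exceeds the threshold.
    """
    # Keep the current count inside the configured bounds first.
    clamped = max(policy.min_instances, min(current, policy.max_instances))

    recent = cpu_samples[-policy.breach_minutes:]
    sustained = (
        len(recent) == policy.breach_minutes
        and all(sample > policy.cpu_threshold for sample in recent)
    )
    if sustained:
        # Add capacity, but never exceed the configured maximum.
        return min(clamped + policy.step, policy.max_instances)
    return clamped


if __name__ == "__main__":
    policy = ReactivePolicy()
    print(desired_capacity(policy, 3, [72, 75, 80, 91, 88]))
    print(desired_capacity(policy, 3, [72, 75, 60, 91, 88]))
```

Requiring the whole window to breach the threshold is what produces the lag noted in the table: a spike must persist for the full five minutes before capacity is added, and the new instance still needs time to boot after that.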
