Cloud auto-scaling lets applications adjust compute resources dynamically in response to demand, automatically adding or removing capacity as workload patterns change. It operates across major providers (AWS, Azure, GCP) and orchestrators (Kubernetes), balancing performance against cost through metric-driven policies and predictive algorithms. Understanding the distinction between reactive scaling (responding to current load) and proactive scaling (anticipating demand) is critical: most production environments combine both approaches, using cooldown periods and stabilization windows to prevent thrashing, a common pitfall in which a system wastefully oscillates between scale-out and scale-in actions. In 2026, AI-assisted and event-driven patterns such as KEDA, EKS Auto Mode, and in-place pod resizing are reshaping how teams think about both node-level and workload-level scaling.
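To make the thrashing problem concrete, here is a minimal sketch of a reactive scaler that ignores metric samples while a cooldown is active. The class name `CooldownScaler`, the 70%/30% thresholds, the 300-second cooldown, and the 2 to 10 instance bounds are all illustrative assumptions, not any provider's defaults or API.

```python
from dataclasses import dataclass


@dataclass
class CooldownScaler:
    """Toy reactive scaler; every threshold here is an illustrative assumption."""
    scale_out_above: float = 70.0   # CPU % that triggers scale-out
    scale_in_below: float = 30.0    # CPU % that triggers scale-in
    cooldown_s: float = 300.0       # minimum seconds between scaling actions
    min_instances: int = 2
    max_instances: int = 10
    instances: int = 2
    last_action_at: float = float("-inf")

    def observe(self, cpu_percent: float, now: float) -> int:
        """Apply one metric sample taken at time `now` (seconds); return the instance count."""
        if now - self.last_action_at < self.cooldown_s:
            return self.instances  # still cooling down: ignore the sample
        if cpu_percent > self.scale_out_above:
            target = min(self.instances + 1, self.max_instances)
        elif cpu_percent < self.scale_in_below:
            target = max(self.instances - 1, self.min_instances)
        else:
            target = self.instances
        if target != self.instances:
            self.instances = target
            self.last_action_at = now  # start a new cooldown window
        return self.instances


scaler = CooldownScaler()
# (time in seconds, CPU %): a noisy signal that flips between high and low load.
samples = [(0, 85.0), (60, 20.0), (120, 85.0), (400, 20.0)]
history = [scaler.observe(cpu, now=t) for t, cpu in samples]
print(history)  # [3, 3, 3, 2]
```

Without the cooldown, the same samples would trigger an action on every reading (out, in, out, in). With it, the scaler acts once at t=0 and again only after the window has elapsed.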
What This Cheat Sheet Covers
This cheat sheet spans 16 focused tables and 108 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Scaling Dimensions
The two fundamental axes of scaling (adding more instances versus upgrading existing ones) determine everything else: architecture complexity, fault tolerance, cost model, and recovery time. Choosing the right dimension before writing a single policy can save weeks of rework.
| Type | Example | Description |
|---|---|---|
| Horizontal scaling | Add 3 EC2 instances when CPU > 70% | • Increases capacity by adding more instances • Most common in cloud due to better fault tolerance and elasticity compared to vertical scaling |
| Vertical scaling | Change instance from t3.medium to t3.xlarge | • Increases capacity by upgrading individual instance resources (CPU, RAM) • Requires downtime in many cases and hits hardware limits |
| Auto-scaling | ASG scales from 2 to 10 instances automatically | • Fully automated scaling based on policies and metrics • Eliminates manual intervention and responds faster than human operators |
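
The difference between the two dimensions can be sketched in a few lines, reusing the numbers from the table's examples. The function names `scale_out` and `scale_up` and the constants below are hypothetical and only illustrate the decision logic; a real setup would express this as a provider policy rather than application code.

```python
# Illustrative constants taken from the table's examples; not provider defaults.
CPU_THRESHOLD = 70.0
STEP = 3
MIN_INSTANCES, MAX_INSTANCES = 2, 10

# A subset of the t3 size ladder, smallest to largest.
T3_SIZES = ["t3.medium", "t3.large", "t3.xlarge", "t3.2xlarge"]


def scale_out(current: int, cpu_percent: float) -> int:
    """Horizontal: add STEP instances when CPU exceeds the threshold, within bounds."""
    if cpu_percent > CPU_THRESHOLD:
        return min(current + STEP, MAX_INSTANCES)
    return max(current, MIN_INSTANCES)


def scale_up(instance_type: str) -> str:
    """Vertical: move to the next larger size; stays put at the top of the ladder."""
    i = T3_SIZES.index(instance_type)
    return T3_SIZES[min(i + 1, len(T3_SIZES) - 1)]


print(scale_out(2, 85.0))      # 5
print(scale_out(9, 85.0))      # 10 (clamped to MAX_INSTANCES)
print(scale_up("t3.medium"))   # t3.large
print(scale_up("t3.2xlarge"))  # t3.2xlarge (no larger size in this ladder)
```

Horizontal scaling stops at a limit you configure (`MAX_INSTANCES`), while vertical scaling stops where the size ladder ends. That is the hardware ceiling the table refers to.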