Cloud auto-scaling lets applications adjust compute resources dynamically in response to demand, adding or removing capacity as workload patterns change. It is available across major providers (AWS, Azure, GCP) and orchestrators (Kubernetes), and balances performance against cost through metric-driven policies and predictive algorithms. The distinction between reactive scaling (responding to current load) and proactive scaling (anticipating demand) is critical. Most production environments combine both approaches and use cooldown periods and stabilization windows to prevent thrashing, a common pitfall in which a system oscillates wastefully between scale-out and scale-in actions.
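To make the role of a cooldown period concrete, here is a minimal sketch of a reactive, threshold-based scaler. The class name `CooldownScaler`, its thresholds, and the 300-second cooldown are illustrative assumptions, not any provider's API or defaults.

```python
from dataclasses import dataclass


@dataclass
class CooldownScaler:
    """Reactive threshold scaler with a cooldown (hypothetical, not a provider API)."""
    scale_out_above: float = 70.0   # CPU % above which one instance is added
    scale_in_below: float = 30.0    # CPU % below which one instance is removed
    cooldown_s: float = 300.0       # minimum seconds between two scaling actions
    min_instances: int = 1
    max_instances: int = 10
    instances: int = 2
    last_action_at: float = float("-inf")  # no action has happened yet

    def observe(self, cpu_percent: float, now: float) -> int:
        """Return the instance count after one metric sample taken at time `now`."""
        if now - self.last_action_at < self.cooldown_s:
            return self.instances  # still cooling down: ignore this sample
        if cpu_percent > self.scale_out_above and self.instances < self.max_instances:
            self.instances += 1
            self.last_action_at = now
        elif cpu_percent < self.scale_in_below and self.instances > self.min_instances:
            self.instances -= 1
            self.last_action_at = now
        return self.instances


scaler = CooldownScaler()
# Load flips between high and low every 60 s; without a cooldown this would thrash.
samples = [(0, 90.0), (60, 10.0), (120, 90.0), (180, 10.0), (360, 10.0)]
history = [scaler.observe(cpu, t) for t, cpu in samples]
print(history)  # prints [3, 3, 3, 3, 2]
```

The oscillating samples at 60, 120, and 180 seconds are ignored because they fall inside the cooldown that started at time 0. Only the sample at 360 seconds triggers a scale-in. Real autoscalers typically evaluate aggregated metrics rather than single samples, so treat this as an illustration of the idea only.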
What This Cheat Sheet Covers
This topic spans 16 focused tables and 96 indexed concepts. Below is a complete table-by-table outline, from foundational concepts through advanced details.
Table 1: Scaling Dimensions
| Type | Example | Description |
|---|---|---|
| Horizontal scaling (scale out) | Add 3 EC2 instances when CPU > 70% | • Increases capacity by adding more instances • Most common in cloud because it offers better fault tolerance and elasticity than vertical scaling |
| Vertical scaling (scale up) | Change instance from t3.medium to t3.xlarge | • Increases capacity by upgrading an individual instance's resources (CPU, RAM) • Often requires downtime and is bounded by hardware limits |
| Hybrid (vertical, then horizontal) | Start vertical, then horizontal when limits reached | • Combines both approaches: scale up first for simplicity, scale out once vertical limits are reached • Balances operational complexity |
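
The last row of Table 1 (start vertical, then go horizontal when limits are reached) can be sketched as a single decision function. The `SIZE_LADDER` list, the `next_step` name, and the 70% threshold are assumptions chosen for illustration. Real instance families and their upper limits vary by provider.

```python
from __future__ import annotations

# Hypothetical upgrade path; the largest size stands in for the "vertical limit".
SIZE_LADDER = ["t3.medium", "t3.large", "t3.xlarge"]


def next_step(
    instance_type: str, count: int, cpu_percent: float, threshold: float = 70.0
) -> tuple[str, int]:
    """Return (instance_type, count) after one hybrid scaling decision."""
    if cpu_percent <= threshold:
        return instance_type, count  # under the threshold: change nothing
    idx = SIZE_LADDER.index(instance_type)
    if idx < len(SIZE_LADDER) - 1:
        return SIZE_LADDER[idx + 1], count  # vertical: move to the next size up
    return instance_type, count + 1  # ladder exhausted: horizontal, add an instance


state = ("t3.medium", 1)
for cpu in [85.0, 85.0, 85.0, 40.0]:
    state = next_step(*state, cpu)
print(state)  # prints ('t3.xlarge', 2)
```

Under sustained load the sketch upgrades the single instance twice, then adds a second instance once no larger size remains. It omits scale-in and the restart that a real size change usually involves.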