Load balancing distributes network traffic across multiple servers so that no single server is overwhelmed, improving application availability, scalability, and fault tolerance. Load balancers operate at several layers of the OSI model, from the network layer up to the application layer, and most commonly at the transport layer (Layer 4) and the application layer (Layer 7). They act as traffic directors that continuously monitor backend server health and route each request according to a selection algorithm. In 2026, load balancing has evolved beyond simple traffic distribution to encompass AI-driven optimization, eBPF-accelerated kernel-level forwarding, and deep service-mesh integration via Istio, Linkerd, and the Kubernetes Gateway API. Load balancing is essential because it turns individual servers into a resilient, horizontally scalable system capable of handling millions of requests per second with sub-second response times and high availability.
What This Cheat Sheet Covers
This topic spans 16 focused tables and 108 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core Load Balancing Algorithms
| Algorithm | Example | Description |
|---|---|---|
| Round Robin | Request1 → Server1; Request2 → Server2; Request3 → Server1 | • Distributes requests sequentially in rotation • Simplest algorithm; best when servers have equal capacity and requests have similar cost. |
| Weighted Round Robin | `server backend1 weight=3;` `server backend2 weight=1;` | • Assigns each server a weight so that higher-weight servers receive proportionally more requests • Use when servers have unequal capacity. |
| Least Connections | Server1: 5 conn; Server2: 2 conn → Route to Server2 | • Routes to the server with the fewest active connections • Ideal when request processing times vary significantly. |
| Least Outstanding Requests | Server1: 8 pending; Server2: 3 pending → Route to Server2 | • Counts in-flight requests (not just connections) per target • Available as a routing algorithm on AWS ALB, where round robin remains the default; a better fit than least connections for HTTP/2, where many requests share one connection. |
| IP Hash | hash(192.168.1.50) → Server2 | • Uses a hash of the client IP to consistently route a client to the same server • Provides session persistence without cookies. |
| Least Response Time | Server1: 50ms avg; Server2: 20ms avg → Route to Server2 | • Selects the server with the fastest response time and fewest active connections • Optimizes for user-perceived latency. |
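
The selection strategies in Table 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular load balancer implements them: the `Server` class, its fields, and all function names are hypothetical. The weighted variant is a naive expansion rather than the smooth interleaving that production balancers typically use, and ordering by latency first and connection count second is only one possible way to combine the two signals for least response time.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical backend record used only for this sketch."""
    name: str
    weight: int = 1      # relative capacity (weighted round robin)
    active: int = 0      # active connections, or in-flight requests
    avg_ms: float = 0.0  # observed average response time in milliseconds


def round_robin(servers):
    """Return an iterator that yields servers in a fixed rotation."""
    return itertools.cycle(servers)


def weighted_round_robin(servers):
    """Yield each server `weight` times per cycle (naive, not interleaved)."""
    expanded = [s for s in servers for _ in range(s.weight)]
    return itertools.cycle(expanded)


def least_connections(servers):
    """Pick the server with the smallest `active` count.

    Least outstanding requests is the same selection when `active`
    tracks in-flight requests instead of open connections.
    """
    return min(servers, key=lambda s: s.active)


def ip_hash(servers, client_ip):
    """Map a client IP to a server using a stable hash of the address."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def least_response_time(servers):
    """Prefer the lowest average latency; break ties on active connections."""
    return min(servers, key=lambda s: (s.avg_ms, s.active))


if __name__ == "__main__":
    pool = [
        Server("Server1", weight=3, active=5, avg_ms=50),
        Server("Server2", weight=1, active=2, avg_ms=20),
    ]
    rr = round_robin(pool)
    print([next(rr).name for _ in range(3)])   # ['Server1', 'Server2', 'Server1']
    wrr = weighted_round_robin(pool)
    print([next(wrr).name for _ in range(4)])  # Server1 three times, then Server2
    print(least_connections(pool).name)        # Server2
    print(least_response_time(pool).name)      # Server2
    print(ip_hash(pool, "192.168.1.50").name)  # same server on every call
```

Because `ip_hash` takes the hash modulo the pool size, adding or removing a server remaps most clients; consistent hashing is the usual way to limit that, and it is outside the scope of this sketch.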