Load Balancing Cheat Sheet

Updated 2026-04-29


Load balancing distributes network traffic across multiple servers so that no single server becomes overwhelmed, improving application availability, scalability, and fault tolerance. Load balancers operate at several layers of the OSI model, from the network layer up to the application layer; they continuously monitor backend server health and route each request using algorithms such as round robin, least connections, or IP hash. As of 2026, load balancing extends beyond simple traffic distribution to include AI-driven optimization, eBPF-accelerated kernel-level forwarding, and service-mesh integration via Istio, Linkerd, and the Kubernetes Gateway API. Load balancing is essential because it turns a group of individual servers into a resilient, horizontally scalable system that can handle very high request volumes while keeping response times low and uptime high.

What This Cheat Sheet Covers

This topic spans 16 focused tables and 108 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Load Balancing Algorithms
Table 2: Load Balancing Layers
Table 3: Load Balancer Deployment Modes
Table 4: Session Persistence Techniques
Table 5: Health Check Strategies
Table 6: High Availability Patterns
Table 7: SSL/TLS Handling
Table 8: Advanced Routing Techniques
Table 9: Global Load Balancing
Table 10: Cloud Load Balancer Types
Table 11: Load Balancer Software & Tools
Table 12: Container & Kubernetes Load Balancing
Table 13: Performance & Scaling Features
Table 14: Security Features
Table 15: Monitoring & Observability
Table 16: Common Deployment Scenarios

Table 1: Core Load Balancing Algorithms

Algorithm	Example	Description
Round Robin	`Request1 → Server1` `Request2 → Server2` `Request3 → Server1`	• Distributes requests sequentially in rotation • simplest algorithm, best when servers have equal capacity and request cost.
Weighted Round Robin	`server backend1 weight=3;` `server backend2 weight=1;`	• Assigns different weights so higher-weight servers receive proportionally more requests • use when servers have unequal capacity.
Least Connections	`Server1: 5 conn` `Server2: 2 conn` `→ Route to Server2`	• Routes to the server with the fewest active connections • ideal when request processing times vary significantly.
Least Outstanding Requests	`Server1: 8 pending` `Server2: 3 pending` `→ Route to Server2`	• Counts in-flight requests (not just connections) per target • available as a routing algorithm on AWS ALB target groups (round robin is the default), and a better fit than least connections for HTTP/2, where many requests share one connection.
IP Hash	`hash(192.168.1.50) → Server2`	• Uses a hash of the client IP to consistently route a client to the same server • provides session persistence without cookies.
Least Response Time	`Server1: 50ms avg` `Server2: 20ms avg` `→ Route to Server2`	• Selects the server with the fastest response time and fewest active connections • optimizes for user-perceived latency.
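
The selection rules in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not how any particular load balancer implements them: the `Server` class, its fields, and all function names are hypothetical, the weighted round robin is the naive "repeat by weight" form rather than a smoothed variant, and breaking response-time ties on active connections is an assumption. The pool values mirror the examples in the table.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical backend record; field names are illustrative only."""
    name: str
    weight: int = 1       # relative capacity, used by weighted round robin
    active: int = 0       # active connections (or in-flight requests)
    avg_ms: float = 0.0   # average response time in milliseconds


def round_robin(servers):
    """Return an iterator that yields servers in a fixed rotation."""
    return itertools.cycle(servers)


def weighted_round_robin(servers):
    """Yield each server `weight` times per cycle (naive, not smoothed)."""
    expanded = [s for s in servers for _ in range(s.weight)]
    return itertools.cycle(expanded)


def least_connections(servers):
    """Pick the server with the fewest active connections."""
    return min(servers, key=lambda s: s.active)


def ip_hash(servers, client_ip):
    """Map a client IP to a server using a stable hash of the address."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def least_response_time(servers):
    """Prefer the lowest average latency; break ties on active connections."""
    return min(servers, key=lambda s: (s.avg_ms, s.active))


pool = [
    Server("Server1", weight=3, active=5, avg_ms=50.0),
    Server("Server2", weight=1, active=2, avg_ms=20.0),
]

rr = round_robin(pool)
print([next(rr).name for _ in range(3)])    # ['Server1', 'Server2', 'Server1']

wrr = weighted_round_robin(pool)
print([next(wrr).name for _ in range(4)])   # ['Server1', 'Server1', 'Server1', 'Server2']

print(least_connections(pool).name)         # Server2
print(least_response_time(pool).name)       # Server2

# The same client IP always maps to the same server while the pool is unchanged.
print(ip_hash(pool, "192.168.1.50") is ip_hash(pool, "192.168.1.50"))  # True
```

Note that the modulo-based `ip_hash` remaps most clients whenever the pool size changes; consistent hashing is the usual way to limit that churn. Least outstanding requests follows the same shape as `least_connections`, with `active` counting in-flight requests instead of connections.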
