Rate Limiting and Throttling Patterns Cheat Sheet

Updated 2026-03-18

Rate limiting and throttling are traffic-control mechanisms for distributed systems and APIs that regulate how many requests a client can make within a given time window. They protect backend infrastructure from overload, prevent abuse, enforce usage quotas for tiered pricing models, and ensure fair resource allocation across users. The key distinction is that rate limiting rejects requests once a threshold is exceeded (typically returning HTTP 429), while throttling slows down or queues excess requests to smooth traffic flow, though in practice many practitioners use the terms interchangeably. Understanding the algorithmic foundations (token bucket, leaky bucket, sliding window) and the architectural considerations (distributed state management, Redis-backed counters, monitoring) is critical for building resilient, scalable APIs that can withstand traffic spikes, DDoS attempts, and noisy neighbors without degrading service for legitimate users.

What This Cheat Sheet Covers

This topic spans 15 focused tables and 81 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Rate Limiting Algorithms
Table 2: Distributed Rate Limiting Patterns
Table 3: Rate Limit Identification Strategies
Table 4: HTTP 429 Response and Headers
Table 5: Burst Capacity and Traffic Shaping
Table 6: Client-Side Rate Limit Handling
Table 7: Advanced Rate Limiting Patterns
Table 8: Allowlist and Bypass Patterns
Table 9: Rate Limit Monitoring and Alerting
Table 10: API Gateway Rate Limiting
Table 11: Quota Management and Usage Tracking
Table 12: Throttling vs Rate Limiting Comparison
Table 13: Testing and Validation
Table 14: Security and DDoS Protection
Table 15: Graceful Degradation and Resilience

Table 1: Core Rate Limiting Algorithms

Algorithm	Example	Description
Token Bucket	`bucket_size = 100` `tokens = 100` `refill_rate = 10/sec` `if tokens >= 1: tokens -= 1; allow()`	• Bucket holds tokens that refill at a constant rate, and each allowed request consumes one token • Allows bursts up to the bucket size while maintaining the long-term average rate limit • A common choice for production APIs.
Leaky Bucket	`queue.add(request)` `process_at_fixed_rate()` `leak_rate = 5/sec`	• Requests enter a FIFO queue and leak out at a constant rate regardless of input bursts • Smooths output traffic but adds queueing delay for requests that arrive in a burst • Enforces steady throughput.
Sliding Window Log	`timestamps = [t1, t2, ...]` `now = time.now()` `valid = [t for t in timestamps if t > now - 60]`	• Stores a timestamp for every request in a rolling window (e.g., last 60 seconds) • Precise but memory-intensive under high traffic • Avoids sudden bursts at window boundaries.
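
The token bucket and sliding window log rows above can be sketched in a few lines of Python. This is a minimal, single-process illustration and not a production implementation: the class names, the `allow()` method, the `cost` parameter, and the injectable `clock` argument are all assumptions made for this example, and there is no locking or shared state (a distributed deployment would typically keep this state in something like Redis).

```python
import time
from collections import deque


class TokenBucket:
    """Hypothetical sketch: allows bursts up to `capacity` tokens,
    refilled continuously at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock              # injectable so tests can fake time
        self.tokens = float(capacity)   # the bucket starts full
        self.last = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Credit tokens for the elapsed time, capped at the bucket size.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost         # each allowed request consumes tokens
            return True
        return False                    # caller would typically answer HTTP 429


class SlidingWindowLog:
    """Hypothetical sketch: permits at most `limit` requests in any
    rolling window of `window_seconds`, storing one timestamp per request."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.timestamps = deque()       # oldest timestamp on the left

    def allow(self):
        now = self.clock()
        # Evict timestamps that have fallen out of the rolling window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

With the values from the table, `TokenBucket(capacity=100, refill_rate=10)` would accept an initial burst of 100 requests and then sustain roughly 10 per second. The sliding window log trades memory for precision: it keeps one timestamp per accepted request, which is why the table flags it as memory-intensive under high traffic.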
