Software resilience patterns are architectural strategies for building fault-tolerant, self-healing systems that keep functioning despite failures, network issues, or overload. In distributed systems, where failures are inevitable rather than exceptional, resilience engineering shifts the goal from preventing failures to handling them gracefully. These patterns, from circuit breakers that prevent cascading failures to chaos engineering that deliberately injects faults, form the foundation of modern production systems at scale. Knowing not just what each pattern does but when and why to apply it is what turns a fragile system into a robust, production-ready one.
What This Cheat Sheet Covers
This cheat sheet spans 15 focused tables and 112 indexed concepts. Below is a table-by-table outline, from foundational concepts through advanced details.
Table 1: Circuit Breaker States & Behavior
| State | Example | Description |
|---|---|---|
| Closed | `circuitBreaker.state = CLOSED`<br>`request → downstream service` | • Requests flow normally to the downstream service • a failure counter tracks errors against a threshold (e.g., 5 failures in 10 seconds) before the circuit opens. |
| Open | `circuitBreaker.state = OPEN`<br>`request → immediate FailFastException` | • All requests fail immediately without calling the service • protects the downstream service by preventing further load • transitions to half-open after a timeout period (e.g., 60 seconds). |
| Half-open | `circuitBreaker.state = HALF_OPEN`<br>`limited test requests → service` | • Allows a limited number of test requests (e.g., 3) to check whether the service has recovered • success → transitions to closed • failure → transitions back to open. |
| Failure threshold | `failureThreshold = 5`<br>`errorPercentage = 50%` | • Trigger condition for opening the circuit • can be an absolute count (5 failures) or a percentage (50% error rate) within a sliding time window. |
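
The state transitions in Table 1 can be sketched as a small state machine. The following is a minimal, single-threaded illustration in Python, not the API of any particular library: the class name `CircuitBreaker`, the exception `CircuitOpenError`, and all parameter names are assumptions made for this example, with defaults taken from the example values in the table (5 failures in 10 seconds, 60-second timeout, 3 trial requests). It uses the absolute-count trigger rather than the percentage one.

```python
import time
from collections import deque
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


class CircuitOpenError(Exception):
    """Hypothetical fail-fast error raised while the circuit is open."""


class CircuitBreaker:
    """Illustrative count-based breaker; not thread-safe."""

    def __init__(self, failure_threshold=5, window_seconds=10.0,
                 open_timeout=60.0, half_open_max_calls=3,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.open_timeout = open_timeout
        self.half_open_max_calls = half_open_max_calls
        self.clock = clock          # injectable so tests can control time
        self.state = State.CLOSED
        self._failures = deque()    # timestamps of recent failures
        self._opened_at = 0.0
        self._trial_successes = 0

    def call(self, func, *args, **kwargs):
        now = self.clock()
        if self.state is State.OPEN:
            if now - self._opened_at < self.open_timeout:
                # Fail fast without touching the downstream service.
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: start letting trial requests through.
            self.state = State.HALF_OPEN
            self._trial_successes = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure(now)
            raise
        self._on_success()
        return result

    def _on_failure(self, now):
        if self.state is State.HALF_OPEN:
            self._open(now)         # any failed trial reopens the circuit
            return
        self._failures.append(now)
        # Drop failures that have fallen out of the sliding window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        if len(self._failures) >= self.failure_threshold:
            self._open(now)

    def _on_success(self):
        if self.state is State.HALF_OPEN:
            self._trial_successes += 1
            # Enough successful trials: consider the service recovered.
            if self._trial_successes >= self.half_open_max_calls:
                self.state = State.CLOSED
                self._failures.clear()

    def _open(self, now):
        self.state = State.OPEN
        self._opened_at = now
        self._failures.clear()
```

A caller would wrap each downstream request, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a placeholder for the real client function, and treat `CircuitOpenError` as the signal to fall back or return a degraded response. A production breaker would also need locking around state changes and, for the percentage trigger, a count of total calls in the window.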