Circuit Breakers: Building Systems That Fail Gracefully
Implement circuit breaker patterns to prevent cascading failures, protect downstream services, and maintain system stability when things go wrong.
February 11, 2026 · 8 min · 1677 words · Rob Washington
In distributed systems, failures are inevitable. A single slow or failing service can cascade through your entire architecture, turning a minor issue into a major outage. Circuit breakers prevent this by detecting failures and stopping the cascade before it spreads.
A circuit breaker wraps calls to external services and monitors for failures. When failures exceed a threshold, the circuit “opens” and immediately rejects requests without attempting the call. After a timeout, it allows limited requests through to test if the service has recovered.
```python
import time
from enum import Enum
from threading import Lock
from typing import Callable, Any
from functools import wraps


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: int = 30,
                 half_open_max_calls: int = 3):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.half_open_max_calls = half_open_max_calls
        self.state = CircuitState.CLOSED
        self.failures = 0
        self.successes = 0
        self.last_failure_time = None
        self.half_open_calls = 0
        self._lock = Lock()

    def _should_attempt_reset(self) -> bool:
        """Check if enough time has passed to try again."""
        if self.last_failure_time is None:
            return False
        return time.time() - self.last_failure_time >= self.recovery_timeout

    def _record_success(self):
        """Record a successful call."""
        with self._lock:
            if self.state == CircuitState.HALF_OPEN:
                self.successes += 1
                if self.successes >= self.half_open_max_calls:
                    # Enough successes, close the circuit
                    self.state = CircuitState.CLOSED
                    self.failures = 0
                    self.successes = 0
                    self.half_open_calls = 0
            else:
                # In closed state, reset failure count on success
                self.failures = 0

    def _record_failure(self):
        """Record a failed call."""
        with self._lock:
            self.failures += 1
            self.last_failure_time = time.time()
            if self.state == CircuitState.HALF_OPEN:
                # Any failure in half-open goes back to open
                self.state = CircuitState.OPEN
                self.half_open_calls = 0
                self.successes = 0
            elif self.failures >= self.failure_threshold:
                # Too many failures, open the circuit
                self.state = CircuitState.OPEN

    def can_execute(self) -> bool:
        """Check if a call should be attempted."""
        with self._lock:
            if self.state == CircuitState.CLOSED:
                return True
            if self.state == CircuitState.OPEN:
                if self._should_attempt_reset():
                    self.state = CircuitState.HALF_OPEN
                    self.half_open_calls = 0
                    self.successes = 0
                    return True
                return False
            # Half-open: allow limited calls
            if self.half_open_calls < self.half_open_max_calls:
                self.half_open_calls += 1
                return True
            return False

    def call(self, func: Callable, *args, **kwargs) -> Any:
        """Execute a function through the circuit breaker."""
        if not self.can_execute():
            raise CircuitOpenError(f"Circuit is {self.state.value}")
        try:
            result = func(*args, **kwargs)
            self._record_success()
            return result
        except Exception:
            self._record_failure()
            raise


class CircuitOpenError(Exception):
    """Raised when circuit is open and rejecting calls."""
    pass


# Decorator version
def circuit_breaker(failure_threshold=5, recovery_timeout=30):
    breaker = CircuitBreaker(failure_threshold, recovery_timeout)

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return breaker.call(func, *args, **kwargs)
        wrapper.circuit_breaker = breaker  # Expose for monitoring
        return wrapper
    return decorator
```
In practice, you wrap the actual network call and keep a fallback path for when the circuit is open:

```python
import httpx


@circuit_breaker(failure_threshold=3, recovery_timeout=60)
def fetch_user_data(user_id: str) -> dict:
    response = httpx.get(f"https://api.example.com/users/{user_id}", timeout=5.0)
    response.raise_for_status()
    return response.json()


# Usage with fallback
def get_user(user_id: str) -> dict:
    try:
        return fetch_user_data(user_id)
    except CircuitOpenError:
        # Return cached data or default
        return get_cached_user(user_id) or {"id": user_id, "status": "unavailable"}
    except httpx.HTTPError:
        # Let the circuit breaker track this failure
        return {"id": user_id, "status": "error"}
```
With multiple downstream services, a registry gives each one its own breaker with its own thresholds:

```python
class CircuitBreakerRegistry:
    def __init__(self):
        self._breakers: dict[str, CircuitBreaker] = {}
        self._lock = Lock()

    def get(self, name: str, failure_threshold: int = 5,
            recovery_timeout: int = 30) -> CircuitBreaker:
        with self._lock:
            if name not in self._breakers:
                self._breakers[name] = CircuitBreaker(
                    failure_threshold=failure_threshold,
                    recovery_timeout=recovery_timeout,
                )
            return self._breakers[name]

    def get_all_states(self) -> dict[str, str]:
        """Get states of all circuit breakers for monitoring."""
        return {name: breaker.state.value
                for name, breaker in self._breakers.items()}


# Global registry
breakers = CircuitBreakerRegistry()


# Configure per-service
def call_payment_service(data):
    cb = breakers.get("payment", failure_threshold=2, recovery_timeout=120)
    return cb.call(payment_client.process, data)


def call_inventory_service(sku):
    cb = breakers.get("inventory", failure_threshold=5, recovery_timeout=30)
    return cb.call(inventory_client.check, sku)
```
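One way to put `get_all_states()` to work (a sketch, not part of the original code) is to turn it into a health-check payload that a monitor can alert on. The `health_report` helper and its "degraded" convention are assumptions here; plug the real registry's output in wherever your service exposes health:

```python
# Hypothetical helper: summarize registry state for a /health endpoint.
# `states` is the mapping CircuitBreakerRegistry.get_all_states() returns
# (service name -> "closed" / "open" / "half_open").
def health_report(states: dict[str, str]) -> dict:
    open_circuits = [name for name, state in states.items() if state == "open"]
    return {
        # "degraded" whenever any circuit is open -- this policy is an assumption
        "status": "degraded" if open_circuits else "ok",
        "open_circuits": open_circuits,
        "circuits": states,
    }
```

An alerting rule on `status != "ok"` then fires exactly when a circuit opens, which is the signal you actually care about.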
The same pattern works for async code; only the call site changes:

```python
import asyncio
from typing import Coroutine


class AsyncCircuitBreaker(CircuitBreaker):
    async def call_async(self, coro: Coroutine) -> Any:
        if not self.can_execute():
            raise CircuitOpenError(f"Circuit is {self.state.value}")
        try:
            result = await coro
            self._record_success()
            return result
        except Exception:
            self._record_failure()
            raise


# Usage
payment_breaker = AsyncCircuitBreaker(failure_threshold=3)


async def process_payment(order_id: str):
    try:
        return await payment_breaker.call_async(payment_api.charge(order_id))
    except CircuitOpenError:
        # Queue for retry later
        await retry_queue.push({"order_id": order_id, "action": "charge"})
        return {"status": "queued"}
```
- **Set appropriate thresholds:** too low and the circuit flaps open and closed on transient blips; too high and it reacts slowly to real failures.
- **Use fallbacks:** always have a degraded-but-working response ready for when a circuit opens.
- **Monitor state changes:** alert when circuits open — an open circuit indicates a real problem downstream.
- **Tune settings per service:** critical services may warrant lower thresholds and longer recovery timeouts.
- **Test your circuit breakers:** use chaos engineering — intentionally break dependencies to verify the circuit opens, rejects, and recovers as expected.
- **Combine with retries carefully:** each failed retry counts against the threshold, so retries open the circuit sooner — but once it's open, stop retrying rather than hammering a struggling service.
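The last point is worth making concrete. Below is a sketch of composing retries with a breaker so that retries stop the moment the circuit opens. `SimpleBreaker` is a deliberately stripped-down stand-in for the `CircuitBreaker` above (consecutive-failure counting only), and `call_with_retries` is a hypothetical helper, not part of the article's code:

```python
class CircuitOpenError(Exception):
    pass


class SimpleBreaker:
    """Minimal stand-in for the article's CircuitBreaker: opens after
    `failure_threshold` consecutive failures and then rejects all calls."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = 0

    def call(self, func, *args, **kwargs):
        if self.failures >= self.failure_threshold:
            raise CircuitOpenError("circuit open")
        try:
            result = func(*args, **kwargs)
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            raise


def call_with_retries(breaker, func, *args, max_retries=2, **kwargs):
    """Retry transient errors through the breaker, but bail out immediately
    once the circuit opens. Each failed attempt counts against the breaker's
    threshold, so retries make the circuit open sooner, not later."""
    last_exc = None
    for _ in range(max_retries + 1):
        try:
            return breaker.call(func, *args, **kwargs)
        except CircuitOpenError:
            raise  # circuit is open: stop retrying, fail fast
        except Exception as e:
            last_exc = e
    raise last_exc
```

The key design choice is ordering: the retry loop lives *outside* the breaker, so the breaker sees every attempt and the loop sees the breaker's verdict.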
Circuit breakers are essential infrastructure for any distributed system. They transform “everything is broken” into “this one thing is broken, everything else is fine.” That’s the difference between a minor incident and a major outage.