Circuit Breaker Patterns: Failing Fast Without Failing Hard

Your payment service is down. Every request to it times out after 30 seconds. You have 100 requests per second hitting that endpoint. Do the math: 30 seconds in, you've got 3,000 requests waiting on a dead service, each one holding a thread or a connection, and your entire application is choking.

This is where circuit breakers earn their keep.
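
The arithmetic behind that scenario is Little's law: requests in flight equal the arrival rate multiplied by how long each request is held. A tiny sketch follows; `inFlightAtSteadyState` is a hypothetical helper name used only for illustration.

```typescript
// Little's law: requests in flight = arrival rate x time each request is held.
function inFlightAtSteadyState(requestsPerSecond: number, holdSeconds: number): number {
  return requestsPerSecond * holdSeconds;
}

// 100 req/s, each held for the full 30 s timeout.
const stuck = inFlightAtSteadyState(100, 30);
console.log(stuck); // 3000 requests (and their threads or sockets) held at once
```

Over a full minute, 6,000 requests will have entered that dead path, and roughly 3,000 of them are being held at any given moment. The held requests are the ones that exhaust your thread and connection pools.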

The Problem: Cascading Failures

In distributed systems, a single failing dependency can take down everything. Without protection, your system will:

Keep sending requests to the dead service
Exhaust connection pools waiting for timeouts
Queue up requests behind the slow ones
Eventually crash under the load

The naive approach—retrying harder—makes things worse. You’re DDoSing your own failing service while burning through your resources.

The Circuit Breaker Pattern

Borrowed from electrical engineering, the circuit breaker pattern detects failures and prevents the system from repeatedly trying operations that are likely to fail.

Basic Implementation

Here’s a TypeScript circuit breaker that captures the essential behavior:

enum CircuitState {
  CLOSED = 'closed',
  OPEN = 'open',
  HALF_OPEN = 'half_open'
}

interface CircuitBreakerOptions {
  failureThreshold: number;      // Failures before opening
  successThreshold: number;      // Successes to close from half-open
  timeout: number;               // Milliseconds before trying half-open
}

class CircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private failures: number = 0;
  private successes: number = 0;
  private lastFailureTime: number = 0;
  
  constructor(private options: CircuitBreakerOptions) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === CircuitState.OPEN) {
      if (Date.now() - this.lastFailureTime > this.options.timeout) {
        this.state = CircuitState.HALF_OPEN;
        this.successes = 0;
      } else {
        throw new Error('Circuit breaker is open');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess(): void {
    this.failures = 0;
    
    if (this.state === CircuitState.HALF_OPEN) {
      this.successes++;
      if (this.successes >= this.options.successThreshold) {
        this.state = CircuitState.CLOSED;
      }
    }
  }

  private onFailure(): void {
    this.failures++;
    this.lastFailureTime = Date.now();

    if (this.state === CircuitState.HALF_OPEN || 
        this.failures >= this.options.failureThreshold) {
      this.state = CircuitState.OPEN;
    }
  }
}
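
To watch the state transitions without waiting 30 real seconds, it helps to inject the clock. The sketch below is a condensed variant of the class above, not a replacement for it: the `now` parameter, the `TestableCircuitBreaker` name, and the public `state` field are assumptions added purely for testability.

```typescript
type State = 'closed' | 'open' | 'half_open';

interface Options {
  failureThreshold: number;
  successThreshold: number;
  timeout: number; // Milliseconds before trying half-open
}

// Same logic as the breaker above, with an injectable clock (an addition).
class TestableCircuitBreaker {
  state: State = 'closed';
  private failures = 0;
  private successes = 0;
  private lastFailureTime = 0;

  constructor(
    private options: Options,
    private now: () => number = () => Date.now()
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (this.now() - this.lastFailureTime > this.options.timeout) {
        this.state = 'half_open';
        this.successes = 0;
      } else {
        throw new Error('Circuit breaker is open');
      }
    }
    try {
      const result = await fn();
      this.failures = 0;
      if (this.state === 'half_open' && ++this.successes >= this.options.successThreshold) {
        this.state = 'closed';
      }
      return result;
    } catch (error) {
      this.failures++;
      this.lastFailureTime = this.now();
      if (this.state === 'half_open' || this.failures >= this.options.failureThreshold) {
        this.state = 'open';
      }
      throw error;
    }
  }
}

async function demo(): Promise<State[]> {
  let clock = 0; // Fake time in milliseconds
  const breaker = new TestableCircuitBreaker(
    { failureThreshold: 2, successThreshold: 1, timeout: 1000 },
    () => clock
  );
  const fail = () => Promise.reject(new Error('boom'));
  const ok = () => Promise.resolve('ok');
  const seen: State[] = [];

  await breaker.execute(fail).catch(() => {});
  await breaker.execute(fail).catch(() => {});
  seen.push(breaker.state); // open: failure threshold reached

  await breaker.execute(ok).catch(() => {});
  seen.push(breaker.state); // still open: rejected without calling ok()

  clock = 1001; // Move past the timeout
  await breaker.execute(ok);
  seen.push(breaker.state); // closed: the half-open probe succeeded
  return seen;
}

demo().then(states => console.log(states)); // logs the states: open, open, closed
```

The second call to `execute(ok)` is the fail-fast behavior in action: the healthy function is never invoked because the breaker is still inside its timeout.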

Real-World Configuration

The defaults matter. Here's a reasonable starting point for a payment service:

const paymentCircuit = new CircuitBreaker({
  failureThreshold: 5,      // Trip after 5 consecutive failures
  successThreshold: 3,      // Need 3 successes to close
  timeout: 30000            // Try again after 30 seconds
});

// Usage
async function processPayment(order: Order): Promise<PaymentResult> {
  return paymentCircuit.execute(async () => {
    return await paymentService.charge(order);
  });
}

For different service types, adjust accordingly:

Service Type	Failure Threshold	Timeout	Rationale
Payment	5	30s	High stakes, needs quick recovery
Email	10	60s	Can tolerate delays
Cache	3	10s	Should be fast or not at all
External API	5	120s	Third parties need longer recovery
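
One way to keep these values in code is a lookup table keyed by service type. The sketch below assumes the `CircuitBreakerOptions` shape from earlier; `ServiceType`, `breakerDefaults`, and `optionsFor` are hypothetical names, and `successThreshold: 3` is an assumption borrowed from the payment example because the table does not list one.

```typescript
interface CircuitBreakerOptions {
  failureThreshold: number; // Failures before opening
  successThreshold: number; // Successes to close from half-open
  timeout: number;          // Milliseconds before trying half-open
}

type ServiceType = 'payment' | 'email' | 'cache' | 'externalApi';

// Failure thresholds and timeouts mirror the table above.
// successThreshold is an assumed value, not taken from the table.
const breakerDefaults: Record<ServiceType, CircuitBreakerOptions> = {
  payment:     { failureThreshold: 5,  successThreshold: 3, timeout: 30_000 },
  email:       { failureThreshold: 10, successThreshold: 3, timeout: 60_000 },
  cache:       { failureThreshold: 3,  successThreshold: 3, timeout: 10_000 },
  externalApi: { failureThreshold: 5,  successThreshold: 3, timeout: 120_000 },
};

// Returns the defaults for a service type, with optional per-instance overrides.
function optionsFor(
  service: ServiceType,
  overrides: Partial<CircuitBreakerOptions> = {}
): CircuitBreakerOptions {
  return { ...breakerDefaults[service], ...overrides };
}
```

Treat these numbers as starting points and revisit them once you have real failure data for each dependency.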

Handling Open Circuits Gracefully

When the circuit opens, you have options beyond just failing:

async function getProductRecommendations(userId: string): Promise<Product[]> {
  try {
    return await recommendationCircuit.execute(async () => {
      return await recommendationService.getForUser(userId);
    });
  } catch (error) {
    if (error instanceof Error && error.message === 'Circuit breaker is open') {
      // Fallback: return cached or generic recommendations
      return await cache.get(`recommendations:${userId}`) 
        ?? await getDefaultRecommendations();
    }
    throw error;
  }
}
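
Matching on the error message is brittle: rewording the message silently disables the fallback. A dedicated error type is one alternative. In the sketch below, `CircuitOpenError` and `withOpenCircuitFallback` are hypothetical names, and it assumes the breaker is changed to throw this type when it rejects a call.

```typescript
// Hypothetical error type; the breaker shown earlier throws a plain Error instead.
class CircuitOpenError extends Error {
  constructor(public readonly circuitName: string) {
    super(`Circuit "${circuitName}" is open`);
    this.name = 'CircuitOpenError';
    // Keeps instanceof working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, CircuitOpenError.prototype);
  }
}

// Runs `primary`; uses `fallback` only when the breaker rejected the call.
async function withOpenCircuitFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>
): Promise<T> {
  try {
    return await primary();
  } catch (error) {
    if (error instanceof CircuitOpenError) {
      return fallback();
    }
    throw error; // Errors from the service itself still propagate.
  }
}

// Example: the breaker is open, so the generic list is returned.
async function getRecommendations(): Promise<string[]> {
  return withOpenCircuitFallback<string[]>(
    async () => { throw new CircuitOpenError('recommendations'); },
    async () => ['bestsellers']
  );
}
```

The `instanceof` check also lets callers tell a fast-failed request apart from a real service error, which matters when you count rejections separately from failures.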

Common fallback strategies:

Cache: Return stale but valid data
Default: Return safe default values
Degrade: Offer reduced functionality
Queue: Accept the request for later processing
Redirect: Route to an alternate service
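
These strategies can be chained, for example cache first and a safe default last. The sketch below is a minimal illustration under that assumption; `firstAvailable`, `staleCache`, and `recommendationsFallback` are hypothetical names, and the in-memory `Map` stands in for a real cache.

```typescript
type Fallback<T> = () => Promise<T | undefined>;

// Tries each fallback in order and returns the first defined value.
// If none produces a value, returns `lastResort`.
async function firstAvailable<T>(fallbacks: Fallback<T>[], lastResort: T): Promise<T> {
  for (const fallback of fallbacks) {
    try {
      const value = await fallback();
      if (value !== undefined) return value;
    } catch {
      // A failing fallback should not mask the remaining ones; move on.
    }
  }
  return lastResort;
}

// Stand-in for a real cache holding stale but valid data.
const staleCache = new Map<string, string[]>([
  ['recommendations:42', ['cached-item']],
]);

async function recommendationsFallback(userId: string): Promise<string[]> {
  return firstAvailable<string[]>(
    [async () => staleCache.get(`recommendations:${userId}`)], // Cache strategy
    ['bestsellers']                                            // Default strategy
  );
}
```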

Advanced: Sliding Window Failure Detection

Simple consecutive failure counts can be noisy. A sliding window gives better signal:

class SlidingWindowCircuitBreaker {
  private requests: { timestamp: number; success: boolean }[] = [];
  private windowMs: number = 60000; // 1 minute window

  // Record the outcome of a call; returns true if the circuit should open
  record(success: boolean): boolean {
    this.requests.push({ timestamp: Date.now(), success });
    return this.shouldOpen();
  }

  private getFailureRate(): number {
    const now = Date.now();
    // Drop samples that have aged out of the window
    this.requests = this.requests.filter(r => now - r.timestamp < this.windowMs);

    if (this.requests.length < 10) return 0; // Need minimum sample

    const failures = this.requests.filter(r => !r.success).length;
    return failures / this.requests.length;
  }

  private shouldOpen(): boolean {
    return this.getFailureRate() > 0.5; // Open when more than 50% of calls fail
  }
}

This approach is more resilient to transient failures and gives you percentage-based thresholds.
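
To make the difference concrete, the sketch below applies both rules to the same outcome sequences. It is a simplification: the functions and sample data are hypothetical, and the whole array is treated as the window rather than filtering by timestamp.

```typescript
// Outcomes are booleans: true = success, false = failure.

// Consecutive rule: trips once `threshold` failures occur in a row.
function tripsOnConsecutive(outcomes: boolean[], threshold: number): boolean {
  let streak = 0;
  for (const ok of outcomes) {
    streak = ok ? 0 : streak + 1;
    if (streak >= threshold) return true;
  }
  return false;
}

// Rate rule: trips when the failure share of the window exceeds `maxRate`.
function tripsOnFailureRate(outcomes: boolean[], minSamples: number, maxRate: number): boolean {
  if (outcomes.length < minSamples) return false;
  const failures = outcomes.filter(ok => !ok).length;
  return failures / outcomes.length > maxRate;
}

// A dependency failing 2 of every 3 calls: fail, fail, success, repeated 10 times.
const flaky: boolean[] = [];
for (let i = 0; i < 10; i++) flaky.push(false, false, true);

// A healthy dependency with one short blip: 95 successes, then 5 failures.
const blip: boolean[] = [];
for (let i = 0; i < 95; i++) blip.push(true);
for (let i = 0; i < 5; i++) blip.push(false);

console.log(tripsOnConsecutive(flaky, 5));       // false: never 5 failures in a row
console.log(tripsOnFailureRate(flaky, 10, 0.5)); // true: 20 of 30 calls failed
console.log(tripsOnConsecutive(blip, 5));        // true: the blip alone trips it
console.log(tripsOnFailureRate(blip, 10, 0.5));  // false: only 5% of calls failed
```

The consecutive counter misses a badly degraded dependency and overreacts to a short blip, while the rate-based rule handles both cases the way you would probably want.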

Monitoring and Observability

A circuit breaker you can’t observe is a circuit breaker that will surprise you:

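class ObservableCircuitBreaker extends CircuitBreaker {
  constructor(
    options: CircuitBreakerOptions,
    private name: string,
    private metrics: MetricsClient // Your metrics client of choice
  ) {
    super(options);
  }

  // Call this from every state transition. The base class above keeps
  // `state` private, so it needs a hook (or a protected setter) to invoke this.
  protected recordStateChange(from: CircuitState, to: CircuitState): void {
    this.metrics.increment('circuit_breaker.state_change', {
      name: this.name,
      from,
      to
    });

    if (to === CircuitState.OPEN) {
      this.metrics.increment('circuit_breaker.trips', { name: this.name });
      // Alert on-call (`alerting` stands in for your paging integration)
      alerting.notify(`Circuit ${this.name} opened`);
    }
  }
}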

Key metrics to track:

State changes: When and how often circuits trip
Failure rate: What’s triggering the breaker
Rejection rate: How many requests are being fast-failed
Recovery time: How long until services heal
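
How these four numbers might be derived is sketched below with in-memory counters. Every name here (`BreakerStats` and its methods) is hypothetical; a real system would forward the same events to its metrics client instead of holding them in memory.

```typescript
type CallOutcome = 'success' | 'failure' | 'rejected';

class BreakerStats {
  private allowed = 0;   // Calls that reached the dependency
  private failed = 0;    // Allowed calls that failed
  private rejected = 0;  // Calls fast-failed by an open circuit
  private openedAt: number | null = null;
  readonly stateChanges: string[] = [];
  readonly recoveryTimesMs: number[] = [];

  recordCall(outcome: CallOutcome): void {
    if (outcome === 'rejected') {
      this.rejected++;
      return;
    }
    this.allowed++;
    if (outcome === 'failure') this.failed++;
  }

  recordOpened(nowMs: number): void {
    this.stateChanges.push('open');
    this.openedAt = nowMs;
  }

  recordClosed(nowMs: number): void {
    this.stateChanges.push('closed');
    if (this.openedAt !== null) {
      this.recoveryTimesMs.push(nowMs - this.openedAt); // Time spent unhealthy
      this.openedAt = null;
    }
  }

  // Share of allowed calls that failed.
  failureRate(): number {
    return this.allowed === 0 ? 0 : this.failed / this.allowed;
  }

  // Share of all calls that were fast-failed.
  rejectionRate(): number {
    const total = this.allowed + this.rejected;
    return total === 0 ? 0 : this.rejected / total;
  }
}
```

A rising rejection rate with a flat failure rate usually means the breaker is doing its job; the reverse suggests the thresholds are too lenient.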

The Bulkhead Pattern: Circuit Breakers’ Partner

Circuit breakers prevent cascading failures. Bulkheads isolate them:

const pools = {
  payments: new ConnectionPool({ maxSize: 20 }),
  notifications: new ConnectionPool({ maxSize: 10 }),
  analytics: new ConnectionPool({ maxSize: 5 })
};

// Each service gets its own pool
// If analytics is slow, payments still has full capacity

Combine both patterns: circuit breakers per service, connection pools per service. Failures stay contained.
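
If you don't have per-service connection pools, a concurrency limiter gives a similar kind of isolation. The sketch below is one minimal way to do it; the `Bulkhead` class is hypothetical, and rejecting immediately when full (rather than queueing) is a design assumption.

```typescript
// Caps how many calls to one dependency may be in flight at once.
// Calls beyond the cap are rejected immediately rather than queued.
class Bulkhead {
  private active = 0;

  constructor(private readonly maxConcurrent: number) {}

  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      throw new Error('Bulkhead full');
    }
    this.active++;
    try {
      return await fn();
    } finally {
      this.active--; // Always release the slot, even on failure
    }
  }
}

// One bulkhead per dependency, sized like the pools above.
const bulkheads = {
  payments: new Bulkhead(20),
  notifications: new Bulkhead(10),
  analytics: new Bulkhead(5),
};
```

To combine the two patterns, wrap the breaker call in the bulkhead, for example `bulkheads.payments.run(() => paymentCircuit.execute(fn))`. A slow analytics service can then occupy at most five slots, and never any of the payment ones.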

Libraries Worth Using

Don’t build circuit breakers from scratch for production. These are battle-tested:

Node.js:

opossum - Netflix Hystrix-inspired
cockatiel - Policy-based resilience

Python:

Go:

When Not to Use Circuit Breakers

They’re not universal solutions:

Idempotent batch jobs: Better to retry with backoff
User-initiated retries: Let humans decide when to retry
Fire-and-forget: If you don’t care about the result, don’t track failures
Single points of failure: A circuit breaker on your only database connection just makes failures more confusing
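
For the batch-job case, retrying with backoff could look like the sketch below. It uses exponential backoff with full jitter; `retryWithBackoff` and its parameters are hypothetical, the injectable `sleep` exists only so the schedule can be tested without real waiting, and `maxAttempts` is assumed to be at least 1.

```typescript
// Retries `fn` up to `maxAttempts` times. Before retry n (0-based), it waits a
// random time between 0 and baseDelayMs * 2^n (exponential backoff, full jitter).
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts: number,
  baseDelayMs: number,
  sleep: (ms: number) => Promise<void> =
    ms => new Promise<void>(resolve => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const cap = baseDelayMs * 2 ** attempt;
        await sleep(Math.random() * cap); // Jitter spreads retries out
      }
    }
  }
  throw lastError; // All attempts failed
}
```

This only suits operations that are safe to repeat, which is why the list above limits it to idempotent jobs.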

Key Takeaways

Fail fast: Open circuits reject requests immediately instead of waiting for timeouts
Fail gracefully: Always have a fallback strategy
Tune for your service: Payment systems and analytics have different tolerances for failure
Observe everything: You can’t fix what you can’t see
Combine with bulkheads: Isolation and protection work better together

Circuit breakers turn catastrophic failures into graceful degradation. Your users get “temporarily unavailable” instead of “entire site is down.” That’s the difference between an incident and a blip.

Your system will fail. The question is whether the failure spreads or stays contained.
