When a downstream service dies, the worst thing you can do is keep hammering it with requests. Each request ties up a connection, burns through your timeout budget, and cascades the failure upstream. Circuit breakers solve this by detecting failures early and failing fast.

The Problem: Cascade Failures

Imagine Service A calls Service B, which calls Service C:

User → Service A → Service B → Service C (dead)

Without protection:

  1. Service C is down; requests to it time out after 30 seconds
  2. Service B threads pile up waiting for C
  3. Service A threads pile up waiting for B
  4. Everything grinds to a halt

The entire system fails because one service is unhealthy.
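Timeouts are the first line of defense against this: bound every downstream call so a dead dependency can't hold your connections for the full hang. A minimal sketch (the helper name is illustrative, not a standard API):

```javascript
// Reject a slow call after `ms` milliseconds so callers don't sit
// blocked for the duration of a downstream hang
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Note that this only stops *waiting*; the underlying request keeps running unless you also cancel it (for example with an AbortController).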

Circuit Breaker States

A circuit breaker has three states:

  CLOSED --(failures > threshold)--> OPEN
  OPEN --(reset timeout expires)--> HALF-OPEN
  HALF-OPEN --(enough successes)--> CLOSED
  HALF-OPEN --(any failure)--> OPEN

  • Closed: Normal operation. Requests flow through. Track failure rate.
  • Open: Circuit tripped. Reject requests immediately without calling downstream.
  • Half-Open: Test the waters. Allow a few requests through. Success → Close. Failure → Open.
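The bullet list above can be condensed into a transition table, which is a handy reference when implementing the state machine (the event names here are illustrative, not from any library):

```javascript
// Each state maps the events that can occur in it to the next state
const TRANSITIONS = {
  CLOSED:    { failuresExceedThreshold: 'OPEN' },
  OPEN:      { resetTimeoutExpired: 'HALF_OPEN' },
  HALF_OPEN: { probesSucceeded: 'CLOSED', probeFailed: 'OPEN' },
};

function nextState(state, event) {
  // Events that don't apply in the current state leave it unchanged
  return (TRANSITIONS[state] || {})[event] || state;
}
```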

Basic Implementation

class CircuitBreaker {
  constructor(options = {}) {
    this.failureThreshold = options.failureThreshold || 5;
    this.resetTimeout = options.resetTimeout || 30000;
    this.halfOpenRequests = options.halfOpenRequests || 3;

    this.state = 'CLOSED';
    this.failures = 0;
    this.successes = 0;
    this.lastFailureTime = null;
    this.halfOpenAttempts = 0;
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime >= this.resetTimeout) {
        this.state = 'HALF_OPEN';
        this.halfOpenAttempts = 0;
        this.successes = 0;
      } else {
        throw new CircuitOpenError('Circuit breaker is open');
      }
    }

    // While half-open, only allow a limited number of probe requests through
    if (this.state === 'HALF_OPEN') {
      if (this.halfOpenAttempts >= this.halfOpenRequests) {
        throw new CircuitOpenError('Circuit breaker is half-open; probe limit reached');
      }
      this.halfOpenAttempts++;
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    if (this.state === 'HALF_OPEN') {
      this.successes++;
      if (this.successes >= this.halfOpenRequests) {
        this.reset();
      }
    } else {
      this.failures = 0;
    }
  }

  onFailure() {
    this.failures++;
    this.lastFailureTime = Date.now();

    if (this.state === 'HALF_OPEN') {
      this.state = 'OPEN';
      this.successes = 0;
    } else if (this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
    }
  }

  reset() {
    this.state = 'CLOSED';
    this.failures = 0;
    this.successes = 0;
    this.halfOpenAttempts = 0;
  }

  getStats() {
    return { state: this.state, failures: this.failures, successes: this.successes };
  }
}

class CircuitOpenError extends Error {
  constructor(message) {
    super(message);
    this.name = 'CircuitOpenError';
  }
}

Usage Pattern

const paymentCircuit = new CircuitBreaker({
  failureThreshold: 5,
  resetTimeout: 30000,
});

async function processPayment(order) {
  try {
    return await paymentCircuit.call(async () => {
      return await paymentService.charge(order);
    });
  } catch (error) {
    if (error instanceof CircuitOpenError) {
      // Fast failure - payment service is known to be down
      return { status: 'pending', message: 'Payment service temporarily unavailable' };
    }
    throw error;
  }
}

Sliding Window for Better Detection

Counting raw failures can be misleading: in the basic breaker above, a single success resets the count even when most calls are failing. Track the failure rate over a sliding window of recent calls instead:

class SlidingWindowCircuitBreaker {
  constructor(options = {}) {
    this.windowSize = options.windowSize || 10;
    this.failureRateThreshold = options.failureRateThreshold || 0.5;
    this.resetTimeout = options.resetTimeout || 30000;

    this.results = []; // Bounded list of the most recent call outcomes
    this.state = 'CLOSED';
    this.lastFailureTime = null;
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime >= this.resetTimeout) {
        this.state = 'HALF_OPEN';
      } else {
        throw new CircuitOpenError('Circuit breaker is open');
      }
    }

    try {
      const result = await fn();
      if (this.state === 'HALF_OPEN') {
        // Probe succeeded: close and start over with a clean window
        this.state = 'CLOSED';
        this.results = [];
      }
      this.recordResult(true);
      return result;
    } catch (error) {
      this.recordResult(false);
      if (this.state === 'HALF_OPEN') {
        this.state = 'OPEN';
        this.lastFailureTime = Date.now();
      }
      throw error;
    }
  }

  recordResult(success) {
    this.results.push({ success, timestamp: Date.now() });

    // Keep only the most recent windowSize results
    if (this.results.length > this.windowSize) {
      this.results.shift();
    }

    // Only evaluate once the window is full
    if (this.state === 'CLOSED' && this.results.length >= this.windowSize) {
      const failures = this.results.filter(r => !r.success).length;
      const failureRate = failures / this.results.length;

      if (failureRate >= this.failureRateThreshold) {
        this.state = 'OPEN';
        this.lastFailureTime = Date.now();
      }
    }
  }

  getStats() {
    const failures = this.results.filter(r => !r.success).length;
    return {
      state: this.state,
      totalCalls: this.results.length,
      failures,
      failureRate: this.results.length > 0 ? failures / this.results.length : 0,
    };
  }
}

Per-Operation Circuit Breakers

Different operations may have different failure characteristics:

const circuits = {
  'user-service:getUser': new CircuitBreaker({ failureThreshold: 10 }),
  'user-service:updateUser': new CircuitBreaker({ failureThreshold: 5 }),
  'payment-service:charge': new CircuitBreaker({ failureThreshold: 3 }),
  'email-service:send': new CircuitBreaker({ failureThreshold: 20 }),
};

function getCircuit(service, operation) {
  const key = `${service}:${operation}`;
  if (!circuits[key]) {
    circuits[key] = new CircuitBreaker(); // Default settings
  }
  return circuits[key];
}

Integration with Retries

Circuit breakers and retries work together. Note the ordering here: retries run inside the breaker, so a fully retried-and-failed batch counts as a single failure against the threshold:

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function callWithResilience(fn, options = {}) {
  const circuit = options.circuit;
  const maxRetries = options.maxRetries || 3;
  const backoff = options.backoff || (attempt => Math.pow(2, attempt) * 100);

  return circuit.call(async () => {
    let lastError;

    for (let attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return await fn();
      } catch (error) {
        lastError = error;

        // Don't retry if a nested circuit is already open
        if (error instanceof CircuitOpenError) throw error;

        // Don't retry on client errors (4xx) -- they won't succeed on retry
        if (error.statusCode >= 400 && error.statusCode < 500) throw error;

        if (attempt < maxRetries - 1) {
          await sleep(backoff(attempt));
        }
      }
    }

    throw lastError;
  });
}

Health Check Integration

Expose circuit breaker state in health checks:

app.get('/health', (req, res) => {
  const circuitStates = Object.entries(circuits).map(([name, circuit]) => ({
    name,
    ...circuit.getStats(),
  }));
  
  const anyOpen = circuitStates.some(c => c.state === 'OPEN');
  
  res.status(anyOpen ? 503 : 200).json({
    status: anyOpen ? 'degraded' : 'healthy',
    circuits: circuitStates,
  });
});

Fallback Strategies

When the circuit is open, you have options:

async function getProduct(id) {
  try {
    return await productCircuit.call(() => productService.get(id));
  } catch (error) {
    if (error instanceof CircuitOpenError) {
      // Strategy 1: return cached data if we have it
      const cached = await cache.get(`product:${id}`);
      if (cached) return { ...cached, _fromCache: true };

      // Strategy 2: queue a refresh for when the service recovers
      await queue.add('fetch-product', { id });

      // Strategy 3: return a default/degraded response in the meantime
      return { id, name: 'Product', available: false, _degraded: true };
    }
    throw error;
  }
}

Monitoring and Alerting

Track circuit breaker events. (The snippet below assumes a breaker that extends Node's EventEmitter and emits on state transitions; the classes above don't do this out of the box.)

circuit.on('open', (stats) => {
  metrics.increment('circuit.opened', { circuit: 'payment' });
  logger.warn('Circuit opened', stats);
  alerting.send('Payment circuit breaker opened');
});

circuit.on('halfOpen', () => {
  metrics.increment('circuit.half_open', { circuit: 'payment' });
});

circuit.on('close', () => {
  metrics.increment('circuit.closed', { circuit: 'payment' });
  logger.info('Circuit recovered');
});

Libraries Worth Using

Don’t reinvent the wheel for production. Battle-tested options include:

  • opossum (Node.js)
  • resilience4j (Java)
  • Polly (.NET)
  • gobreaker (Go)
  • pybreaker (Python)

Key Takeaways

  1. Fail fast — Don’t waste resources on requests that will fail
  2. Give services time to recover — The open state prevents overwhelming a struggling service
  3. Test recovery gradually — Half-open state probes before fully resuming traffic
  4. Monitor everything — Circuit state changes are important signals
  5. Have fallbacks — Know what to do when the circuit opens

Circuit breakers turn cascade failures into graceful degradation. When something breaks, the blast radius stays contained.


Pairs well with Retry Patterns and Exponential Backoff for comprehensive resilience.