Every production system has dependencies. APIs, databases, caches, third-party services. Each one can fail. The question isn’t if they’ll fail, but how your system behaves when they do.

Graceful degradation means your system continues providing value—reduced, maybe, but value—when dependencies are unavailable. The opposite is cascade failure: one service dies, and everything dies with it.

Here are the patterns that make the difference.

The Hierarchy of Degradation

Not all degradation is equal. Design for multiple levels:

Level 0: Full functionality. Everything works. This is your baseline.

Level 1: Reduced freshness. Data is stale but present: a cache hit instead of a live query.

Level 2: Reduced features. Some features are unavailable; core functionality is preserved.

Level 3: Static fallback. Pre-computed or cached responses. Read-only mode.

Level 4: Informative failure. You can't serve the request, but you explain why clearly.

Most systems jump from Level 0 to Level 4. The goal is to have meaningful stops in between.
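Making the levels first-class in code gives teams a shared vocabulary for them. A minimal sketch, assuming a hypothetical `DegradationLevel` enum (these names are illustrative, not from any library):

```python
from enum import IntEnum

class DegradationLevel(IntEnum):
    """Explicit degradation levels, from best to worst."""
    FULL = 0                 # live data, all features
    REDUCED_FRESHNESS = 1    # serve from cache
    REDUCED_FEATURES = 2     # core functionality only
    STATIC_FALLBACK = 3      # pre-computed, read-only
    INFORMATIVE_FAILURE = 4  # explain the failure clearly

def describe(level: DegradationLevel) -> str:
    """Human-readable description for status pages and logs."""
    return {
        DegradationLevel.FULL: "All systems operational",
        DegradationLevel.REDUCED_FRESHNESS: "Serving cached data",
        DegradationLevel.REDUCED_FEATURES: "Some features disabled",
        DegradationLevel.STATIC_FALLBACK: "Read-only mode",
        DegradationLevel.INFORMATIVE_FAILURE: "Service unavailable",
    }[level]
```

Because it is an `IntEnum`, levels compare naturally, so handlers can check "are we at level 2 or worse?" with a plain comparison.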

Pattern 1: Circuit Breakers

Stop calling a failing service. Fail fast instead of timing out.

```python
import time

class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are being rejected."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failures = 0
        self.threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.state = "closed"  # closed, open, half-open
        self.last_failure_time = None

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if time.time() - self.last_failure_time > self.reset_timeout:
                self.state = "half-open"
            else:
                raise CircuitOpenError("Service unavailable")

        try:
            result = func(*args, **kwargs)
            if self.state == "half-open":
                self.state = "closed"
            self.failures = 0  # a success resets the consecutive-failure count
            return result
        except Exception:
            self.failures += 1
            self.last_failure_time = time.time()
            if self.failures >= self.threshold:
                self.state = "open"
            raise
```

When open: Return cached data, default values, or skip the feature entirely.
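The fallback choice can live outside the breaker itself. A minimal sketch of a hypothetical `with_fallback` wrapper (not part of the `CircuitBreaker` class above; the exception type is whatever your breaker raises):

```python
def with_fallback(primary, fallback, open_exc=Exception):
    """Call primary(); if it raises open_exc, serve fallback() instead."""
    def wrapped(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except open_exc:
            # Degraded path: last-known value, default, or feature skip
            return fallback(*args, **kwargs)
    return wrapped
```

The wrapped function keeps the call site clean: callers never see the breaker's exception, only the degraded value.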

Pattern 2: Fallback Chains

Define a sequence of increasingly degraded responses.

```python
async def get_user_recommendations(user_id):
    # Level 0: Personalized real-time recommendations
    try:
        return await recommendation_service.get_personalized(user_id)
    except ServiceUnavailable:
        pass

    # Level 1: Cached personalized recommendations
    cached = await cache.get(f"recs:{user_id}")
    if cached:
        return cached.with_stale_indicator()

    # Level 2: Segment-based recommendations
    try:
        segment = await user_service.get_segment(user_id)
        return await recommendation_service.get_for_segment(segment)
    except ServiceUnavailable:
        pass

    # Level 3: Global popular items
    popular = await cache.get("recs:global:popular")
    if popular:
        return popular.with_generic_indicator()

    # Level 4: Static fallback
    return STATIC_FALLBACK_RECOMMENDATIONS
```

Each level is explicitly designed. You know what users get at each degradation point.

Pattern 3: Bulkheads

Isolate failures so they don’t spread.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Thread pool per dependency
user_service_pool = ThreadPoolExecutor(max_workers=10)
payment_service_pool = ThreadPoolExecutor(max_workers=5)
notification_service_pool = ThreadPoolExecutor(max_workers=3)

async def process_order(order):
    # Payment pool exhausted? Users can still browse.
    # Notifications slow? Payments still work.
    user = await run_in_pool(user_service_pool, get_user, order.user_id)
    payment = await run_in_pool(payment_service_pool, charge, order)

    # Notification is fire-and-forget with a timeout
    try:
        await asyncio.wait_for(
            run_in_pool(notification_service_pool, notify, user, order),
            timeout=2.0,
        )
    except asyncio.TimeoutError:
        queue_for_retry(notify, user, order)
```

Bulkheads prevent one slow dependency from consuming all resources.
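`run_in_pool` isn't shown above; one plausible implementation, assuming each pool wraps blocking (non-async) callables:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def run_in_pool(pool: ThreadPoolExecutor, func, *args):
    """Run a blocking function on a dedicated pool without blocking the event loop."""
    loop = asyncio.get_running_loop()
    # run_in_executor hands the call to the pool and awaits its result;
    # when the pool is saturated, work queues there instead of starving others
    return await loop.run_in_executor(pool, func, *args)
```

The bulkhead effect comes from the pool's `max_workers` cap: each dependency can exhaust only its own workers.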

Pattern 4: Read-Your-Writes Consistency with Fallback

When the primary database is down, read from replicas—but track what the user just wrote.

```python
import time
from dataclasses import dataclass

from cachetools import TTLCache  # third-party: pip install cachetools

@dataclass
class WriteRecord:
    value: object
    written_at: float

class ConsistentReader:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.recent_writes = TTLCache(maxsize=10000, ttl=30)

    async def read(self, key, user_id):
        # If the user just wrote this key, they must see their own write
        write_record = self.recent_writes.get((user_id, key))
        if write_record:
            return write_record.value

        # Try the primary
        try:
            return await self.primary.read(key)
        except Unavailable:
            pass

        # Fall back to a replica (might be stale, but this user hasn't written)
        return await self.replicas.read(key)

    async def write(self, key, value, user_id):
        result = await self.primary.write(key, value)
        self.recent_writes[(user_id, key)] = WriteRecord(value, time.time())
        return result
```

Users see their own writes; stale reads only affect data they haven’t touched.

Pattern 5: Feature Flags as Degradation Controls

Use feature flags to manually (or automatically) degrade specific features.

```python
async def handle_search(query):
    if not feature_flags.get("search_enabled"):
        return SearchResult.unavailable("Search temporarily disabled")

    if not feature_flags.get("search_ai_ranking"):
        # Fall back to simpler ranking
        return await simple_search(query)

    if not feature_flags.get("search_spell_check"):
        # Skip spell check, use query as-is
        return await ai_search(query, spell_check=False)

    return await ai_search(query)
```

When load spikes or a dependency struggles, disable expensive features first.
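Automatic degradation can be as simple as a latency watcher that flips flags off in a fixed order. A sketch, with hypothetical flag names, threshold, and shedding order:

```python
class AutoDegrader:
    """Flip feature flags off when observed latency breaches a threshold.

    The flag names, threshold, and shedding order below are illustrative.
    """

    def __init__(self, flags, threshold_ms=500, window=100):
        self.flags = flags  # dict-like: flag name -> enabled?
        self.threshold_ms = threshold_ms
        self.window = window
        self.samples = []
        # Most expensive features first: shed in this order
        self.shed_order = ["search_ai_ranking", "search_spell_check"]

    def record(self, latency_ms):
        """Feed one request latency; shed a feature if p95 is too high."""
        self.samples.append(latency_ms)
        self.samples = self.samples[-self.window:]
        if len(self.samples) >= 10 and self._p95() > self.threshold_ms:
            self._shed_one()

    def _p95(self):
        ordered = sorted(self.samples)
        return ordered[min(len(ordered) - 1, int(len(ordered) * 0.95))]

    def _shed_one(self):
        for flag in self.shed_order:
            if self.flags.get(flag):
                self.flags[flag] = False
                self.samples.clear()  # fresh window after each shed
                return
```

Re-enabling is deliberately left manual here: automatic recovery needs hysteresis, or the system will flap between degraded and full modes.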

Pattern 6: Timeout Budgets

Allocate a total timeout, then distribute it across operations.

```python
async def composite_request(request, total_budget=5.0):
    budget = TimeBudget(total_budget)

    # Critical: must complete. Gets 40% of budget.
    user = await budget.run(
        get_user(request.user_id),
        allocation=0.4,
        required=True
    )

    # Important: should complete. Gets 30% of budget.
    try:
        recommendations = await budget.run(
            get_recommendations(user),
            allocation=0.3,
            required=False
        )
    except BudgetExceeded:
        recommendations = FALLBACK_RECOMMENDATIONS

    # Nice-to-have: if time remains. Gets remaining budget.
    try:
        notifications = await budget.run(
            get_notifications(user),
            allocation=budget.remaining(),
            required=False
        )
    except BudgetExceeded:
        notifications = None

    return Response(user, recommendations, notifications)
```

The budget enforces that one slow dependency can’t starve others.
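`TimeBudget` and `BudgetExceeded` aren't standard library classes; one plausible sketch matching the usage above:

```python
import asyncio
import time

class BudgetExceeded(Exception):
    pass

class TimeBudget:
    """Wall-clock budget for a composite request (a sketch, not a standard API).

    An `allocation` <= 1.0 is read as a fraction of the total budget;
    larger values (like budget.remaining()) are read as seconds. Either
    way the timeout is capped by what's actually left.
    """

    def __init__(self, total_seconds):
        self.total = total_seconds
        self.start = time.monotonic()

    def remaining(self):
        return max(0.0, self.total - (time.monotonic() - self.start))

    async def run(self, coro, allocation, required=False):
        seconds = allocation * self.total if allocation <= 1.0 else allocation
        timeout = min(seconds, self.remaining())  # never exceed what's left
        try:
            return await asyncio.wait_for(coro, timeout=timeout)
        except asyncio.TimeoutError:
            if required:
                raise  # critical step: let the whole request fail loudly
            raise BudgetExceeded(f"allocation of {timeout:.2f}s exhausted")
```

Using `time.monotonic()` matters here: the budget must not jump when the system clock is adjusted mid-request.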

Pattern 7: Partial Success Responses

Return what you have, clearly marking what’s missing.

```python
from dataclasses import dataclass

@dataclass
class PartialResponse:
    data: dict
    missing: list[str]
    degraded: list[str]

    def to_json(self):
        return {
            "data": self.data,
            "_meta": {
                "partial": bool(self.missing or self.degraded),
                "missing": self.missing,
                "degraded": self.degraded
            }
        }

async def get_dashboard():
    response = PartialResponse(data={}, missing=[], degraded=[])

    try:
        response.data["metrics"] = await get_metrics()
    except Timeout:
        response.data["metrics"] = await get_cached_metrics()
        response.degraded.append("metrics")
    except Unavailable:
        response.missing.append("metrics")

    try:
        response.data["alerts"] = await get_alerts()
    except Unavailable:
        response.missing.append("alerts")

    return response
```

Clients can decide how to handle partial data instead of getting nothing.
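On the client side, the `_meta` block tells the UI what to gray out. A sketch (the `render_dashboard` helper and the widget shape are illustrative, assuming the `to_json()` format above):

```python
def render_dashboard(payload):
    """Turn a partial-response payload into widget descriptors for the UI."""
    meta = payload.get("_meta", {})
    widgets = []
    # Present sections: mark any that came from a degraded source as stale
    for name, value in payload.get("data", {}).items():
        stale = name in meta.get("degraded", [])
        widgets.append({"name": name, "value": value, "stale": stale})
    # Missing sections: render an explicit placeholder, not an empty hole
    for name in meta.get("missing", []):
        widgets.append({"name": name, "value": None, "error": "unavailable"})
    return widgets
```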

Anti-Pattern: Silent Degradation

The worst kind of degradation is invisible degradation.

```python
# BAD: Silent fallback
def get_user(user_id):
    try:
        return user_service.get(user_id)
    except:
        return DEFAULT_USER  # 😱 Who knows this happened?
```

Always log, emit metrics for, or mark degraded responses:

```python
# GOOD: Observable degradation
def get_user(user_id):
    try:
        return user_service.get(user_id)
    except Exception as e:
        metrics.increment("user_service.fallback")
        logger.warning(f"User service unavailable, using fallback: {e}")
        return FallbackUser(user_id, degraded=True)
```

Testing Degradation

You can’t trust patterns you haven’t tested.

```python
import random

import httpx
import pytest

@pytest.fixture
def chaos_mode():
    """Randomly fail dependencies during tests."""
    original_call = httpx.Client.request

    def chaotic_request(self, *args, **kwargs):
        if random.random() < 0.3:  # 30% failure rate
            raise httpx.ConnectError("Chaos!")
        return original_call(self, *args, **kwargs)

    httpx.Client.request = chaotic_request
    yield
    httpx.Client.request = original_call

def test_dashboard_under_chaos(chaos_mode):
    """The dashboard should return partial data, not crash."""
    response = client.get("/dashboard")
    assert response.status_code == 200
    assert "_meta" in response.json()  # should indicate partial data
```

Run chaos tests in staging. Find out how your system degrades before production does.

The Mindset Shift

Reliability isn’t about preventing failure. It’s about making failure cheap.

Every dependency call should have an answer to: “What happens when this times out?” If the answer is “the whole request fails,” you have work to do.

Design the degraded states first. Make them explicit. Test them. Then hope you rarely need them—but know they’re there when you do.