Feature flags started as if (ENABLE_NEW_UI) { ... }. They’ve evolved into a deployment strategy that separates code deployment from feature release. Ship your code Tuesday. Release to 1% of users Wednesday. Roll back without deploying Thursday.

Here’s how to implement feature flags that scale from simple toggles to sophisticated progressive delivery.

The Basic Pattern

At its core, a feature flag is a runtime conditional:

def get_recommendations(user_id: str) -> list:
    if feature_flags.is_enabled("new_recommendation_algo", user_id):
        return new_algorithm(user_id)
    else:
        return legacy_algorithm(user_id)

The magic is in how is_enabled works — and how you manage the flag lifecycle.

Flag Types

Not all flags are created equal. Different purposes require different behaviors.

Release Flags (Short-lived)

Gate incomplete features during development. Remove after rollout.

# Release flag - remove after new_checkout is fully deployed
if flags.is_enabled("new_checkout_flow"):
    return render_new_checkout(cart)
return render_legacy_checkout(cart)

Lifecycle: Created → Partial rollout → Full rollout → Remove from code

Experiment Flags (Measured)

A/B tests with analytics integration.

# Experiment flag - measures impact on conversion
variant = flags.get_variant("pricing_page_experiment", user_id)
# Returns: "control", "variant_a", or "variant_b"

if variant == "variant_a":
    prices = apply_discount(prices, 0.10)
elif variant == "variant_b":
    prices = apply_bundling(prices)

# Track for analysis
analytics.track("pricing_page_view", {
    "user_id": user_id,
    "variant": variant
})

Lifecycle: Hypothesis → Experiment → Analyze → Pick winner → Remove

Ops Flags (Kill Switches)

Emergency controls for production incidents.

# Kill switch - disable expensive feature during outage
if flags.is_enabled("enable_search_suggestions"):
    suggestions = get_search_suggestions(query)  # Expensive
else:
    suggestions = []  # Graceful degradation

Lifecycle: Created → Lives forever (but rarely toggled)

Permission Flags (Entitlements)

Control feature access by user tier or permissions.

# Permission flag - premium feature
if flags.is_enabled("advanced_analytics", user_id, {
    "plan": user.plan,
    "organization": user.org_id
}):
    return render_advanced_analytics(user_id)
return render_upgrade_prompt()

Lifecycle: Created → Lives as long as the pricing model exists

Progressive Rollout Strategies

The real power of flags is controlled exposure.

Percentage Rollout

import hashlib

class PercentageRollout:
    def __init__(self, flag_name: str, percentage: int):
        self.flag_name = flag_name
        self.percentage = percentage
    
    def is_enabled(self, user_id: str) -> bool:
        # Deterministic hash ensures same user always gets same result
        hash_value = int(hashlib.md5(
            f"{self.flag_name}:{user_id}".encode()
        ).hexdigest(), 16)
        
        bucket = hash_value % 100
        return bucket < self.percentage

Rollout schedule:

  • Day 1: 1% (catch catastrophic bugs)
  • Day 2: 5% (validate metrics)
  • Day 3: 25% (stress test)
  • Day 4: 50% (confidence building)
  • Day 5: 100% (full release)
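A schedule like this can be encoded so the ramp advances automatically rather than by someone remembering to click a button. A sketch, assuming a simple day-indexed ramp table (the `RAMP` values mirror the schedule above):

```python
from datetime import date

# Hypothetical ramp schedule: days since rollout start -> exposure percentage
RAMP = [1, 5, 25, 50, 100]

def target_percentage(start: date, today: date) -> int:
    """Percentage of users who should see the feature on a given day."""
    days_elapsed = (today - start).days
    if days_elapsed < 0:
        return 0  # rollout hasn't started yet
    # Clamp to the final step so the flag stays at 100% after day 5
    return RAMP[min(days_elapsed, len(RAMP) - 1)]
```

In practice you'd also gate each step on healthy metrics — an automatic ramp that ignores error rates just ships bugs faster.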

Sticky Assignment

Users should see consistent behavior. Don’t flip features mid-session.

import hashlib

from redis import Redis

class StickyFlags:
    def __init__(self, cache: Redis):
        self.cache = cache
    
    def get_assignment(self, flag: str, user_id: str) -> bool:
        key = f"flag:{flag}:{user_id}"
        
        # Check for existing assignment
        cached = self.cache.get(key)
        if cached is not None:
            # redis-py returns bytes unless decode_responses=True
            return cached == b"1"
        
        # Calculate new assignment
        enabled = self._calculate_enabled(flag, user_id)
        
        # Store with TTL matching experiment duration (30 days here)
        self.cache.setex(key, 86400 * 30, "1" if enabled else "0")
        
        return enabled
    
    def _calculate_enabled(self, flag: str, user_id: str) -> bool:
        # Same deterministic hash bucketing as PercentageRollout;
        # the rollout percentage would come from flag config
        hash_value = int(hashlib.sha256(
            f"{flag}:{user_id}".encode()
        ).hexdigest(), 16)
        return (hash_value % 100) < 50

Ring-Based Deployment

Deploy to progressively larger “rings” of users:

RINGS = {
    "canary": ["internal_qa@company.com", "dogfood@company.com"],
    "early_adopters": lambda u: u.opted_into_beta,
    "general": lambda u: True
}

def get_active_ring(flag_name: str) -> str:
    # Controlled by flag service
    return flag_config[flag_name].get("active_ring", "canary")

def is_enabled(flag_name: str, user: User) -> bool:
    active_ring = get_active_ring(flag_name)
    
    # Rings are ordered: everyone in rings up to and including
    # the active ring gets the feature.
    for ring_name, ring_check in RINGS.items():
        if isinstance(ring_check, list):
            if user.email in ring_check:
                return True
        elif ring_check(user):
            return True
        
        if ring_name == active_ring:
            break
    
    return False

Targeting Rules

Complex conditions for sophisticated rollouts:

# LaunchDarkly-style targeting rules
flag_config = {
    "new_dashboard": {
        "rules": [
            {
                # Always enable for internal users
                "if": {"attribute": "email", "endsWith": "@company.com"},
                "serve": True
            },
            {
                # Enable for enterprise tier in the US and Canada
                "if": {
                    "and": [
                        {"attribute": "plan", "equals": "enterprise"},
                        {"attribute": "country", "in": ["US", "CA"]}
                    ]
                },
                "serve": True
            },
            {
                # 20% of remaining users
                "percentage": 20,
                "serve": True
            }
        ],
        "default": False
    }
}
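A sketch of how rules in this shape might be evaluated. The operator names (`endsWith`, `equals`, `in`, `and`) mirror the config above, but the evaluator itself is illustrative — not LaunchDarkly's actual logic:

```python
import hashlib

def _matches(condition: dict, ctx: dict) -> bool:
    # Composite conditions recurse; leaf conditions compare one attribute
    if "and" in condition:
        return all(_matches(c, ctx) for c in condition["and"])
    value = ctx.get(condition["attribute"])
    if "equals" in condition:
        return value == condition["equals"]
    if "endsWith" in condition:
        return isinstance(value, str) and value.endswith(condition["endsWith"])
    if "in" in condition:
        return value in condition["in"]
    return False

def evaluate(flag: dict, ctx: dict, user_id: str) -> bool:
    # Rules are checked in order; first match wins
    for rule in flag["rules"]:
        if "if" in rule:
            if _matches(rule["if"], ctx):
                return rule["serve"]
        elif "percentage" in rule:
            # Deterministic bucket so repeat evaluations agree
            bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
            if bucket < rule["percentage"]:
                return rule["serve"]
    return flag["default"]

new_dashboard = {
    "rules": [
        {"if": {"attribute": "email", "endsWith": "@company.com"}, "serve": True},
        {"if": {"and": [
            {"attribute": "plan", "equals": "enterprise"},
            {"attribute": "country", "in": ["US", "CA"]},
        ]}, "serve": True},
        {"percentage": 20, "serve": True},
    ],
    "default": False,
}
```

Ordering matters: specific rules come before the percentage catch-all, so internal users always see the feature regardless of which bucket they land in.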

Implementation Patterns

SDK Structure

from dataclasses import dataclass
from typing import Any, Dict, Optional
import hashlib
import time

@dataclass
class FlagEvaluation:
    enabled: bool
    variant: Optional[str]
    reason: str  # "rule_match", "percentage", "default", "killed", "not_found"
    timestamp: float

class FeatureFlags:
    def __init__(self, config_source, cache=None):
        self.config = config_source
        self.cache = cache
        self._local_cache = {}
        self._cache_ttl = 60  # Refresh config every minute
    
    def is_enabled(
        self, 
        flag_name: str, 
        user_id: Optional[str] = None,
        context: Optional[Dict[str, Any]] = None
    ) -> bool:
        evaluation = self.evaluate(flag_name, user_id, context)
        return evaluation.enabled
    
    def evaluate(
        self,
        flag_name: str,
        user_id: Optional[str] = None,
        context: Optional[Dict[str, Any]] = None
    ) -> FlagEvaluation:
        config = self._get_flag_config(flag_name)
        
        if config is None:
            return FlagEvaluation(False, None, "not_found", time.time())
        
        # Check kill switch
        if config.get("killed"):
            return FlagEvaluation(False, None, "killed", time.time())
        
        # Evaluate targeting rules (copy so we don't mutate the caller's dict)
        context = dict(context or {})
        context["user_id"] = user_id
        
        for rule in config.get("rules", []):
            if self._matches_rule(rule, context):
                return FlagEvaluation(
                    rule["serve"], 
                    rule.get("variant"),
                    "rule_match",
                    time.time()
                )
        
        # Percentage rollout
        if "percentage" in config and user_id:
            if self._in_percentage(flag_name, user_id, config["percentage"]):
                return FlagEvaluation(True, None, "percentage", time.time())
        
        # Default
        return FlagEvaluation(
            config.get("default", False),
            None,
            "default",
            time.time()
        )
    
    def _get_flag_config(self, flag_name: str) -> Optional[Dict[str, Any]]:
        # Serve from the local cache while it's fresh, then re-fetch
        entry = self._local_cache.get(flag_name)
        if entry and time.time() - entry["fetched"] < self._cache_ttl:
            return entry["config"]
        config = self.config.get(flag_name)  # dict-like config source
        self._local_cache[flag_name] = {"config": config, "fetched": time.time()}
        return config
    
    def _matches_rule(self, rule, context) -> bool:
        # Simplified matcher: every attribute in "if" must equal the
        # context value. Real SDKs support richer operators.
        conditions = rule.get("if", {})
        return bool(conditions) and all(
            context.get(k) == v for k, v in conditions.items()
        )
    
    def _in_percentage(self, flag: str, user_id: str, pct: int) -> bool:
        hash_input = f"{flag}:{user_id}"
        hash_value = int(hashlib.sha256(hash_input.encode()).hexdigest(), 16)
        return (hash_value % 100) < pct

Avoiding Flag Debt

Flags accumulate. Old flags become code archaeology.

from datetime import datetime

# Flag metadata for lifecycle management
FLAG_REGISTRY = {
    "new_checkout": {
        "owner": "payments-team",
        "created": "2024-01-15",
        "expected_removal": "2024-02-15",
        "jira": "PAY-1234",
        "type": "release"
    }
}

def audit_flags():
    """Run weekly to catch stale flags"""
    stale = []
    for flag, meta in FLAG_REGISTRY.items():
        if meta["type"] == "release":
            expected = datetime.fromisoformat(meta["expected_removal"])
            if datetime.now() > expected:
                stale.append({
                    "flag": flag,
                    "owner": meta["owner"],
                    "overdue_days": (datetime.now() - expected).days
                })
    
    if stale:
        send_slack_alert(f"Stale flags need cleanup: {stale}")

Testing with Flags

import pytest
from unittest.mock import patch

class TestCheckoutFlow:
    def test_new_checkout_enabled(self, client, cart_data):
        with patch.object(feature_flags, 'is_enabled', return_value=True):
            response = client.post('/checkout', json=cart_data)
            assert response.json()["version"] == "v2"
    
    def test_new_checkout_disabled(self, client, cart_data):
        with patch.object(feature_flags, 'is_enabled', return_value=False):
            response = client.post('/checkout', json=cart_data)
            assert response.json()["version"] == "v1"
    
    def test_both_paths_produce_same_result(self, client, cart_data):
        """Verify feature parity before removing old path"""
        with patch.object(feature_flags, 'is_enabled', return_value=True):
            new_result = client.post('/checkout', json=cart_data).json()
        
        with patch.object(feature_flags, 'is_enabled', return_value=False):
            old_result = client.post('/checkout', json=cart_data).json()
        
        assert new_result["total"] == old_result["total"]
        assert new_result["items"] == old_result["items"]

Observability

Flags without metrics leave you flying blind.

from prometheus_client import Counter, Histogram

flag_evaluations = Counter(
    'feature_flag_evaluations_total',
    'Feature flag evaluations',
    ['flag_name', 'enabled', 'reason']
)

flag_evaluation_duration = Histogram(
    'feature_flag_evaluation_seconds',
    'Time to evaluate feature flag',
    ['flag_name']
)

def evaluate_with_metrics(flag_name: str, user_id: str) -> bool:
    with flag_evaluation_duration.labels(flag_name).time():
        result = flags.evaluate(flag_name, user_id)
    
    flag_evaluations.labels(
        flag_name=flag_name,
        enabled=str(result.enabled),
        reason=result.reason
    ).inc()
    
    return result.enabled

Dashboard queries:

# Flag adoption over time
sum(rate(feature_flag_evaluations_total{flag_name="new_checkout", enabled="true"}[5m])) 
/ 
sum(rate(feature_flag_evaluations_total{flag_name="new_checkout"}[5m]))

# Flags with high evaluation latency
histogram_quantile(0.99, 
  rate(feature_flag_evaluation_seconds_bucket[5m])
) > 0.01

The Managed Services

For production use, consider dedicated services:

Service         Strengths                   Pricing
LaunchDarkly    Full-featured, great SDKs   $$$
Split.io        Strong experimentation      $$
Flagsmith       Open-source option          $
Unleash         Self-hosted, open-source    Free
AWS AppConfig   AWS-native, simple          $

Roll your own only if you have specific requirements the services don’t meet.

The Lifecycle

  1. Create — Define flag with owner, expected duration, success metrics
  2. Implement — Add flag check in code, test both paths
  3. Roll out — Progressive percentage increase with monitoring
  4. Measure — Analyze metrics, compare variants
  5. Decide — Full enable or revert
  6. Clean up — Remove flag from code, delete configuration

The cleanup step is where most teams fail. A flag at 100% for 6 months isn’t a flag — it’s dead code with extra steps.

Feature flags turn deployments from “hope it works” into “let’s find out safely.” The investment in flag infrastructure pays back every time you catch a bug at 1% instead of 100%.

Ship code fearlessly. Release features carefully.