Feature Flags: Ship Fast Without Breaking Things

Feature flags turn deployment into a two-step process: ship the code, then enable the feature. This separation is powerful when done right and a maintenance nightmare when done wrong.

The Core Value Proposition

Without feature flags, deployment equals release. Ship broken code? Users see it immediately. Need to roll back? Redeploy the previous version. Want to test with 1% of users? Build custom infrastructure.

With feature flags, you decouple these concerns:

Deploy: Code goes to production, but new features are off
Release: Flip a flag to enable for some or all users
Rollback: Flip the flag back off (no deployment needed)

The mental shift: treat deployment as logistics and release as a business decision.

Implementation Patterns

Simple Boolean Flags

Start here. A flag is either on or off:

from featureflags import get_flag

def checkout():
    if get_flag("new_payment_flow"):
        return new_checkout_flow()
    else:
        return legacy_checkout()

User-Targeted Flags

Enable for specific users or groups:

def checkout(user):
    if get_flag("new_payment_flow", user_id=user.id):
        return new_checkout_flow()
    else:
        return legacy_checkout()

The flag service evaluates rules: is this user in the beta group? Are they an employee? Did they opt in?
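The sketch below shows what that rule evaluation might look like in-process. It is a minimal illustration, not any vendor's API. The `User` fields, the `TARGETING_RULES` table, and the `@example.com` employee domain are all hypothetical. A hosted flag service stores and evaluates rules like these on its own side.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class User:
    id: str
    email: str
    groups: set[str] = field(default_factory=set)
    opted_in: bool = False


Rule = Callable[[User], bool]

# Hypothetical targeting rules, keyed by flag name. The flag is on if ANY rule matches.
TARGETING_RULES: dict[str, list[Rule]] = {
    "new_payment_flow": [
        lambda u: "beta" in u.groups,                # is this user in the beta group?
        lambda u: u.email.endswith("@example.com"),  # are they an employee? (assumed domain)
        lambda u: u.opted_in,                        # did they opt in?
    ],
}


def evaluate(flag_name: str, user: User) -> bool:
    """Return True if any rule for this flag matches; unknown flags are off."""
    return any(rule(user) for rule in TARGETING_RULES.get(flag_name, []))
```

Treating an unknown flag as off keeps a typo in a flag name from accidentally enabling a feature.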

Percentage Rollouts

Gradually increase exposure:

# In your flag configuration (flags.yaml):
new_payment_flow:
  type: percentage
  value: 10      # 10% of users
  sticky: true   # same users always get the same variant

Sticky evaluation matters. You don’t want users flipping between old and new randomly.
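A common way to get sticky percentage rollouts is to hash the flag name and user ID into a stable bucket. The sketch below shows that general technique and does not describe any particular vendor's implementation. `in_rollout` is a hypothetical helper. It uses `hashlib` instead of Python's built-in `hash()` because `hash()` on strings is randomized per process, which would reshuffle users on every restart.

```python
import hashlib


def in_rollout(flag_name: str, user_id: str, percentage: float) -> bool:
    """Deterministically place a user inside or outside a percentage rollout.

    Hashing "flag:user" gives each user a stable bucket in [0, 100). The same user
    always lands in the same bucket for a given flag. Including the flag name means
    different flags split users independently.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000 / 100  # 0.00 .. 99.99
    return bucket < percentage
```

A user's bucket never changes, so raising `percentage` from 10 to 50 keeps the original 10% enabled and adds new users on top. That is exactly the behavior a gradual ramp needs.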

Kill Switches

The underrated use case. Every critical dependency should have a kill switch:

def send_notification(user, message):
    if get_flag("notifications_enabled"):
        return notification_service.send(user, message)
    else:
        log.warning("Notifications disabled, queuing for later")
        return queue_for_later(user, message)

When your notification provider has an outage, flip the flag and gracefully degrade.
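The sketch below runs the full degrade-and-recover loop in one place. The in-memory `FLAGS` dict, the `sent` list, and `retry_queue` are hypothetical stand-ins for a real flag store, notification provider, and durable queue. One design choice is worth calling out: the switch defaults to on (`default=True`). The assumption is that a missing or unreachable flag should not silently turn notifications off.

```python
import logging
from collections import deque

log = logging.getLogger(__name__)

# Hypothetical stand-ins: a flag store, the provider's "outbox", and a retry queue.
FLAGS = {"notifications_enabled": True}
sent: list = []
retry_queue: deque = deque()


def get_flag(name: str, default: bool = False) -> bool:
    return FLAGS.get(name, default)


def send_notification(user: str, message: str) -> str:
    # Kill switch defaults to on, so a missing flag doesn't disable notifications.
    if get_flag("notifications_enabled", default=True):
        sent.append((user, message))  # stand-in for notification_service.send(...)
        return "sent"
    log.warning("Notifications disabled, queuing for later")
    retry_queue.append((user, message))
    return "queued"


def drain_queue() -> int:
    """Replay queued notifications while the switch is on; return how many were replayed."""
    replayed = 0
    while retry_queue and get_flag("notifications_enabled", default=True):
        user, message = retry_queue.popleft()
        send_notification(user, message)
        replayed += 1
    return replayed
```

During an outage, setting `FLAGS["notifications_enabled"]` to `False` routes new messages into `retry_queue`. Setting it back to `True` and calling `drain_queue()` sends them.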

Architecture

Client-Side vs Server-Side

Server-side evaluation:

Flag rules stay on your servers
The client receives only the final decision, not the rules
More secure, less flexible for instant updates

Client-side evaluation:

Rules downloaded to client
Faster evaluation, no network round-trip
Rules are visible to users (security consideration)

Most teams use server-side for backend and client-side for frontend, with different security postures for each.
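Here is a minimal sketch of what "the client receives only the final decision" means in practice. The server evaluates every rule for the current user and returns only flag names and booleans. The `RULES` table and the `plan` and `groups` attributes are hypothetical.

```python
import json

# Hypothetical rules. They never leave the server.
RULES = {
    "new_checkout": lambda user: user.get("plan") == "enterprise",
    "beta_dashboard": lambda user: "beta" in user.get("groups", ()),
}


def flags_for_client(user: dict) -> str:
    """Evaluate every rule server-side and return only the decisions as JSON."""
    decisions = {name: bool(rule(user)) for name, rule in RULES.items()}
    return json.dumps(decisions, sort_keys=True)
```

With client-side evaluation, you would ship something like `RULES` itself (in serialized form) to the browser. That is why the rules become visible to users.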

Flag Storage

Options from simple to complex:

import os

import yaml  # PyYAML

# 1. Environment variables (simplest, requires redeploy to change)
NEW_FEATURE = os.getenv("FEATURE_NEW_CHECKOUT", "false").lower() == "true"

# 2. Config file (simple, requires restart or hot-reload)
with open("flags.yaml") as f:
    flags = yaml.safe_load(f)

# 3. Database (flexible, adds latency); `db` stands in for your database client
flag = db.query("SELECT enabled FROM flags WHERE name = ?", flag_name)

# 4. Dedicated service (LaunchDarkly, Unleash, Flagsmith); `feature_service` stands in for its SDK
flag = feature_service.evaluate("new_checkout", user_context)

For serious use, option 4 wins. The tooling around flag management, audit logs, and percentage rollouts is worth the cost.
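If you start with option 2, "hot-reload" can be as simple as re-reading the file whenever its modification time changes. This sketch uses JSON instead of YAML so it needs only the standard library. `FileFlagStore` and the `flags.json` layout (flag name mapped to a boolean) are assumptions for illustration.

```python
import json
import os


class FileFlagStore:
    """Flags read from a JSON file and re-read whenever the file's mtime changes."""

    def __init__(self, path: str):
        self.path = path
        self._mtime_ns = None  # forces a load on first access
        self._flags: dict = {}

    def _reload_if_changed(self) -> None:
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            with open(self.path) as f:
                self._flags = json.load(f)
            self._mtime_ns = mtime_ns

    def get(self, name: str, default: bool = False) -> bool:
        self._reload_if_changed()
        return bool(self._flags.get(name, default))
```

This version stats the file on every lookup. On a hot path, you would likely check the mtime on a timer or put a short-lived cache in front of it.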

Caching Strategy

Flag evaluation happens constantly. Cache aggressively:

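import time


class FlagCache:
    def __init__(self, fetch_from_service, ttl_seconds=30):
        # fetch_from_service(flag_name, context) is your flag client's lookup call
        self.fetch_from_service = fetch_from_service
        self.cache = {}
        self.ttl = ttl_seconds

    def get(self, flag_name, context):
        # context is a dict, which isn't hashable, so build the key from its sorted items
        key = (flag_name, tuple(sorted(context.items())))
        if key in self.cache:
            value, expires = self.cache[key]
            if time.monotonic() < expires:
                return value  # still fresh

        # Miss or expired: ask the service and keep the answer for `ttl` seconds
        value = self.fetch_from_service(flag_name, context)
        self.cache[key] = (value, time.monotonic() + self.ttl)
        return value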

A 30-second TTL is usually fine. Flag changes aren’t instant (each process can serve a stale value for up to one TTL), but they’re fast enough.

The Flag Lifecycle

Flags are not permanent. They have a lifecycle:

Create: New feature behind flag, default off
Test: Enable for internal users, QA
Rollout: Ramp the percentage up in steps, 1% → 10% → 50% → 100%
Stable: Flag at 100%, feature proven
Remove: Delete flag, remove conditional code

Removal, the last step, is where most teams fail. Flags accumulate. Code becomes spaghetti. Nobody knows which flags are still needed.
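You can encode the Rollout stage as a fixed ramp. The sketch below shows one possible policy under stated assumptions. The step list mirrors the lifecycle percentages, `0` stands for "internal users and QA only", and `metrics_healthy` is whatever signal your team trusts. `next_rollout_step` is a hypothetical helper.

```python
ROLLOUT_STEPS = (0, 1, 10, 50, 100)  # percent of users; 0 = internal/QA only


def next_rollout_step(current: int, metrics_healthy: bool) -> int:
    """Advance one step up the ramp if metrics look healthy; otherwise flip the flag off."""
    if not metrics_healthy:
        return 0  # rollback: turn the flag back off, no deployment needed
    higher = [step for step in ROLLOUT_STEPS if step > current]
    # At 100% there is no next step: the only move left is removing the flag.
    return higher[0] if higher else current
```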

Flag Hygiene

Enforce cleanup:

# Bad: Flag with no expiration
if get_flag("new_checkout"):
    ...

# Better: Flag with documented purpose and deadline
# FLAG: new_checkout
# Purpose: A/B test new payment flow
# Owner: payments-team
# Remove by: 2026-04-15
if get_flag("new_checkout"):
    ...

Even better: build tooling that alerts when flags are past their removal date.

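#!/bin/bash
# find-stale-flags.sh: report "Remove by:" dates that are already in the past.
# Requires GNU grep (-P) and GNU date (-d).

grep -rn "Remove by:" --include="*.py" . | while IFS= read -r line; do
    date_str=$(echo "$line" | grep -oP '\d{4}-\d{2}-\d{2}' | head -n 1)
    [[ -z "$date_str" ]] && continue  # skip annotations without a parseable date
    if [[ $(date -d "$date_str" +%s) -lt $(date +%s) ]]; then
        echo "STALE FLAG: $line"
    fi
done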
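For CI, a Python version can be more portable than the shell script because it avoids GNU-only `grep -P` and `date -d`. It can also report which flag is stale instead of just the matching line. This sketch assumes the `# FLAG:` / `# Remove by:` comment convention shown in the hygiene example. `find_stale_flags` is a hypothetical helper.

```python
import re
from datetime import date

# Matches the "# FLAG: <name>" ... "# Remove by: YYYY-MM-DD" annotation convention.
FLAG_RE = re.compile(r"#\s*FLAG:\s*(\S+)")
REMOVE_BY_RE = re.compile(r"#\s*Remove by:\s*(\d{4}-\d{2}-\d{2})")


def find_stale_flags(source: str, today: date) -> list[tuple[str, date]]:
    """Return (flag_name, remove_by) pairs whose deadline is before `today`."""
    stale = []
    current_flag = None
    for line in source.splitlines():
        if m := FLAG_RE.search(line):
            current_flag = m.group(1)
        elif (m := REMOVE_BY_RE.search(line)) and current_flag:
            deadline = date.fromisoformat(m.group(1))
            if deadline < today:
                stale.append((current_flag, deadline))
            current_flag = None  # each FLAG block has one deadline
    return stale
```

To scan a repository, run it over every file, for example `for path in pathlib.Path(".").rglob("*.py"): find_stale_flags(path.read_text(), date.today())`. Fail the build if any result is non-empty.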

Testing with Flags

Flags complicate testing. With n flags, there are 2^n possible on/off combinations.
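Enumerating the combinations directly shows how fast 2^n grows. `flag_combinations` is a hypothetical helper. A list like this could feed `pytest.mark.parametrize` for a small set of flags that genuinely interact, but it stops being practical quickly.

```python
from itertools import product


def flag_combinations(flag_names: list[str]) -> list[dict[str, bool]]:
    """Return every on/off assignment for the given flags (2 ** len(flag_names) of them)."""
    return [
        dict(zip(flag_names, values))
        for values in product([True, False], repeat=len(flag_names))
    ]
```

Ten flags already produce 1,024 combinations, and 20 produce 1,048,576. That is why the practical approach is to test each flag's two paths rather than every combination.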

Test Both Paths

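import pytest

from myapp.checkout import checkout  # hypothetical module that defines checkout()


@pytest.mark.parametrize("flag_value", [True, False])
def test_checkout(flag_value, mocker):
    # Patch get_flag where checkout() looks it up. The module did
    # `from featureflags import get_flag`, so patching "featureflags.get_flag" would miss it.
    mocker.patch("myapp.checkout.get_flag", return_value=flag_value)

    result = checkout()

    if flag_value:
        assert result.flow == "new"
    else:
        assert result.flow == "legacy"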

Integration Test the Rollout

def test_percentage_rollout_is_sticky():
    user = create_test_user()
    
    # Same user should get same result every time
    results = [get_flag("new_checkout", user_id=user.id) for _ in range(100)]
    
    assert len(set(results)) == 1  # All same value

Common Mistakes

Flag Coupling

# Bad: Flags that depend on each other
if get_flag("new_ui") and get_flag("new_api"):
    use_new_everything()
elif get_flag("new_ui"):
    # Broken state: new UI expects new API
    use_new_ui_with_old_api()  # 💥

Either make flags independent or create a single flag for the coupled feature.

Flag Sprawl

# Bad: Hundreds of flags, nobody knows what's active
if get_flag("checkout_v2"):
    if get_flag("checkout_v2_button_color"):
        if get_flag("checkout_v2_button_color_summer"):
            # Three levels deep, good luck debugging
            ...

Rule of thumb: if you have more than 20 active flags, you have a process problem.

Permanent Flags

# Bad: "Temporary" flag from 2024
if get_flag("temp_fix_invoice_bug"):
    apply_invoice_workaround()

If it’s been “temporary” for six months, it’s permanent. Remove the flag, keep the code.

Observability

You need to know which flags are evaluated and what values they return:

def get_flag_with_logging(name, **context):
    value = get_flag(name, **context)

    # `metrics` stands in for your metrics client (placeholder)
    metrics.increment(
        "feature_flag.evaluation",
        tags={
            "flag": name,
            "value": str(value),
            "user_segment": context.get("segment", "unknown")
        }
    )

    return value

This lets you:

See adoption curves during rollout
Debug “why did this user see the old flow?”
Catch flags that are never evaluated (dead code)
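The metrics call ships counts to an external backend. To show what those counts can answer, here is an in-process stand-in. `EvaluationRecorder` is hypothetical; in production, the same questions would be queries against your metrics store. It covers two of the uses listed: the adoption share for a flag, and which known flags were never evaluated.

```python
from collections import Counter


class EvaluationRecorder:
    """In-process tally of flag evaluations keyed by (flag_name, value)."""

    def __init__(self):
        self.counts: Counter = Counter()

    def record(self, name: str, value) -> None:
        self.counts[(name, value)] += 1

    def evaluated_flags(self) -> set[str]:
        return {name for name, _ in self.counts}

    def never_evaluated(self, known_flags) -> set[str]:
        """Flags that exist in config but were never evaluated in this window (dead code?)."""
        return set(known_flags) - self.evaluated_flags()

    def share_on(self, name: str) -> float:
        """Fraction of evaluations of `name` that returned True (the adoption curve)."""
        on, off = self.counts[(name, True)], self.counts[(name, False)]
        return on / (on + off) if on + off else 0.0
```

A flag only counts as dead if it goes unevaluated over a window long enough to cover rare code paths, such as month-end jobs.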

When to Use Flags

Good use cases:

Gradual rollouts of risky changes
A/B testing user-facing features
Kill switches for external dependencies
Enabling features for beta users
Regional or customer-specific features

Probably overkill:

Internal tooling changes
Refactors with no user-visible impact
Bug fixes (just ship them)
Features that can’t be partially enabled

The Payoff

Done right, feature flags give you:

Confidence: Ship to production knowing you can turn it off
Speed: Merge to main without waiting for “release windows”
Control: Enable for 1% of users and watch metrics before going wide
Recovery: Rollback in seconds, not minutes

The overhead is real: more conditionals, more testing, more cleanup. But for any team shipping frequently to production, the risk reduction is worth it. 🌍
