Most production incidents I’ve debugged came down to configuration. A missing environment variable. A wrong database URL. A feature flag stuck in the wrong state. Code was fine; configuration was the problem.

Configuration management is the unsexy work that prevents those 3 AM pages.

The Core Principles

1. Separate Configuration from Code

Configuration should never be baked into your application binary or container image.

Wrong:

# Hardcoded in code
DATABASE_URL = "postgres://prod:password@db.example.com/myapp"

Also wrong:

# Baked into image
ENV DATABASE_URL="postgres://prod:password@db.example.com/myapp"

Right:

# Read at runtime
import os

DATABASE_URL = os.environ["DATABASE_URL"]  # raises KeyError if the variable is unset

# Injected at deployment
docker run -e DATABASE_URL="..." myapp:1.0

Why? The same image should run in dev, staging, and production. Only configuration differs.

2. Validate Configuration at Startup

Fail fast if configuration is missing or invalid:

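# Pydantic v2: BaseSettings lives in the separate pydantic-settings package.
# (On Pydantic v1, import BaseSettings and validator from pydantic instead.)
from pydantic import field_validator
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    database_url: str
    redis_url: str
    api_key: str
    debug: bool = False
    max_connections: int = 10

    @field_validator("max_connections")
    @classmethod
    def validate_connections(cls, v: int) -> int:
        if v < 1 or v > 100:
            raise ValueError("max_connections must be between 1 and 100")
        return v

# This runs at import time - app won't start with bad config
settings = Settings()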

A clear error at startup beats a cryptic error at 3 AM when that code path finally runs.
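
If you'd rather not pull in Pydantic, the same fail-fast idea can be sketched with the standard library. The variable names mirror the settings above; `ConfigError`, `AppSettings`, and `load_settings` are hypothetical names for illustration. This version collects every problem before raising, so one failed start reports all of them.

```python
import os
from dataclasses import dataclass


class ConfigError(Exception):
    """Raised when required configuration is missing or invalid."""


@dataclass(frozen=True)
class AppSettings:
    database_url: str
    redis_url: str
    max_connections: int = 10


def load_settings(env=None) -> AppSettings:
    """Build settings from a mapping (defaults to os.environ), reporting all errors at once."""
    env = os.environ if env is None else env
    errors = []

    # Required values: record every missing one instead of stopping at the first.
    for name in ("DATABASE_URL", "REDIS_URL"):
        if not env.get(name):
            errors.append(f"{name} is required")

    # Optional value with a default and a range check.
    raw = env.get("MAX_CONNECTIONS", "10")
    max_connections = 10
    try:
        max_connections = int(raw)
        if not 1 <= max_connections <= 100:
            errors.append("MAX_CONNECTIONS must be between 1 and 100")
    except ValueError:
        errors.append(f"MAX_CONNECTIONS must be an integer, got {raw!r}")

    if errors:
        raise ConfigError("; ".join(errors))

    return AppSettings(env["DATABASE_URL"], env["REDIS_URL"], max_connections)
```

Calling `load_settings()` once in the application's entry point gives the same effect as the import-time check: the process exits before it serves any traffic.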

3. Make Configuration Explicit

Document every configuration option:

# config.example.yaml
# Database connection string (required)
# Format: postgres://user:password@host:port/database
database_url: ""

# Redis URL for caching (required)
redis_url: ""

# Enable debug mode (optional, default: false)
# WARNING: Never enable in production
debug: false

# Maximum concurrent database connections (optional, default: 10)
# Range: 1-100
max_connections: 10

If it’s not documented, someone will misconfigure it.

4. Use Hierarchical Configuration

Configuration should layer: defaults → environment-specific → instance-specific → overrides.

import os
from pathlib import Path

import yaml  # PyYAML

def load_yaml(path):
    # An empty file parses to None, so fall back to an empty dict
    with open(path) as f:
        return yaml.safe_load(f) or {}

def load_config():
    config = {}

    # 1. Load defaults
    config.update(load_yaml("config/defaults.yaml"))

    # 2. Load environment-specific
    env = os.getenv("APP_ENV", "development")
    env_file = Path(f"config/{env}.yaml")
    if env_file.exists():
        config.update(load_yaml(env_file))

    # 3. Load local overrides (gitignored)
    local_file = Path("config/local.yaml")
    if local_file.exists():
        config.update(load_yaml(local_file))

    # 4. Environment variables override everything.
    # Only keys already present are overridden, and values arrive as strings.
    for key in config:
        env_key = f"APP_{key.upper()}"
        if env_key in os.environ:
            config[key] = os.environ[env_key]

    return config

This lets you have sensible defaults while allowing specific overrides.
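
One caveat with the loader above: `dict.update` replaces whole top-level keys, so a layer that sets only one field of a nested section would drop the other fields in that section. A recursive merge avoids that. This is a minimal sketch; `deep_merge` is a hypothetical helper, not part of any library.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values in override win, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both layers define this section: merge it instead of replacing it.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


defaults = {"database": {"url": "postgres://localhost/myapp", "pool_size": 10}, "debug": False}
production = {"database": {"pool_size": 50}}

config = deep_merge(defaults, production)
print(config["database"])  # {'url': 'postgres://localhost/myapp', 'pool_size': 50}
```

With plain `update`, the production layer would have left `database` with only `pool_size`.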

5. Version Your Configuration

Configuration changes should be tracked like code changes:

config/
  defaults.yaml      # Checked into git
  production.yaml    # Checked into git (no secrets!)
  staging.yaml       # Checked into git
  local.yaml         # Gitignored, per-developer

For secrets, use a separate system (Vault, AWS Secrets Manager) and reference them:

# production.yaml
database_password: "${vault:secret/prod/db#password}"
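
Note that the `${vault:...}` placeholder is a convention, not something YAML or a secret store expands for you: your loader has to resolve it. Here is one way that step could look. `resolve_secrets` and the `fetch_secret(path, key)` callback are assumptions for illustration, and the dictionary stands in for a real secret store client.

```python
import re

# Matches placeholders such as ${vault:secret/prod/db#password}
_SECRET_REF = re.compile(r"\$\{vault:(?P<path>[^#}]+)#(?P<key>[^}]+)\}")


def resolve_secrets(config: dict, fetch_secret) -> dict:
    """Return a copy of config with every placeholder replaced by fetch_secret(path, key)."""
    resolved = {}
    for name, value in config.items():
        if isinstance(value, dict):
            resolved[name] = resolve_secrets(value, fetch_secret)
        elif isinstance(value, str):
            resolved[name] = _SECRET_REF.sub(
                lambda m: fetch_secret(m.group("path"), m.group("key")), value
            )
        else:
            resolved[name] = value
    return resolved


# Stand-in for a real secret store client, for illustration only.
fake_store = {("secret/prod/db", "password"): "s3cr3t"}

config = {"database_password": "${vault:secret/prod/db#password}", "debug": False}
resolved = resolve_secrets(config, lambda path, key: fake_store[(path, key)])
```

The file in git only ever contains the reference; the resolved value exists in memory at runtime.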

Common Patterns

Environment Variables

The twelve-factor standard. Simple, universal, works everywhere.

export DATABASE_URL="postgres://..."
export REDIS_URL="redis://..."
export LOG_LEVEL="info"

Pros:

  • Works in any language
  • Easy to inject in containers, CI, etc.
  • No files to manage

Cons:

  • No structure (everything is a string)
  • Hard to see “all configuration” at once
  • Can leak in logs, process listings

Best for: Simple applications, container deployments, cloud-native environments.
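
The "everything is a string" problem bites hardest with booleans: in Python, `bool("false")` is `True`. A pair of small parsing helpers makes the conversion explicit. This is a sketch; `env_bool` and `env_int` are hypothetical helpers, and the accepted spellings are my own choice.

```python
import os

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_bool(name: str, default: bool = False, env=None) -> bool:
    """Parse a boolean environment variable, rejecting unrecognized spellings."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean, got {raw!r}")


def env_int(name: str, default: int, env=None) -> int:
    """Parse an integer environment variable with a clear error message."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
```

Raising on an unrecognized value, rather than silently treating it as false, keeps a typo like `DEBUG=ture` from going unnoticed.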

Configuration Files

YAML, JSON, TOML, INI — structured configuration in files.

# config.yaml
server:
  host: 0.0.0.0
  port: 8080
  workers: 4

database:
  url: postgres://localhost/myapp
  pool_size: 10

features:
  new_checkout: true
  beta_users: ["user1", "user2"]

Pros:

  • Structured, supports nesting
  • Easy to read and edit
  • Can be templated

Cons:

  • Files need to be deployed
  • Requires parsing logic
  • Format choice can be contentious (YAML vs TOML debates)

Best for: Complex configuration, local development, applications with many options.

Remote Configuration

Fetch configuration from a central service (Consul, etcd, AWS Parameter Store).

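import boto3

ssm = boto3.client("ssm")

def get_config():
    # get_parameters_by_path returns results in pages,
    # so use a paginator to collect every parameter
    paginator = ssm.get_paginator("get_parameters_by_path")
    pages = paginator.paginate(
        Path="/myapp/production/",
        WithDecryption=True,
    )

    config = {}
    for page in pages:
        for param in page["Parameters"]:
            key = param["Name"].split("/")[-1]
            config[key] = param["Value"]

    return config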

Pros:

  • Centralized management
  • Can update without redeployment
  • Built-in encryption for secrets

Cons:

  • Adds external dependency
  • Network latency at startup
  • Needs caching strategy

Best for: Microservices, multi-environment deployments, dynamic configuration.
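
The caching strategy doesn't need to be elaborate. The sketch below wraps any fetch function in a time-based cache and keeps serving the last known good value if a refresh fails. `CachedConfig` is a hypothetical class, and the 60-second default is an arbitrary choice.

```python
import time


class CachedConfig:
    """Caches the result of fetch() for ttl_seconds and serves stale data if a refresh fails."""

    def __init__(self, fetch, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing has been fetched yet

    def get(self) -> dict:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            try:
                self._value = self._fetch()
            except Exception:
                # The first fetch has no fallback, so the error must propagate.
                if self._expires_at is None:
                    raise
                # Otherwise keep the last known good value and retry after the next TTL.
            self._expires_at = now + self._ttl
        return self._value
```

Usage would look like `remote = CachedConfig(get_config, ttl_seconds=300)` followed by `remote.get()` wherever the configuration is read. Failing hard on the very first fetch matches the fail-fast principle; tolerating later failures keeps a brief outage of the config service from taking the application down with it.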

Feature Flags

A special case of configuration: controlling feature rollout.

# Simple approach
if config.features.get("new_checkout"):
    return new_checkout_flow()
else:
    return old_checkout_flow()

# With targeting
if feature_client.is_enabled("new_checkout", user_id=user.id):
    return new_checkout_flow()

Feature flag systems (LaunchDarkly, Unleash, Flagsmith) add:

  • Percentage rollouts (10% of users see new feature)
  • User targeting (beta users, specific accounts)
  • A/B testing (track outcomes per variant)
  • Kill switches (disable instantly without deploy)

Rule: Feature flags are temporary. Remove them after rollout is complete. Stale flags become technical debt.
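
To make percentage rollouts less magical, here is the general idea behind them: hash the flag name and user ID into a stable bucket from 0 to 99, then compare the bucket to the rollout percentage. This is a sketch of the technique, not how any particular vendor implements it; `in_rollout` is a hypothetical function.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percentage: int) -> bool:
    """Deterministically place user_id in one of 100 buckets and compare to percentage."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percentage
```

Because the bucket depends only on the flag and the user, a user who sees the feature at 10% still sees it at 50%. Including the flag name in the hash means different flags select different slices of users.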

Anti-Patterns

Configuration Drift

Environment configurations diverge over time:

  • Staging has different settings than production
  • One server has an override someone added manually
  • Nobody knows the “canonical” configuration

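Fix: Infrastructure as Code. Terraform, Ansible, or Kubernetes manifests define the configuration, so the repository is the canonical source. Drift can then be detected by comparing live state against those definitions (for example, with terraform plan) and corrected by reapplying them.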
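
At its core, drift detection is a comparison between what you declared and what is actually running. A minimal sketch of that comparison, assuming both sides can be read into flat dictionaries (`find_drift` is a hypothetical helper):

```python
def find_drift(expected: dict, actual: dict) -> dict:
    """Compare the declared configuration with what a server actually reports."""
    drift = {}
    for key in sorted(expected.keys() | actual.keys()):
        if key not in actual:
            drift[key] = ("missing", expected[key], None)
        elif key not in expected:
            drift[key] = ("unexpected", None, actual[key])
        elif expected[key] != actual[key]:
            drift[key] = ("changed", expected[key], actual[key])
    return drift


declared = {"workers": 4, "debug": False, "port": 8080}
running = {"workers": 8, "debug": False, "log_level": "debug"}

for key, (kind, want, got) in find_drift(declared, running).items():
    print(f"{key}: {kind} (declared={want!r}, running={got!r})")
```

Run on a schedule, a check like this surfaces the manual override someone added to one server before it causes an incident.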

Secret Sprawl

Secrets copied into:

  • Environment files
  • CI/CD configurations
  • Developer laptops
  • Slack messages

Fix: Centralized secret management. Vault, AWS Secrets Manager, or similar. Secrets are fetched at runtime, never stored in files or repositories.

The God Config

One massive configuration object passed everywhere:

def handle_request(config, request):
    db = connect(config.database_url)
    cache = connect(config.redis_url)
    # config has 200 fields, this function uses 3

Fix: Inject only what’s needed. Use dependency injection or specific configuration objects:

def handle_request(db: Database, cache: Cache, request):
    # Function declares its actual dependencies
    ...
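
When a component does need settings rather than a ready-made connection, "specific configuration objects" can be as simple as small frozen dataclasses carved out of the full configuration. The class and function names below are hypothetical; the field names reuse the settings from earlier in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseConfig:
    url: str
    pool_size: int = 10


@dataclass(frozen=True)
class CacheConfig:
    url: str


@dataclass(frozen=True)
class AppConfig:
    database: DatabaseConfig
    cache: CacheConfig


def build_config(raw: dict) -> AppConfig:
    """Split a flat settings mapping into per-component objects."""
    return AppConfig(
        database=DatabaseConfig(url=raw["database_url"], pool_size=raw.get("max_connections", 10)),
        cache=CacheConfig(url=raw["redis_url"]),
    )


def describe_pool(db: DatabaseConfig) -> str:
    # This function can only see database settings, so its dependencies are obvious.
    return f"{db.pool_size} connections to {db.url}"
```

Only the composition root ever touches `AppConfig`; everything else receives the slice it needs, which also makes each component easy to construct in tests.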

Configuration as Logic

# Don't do this
rules:
  - condition: "user.age > 18 AND user.country == 'US'"
    action: "allow"
  - condition: "user.subscription == 'premium'"
    action: "allow"

You’ve invented a programming language in YAML. This is hard to test, hard to debug, and hard to reason about.

Fix: Keep logic in code. Use configuration for simple values, not business rules.
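
For the rules above, that split might look like this: the threshold and the country list are plain values that can live in a config file, while the rule itself is an ordinary function you can unit test. `AccessPolicy` and `is_allowed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    # Simple values, suitable for a config file.
    age_threshold: int = 18
    allowed_countries: frozenset = frozenset({"US"})


def is_allowed(user: dict, policy: AccessPolicy) -> bool:
    """The business rule lives in code; only its parameters come from configuration."""
    if user.get("subscription") == "premium":
        return True
    return user["age"] > policy.age_threshold and user["country"] in policy.allowed_countries
```

Changing the threshold is a configuration change. Changing the shape of the rule is a code change, with a review and a test to go with it.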

Testing Configuration

Validate in CI

# .github/workflows/ci.yml
- name: Validate configuration
  run: |
    python -c "from config import Settings; Settings()"
    yamllint config/*.yaml

Catch typos and missing values before deployment.
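
A linter checks syntax, but it won't notice a misspelled key. One extra check worth running in CI, sketched here under the assumption that the defaults file declares every valid key, is to flag keys in an environment file that the defaults don't know about. `undeclared_keys` is a hypothetical helper.

```python
def undeclared_keys(defaults: dict, environment: dict, prefix: str = "") -> list:
    """List dotted keys that appear in an environment file but not in the defaults."""
    unknown = []
    for key, value in environment.items():
        path = f"{prefix}{key}"
        if key not in defaults:
            unknown.append(path)
        elif isinstance(value, dict) and isinstance(defaults[key], dict):
            # Both sides are sections: check the nested keys too.
            unknown.extend(undeclared_keys(defaults[key], value, prefix=f"{path}."))
    return unknown


defaults = {"database": {"url": "", "pool_size": 10}, "debug": False}
production = {"database": {"pool_sise": 50}, "debug": False, "max_conections": 20}

print(undeclared_keys(defaults, production))  # ['database.pool_sise', 'max_conections']
```

In a CI job, a non-empty result would fail the build, which turns a silent typo into a red check.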

Environment Parity

Test with production-like configuration:

def test_with_production_config():
    # Load production config (with secrets stubbed)
    config = load_config("production", secrets=mock_secrets)
    
    app = create_app(config)
    
    # Run integration tests
    ...

If tests pass with different configuration than production, they’re lying to you.

Configuration Diff on Deploy

# Show what's changing
diff <(kubectl get configmap myapp -o yaml) new-configmap.yaml

# Require approval for production config changes
if [ "$ENV" = "production" ]; then
    echo "Configuration changes:"
    diff ...
    read -p "Apply? [y/N] " confirm
fi

Make configuration changes visible and intentional.
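
One thing to watch when printing diffs is that they can leak secrets into terminal scrollback and CI logs. The sketch below renders a diff with sensitive values replaced by a short fingerprint, so a reviewer can see that a secret changed without seeing the secret. The marker list and function names are assumptions, and substring matching on key names is deliberately crude.

```python
import difflib
import hashlib

# Key-name fragments treated as sensitive; an assumption to adjust per project.
SENSITIVE_MARKERS = ("password", "secret", "token", "key")


def _display(name: str, value) -> str:
    if any(marker in name.lower() for marker in SENSITIVE_MARKERS):
        fingerprint = hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:8]
        return f"<redacted sha256:{fingerprint}>"
    return str(value)


def config_diff(current: dict, proposed: dict) -> str:
    """Return a unified diff of two flat configs with sensitive values masked."""
    before = [f"{name}: {_display(name, current[name])}" for name in sorted(current)]
    after = [f"{name}: {_display(name, proposed[name])}" for name in sorted(proposed)]
    return "\n".join(
        difflib.unified_diff(before, after, fromfile="current", tofile="proposed", lineterm="")
    )
```

An empty result means nothing is changing. A fingerprint of a short or guessable secret can still be brute-forced, so treat the output as internal.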


Configuration is the connective tissue between your code and the environment it runs in. Treat it with the same care you’d give to code: version it, validate it, test it, and document it. Your future self — the one who’s not debugging at 3 AM — will thank you.