Configuration is where deployments go to die. A typo in an environment variable, a missing secret, a config file that works in staging but breaks in production. Here’s how to make configuration boring and reliable.

The Hierarchy of Configuration

Not all config is created equal. Layer it:

1. Defaults (in code)               Safe fallbacks
2. Config files (per environment)   Environment-specific
3. Environment variables            Runtime overrides
4. Command-line flags               One-off overrides
5. Feature flags (runtime)          Dynamic changes

Later layers override earlier ones. This lets you:

  • Ship sensible defaults
  • Override per environment
  • Hotfix without redeploying

import os
from dataclasses import dataclass

import yaml  # PyYAML

@dataclass
class Config:
    # Layer 1: Defaults
    database_pool_size: int = 10
    cache_ttl_seconds: int = 300
    debug_mode: bool = False
    
    @classmethod
    def load(cls):
        config = cls()
        
        # Layer 2: Config file
        config_file = os.getenv('CONFIG_FILE', 'config.yaml')
        if os.path.exists(config_file):
            with open(config_file) as f:
                file_config = yaml.safe_load(f) or {}  # An empty file loads as None
            for key, value in file_config.items():
                if hasattr(config, key):
                    setattr(config, key, value)
        
        # Layer 3: Environment variables
        if pool := os.getenv('DATABASE_POOL_SIZE'):
            config.database_pool_size = int(pool)
        if ttl := os.getenv('CACHE_TTL_SECONDS'):
            config.cache_ttl_seconds = int(ttl)
        if debug := os.getenv('DEBUG_MODE'):
            config.debug_mode = debug.lower() in ('true', '1', 'yes')
        
        return config
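
The loader above covers layers 1 through 3. Layer 4, command-line flags, could be sketched roughly as follows. The flag names and the apply_cli_overrides helper are assumptions made for illustration, not part of the original loader, and the Config class is repeated in trimmed form so the snippet runs on its own.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class Config:
    # Trimmed copy of the defaults from the loader above
    database_pool_size: int = 10
    cache_ttl_seconds: int = 300
    debug_mode: bool = False


def apply_cli_overrides(config: Config, argv: list[str]) -> Config:
    """Layer 4 (hypothetical helper): override only the fields passed as flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--database-pool-size', type=int)
    parser.add_argument('--cache-ttl-seconds', type=int)
    # BooleanOptionalAction (Python 3.9+) adds --debug-mode / --no-debug-mode
    parser.add_argument('--debug-mode', action=argparse.BooleanOptionalAction)
    args = parser.parse_args(argv)

    for f in fields(config):
        value = getattr(args, f.name)
        if value is not None:  # None means "flag not given": keep lower layers
            setattr(config, f.name, value)
    return config
```

In a real entry point this would be called as apply_cli_overrides(Config.load(), sys.argv[1:]), so a flag wins over the environment, which wins over the file, which wins over the defaults.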

Environment Variables: The Good Parts

Environment variables are ubiquitous because they work everywhere:

# Explicit is better than implicit
DATABASE_URL=postgres://user:pass@host:5432/db
REDIS_URL=redis://localhost:6379/0
LOG_LEVEL=INFO

# Prefix for namespacing
MYAPP_DATABASE_URL=...
MYAPP_CACHE_TTL=300

Best practices:

  1. Use a prefix for your app’s variables. DATABASE_URL might conflict; MYAPP_DATABASE_URL won’t.

  2. Document every variable. A .env.example file is minimum:

# .env.example - copy to .env and fill in values
DATABASE_URL=          # Required. PostgreSQL connection string
REDIS_URL=             # Optional. Defaults to redis://localhost:6379
SECRET_KEY=            # Required. 32+ character random string
DEBUG=false            # Optional. Enable debug mode

  3. Validate on startup:

import os
import sys

def validate_config(config: Config) -> list[str]:
    errors = []
    
    if not os.getenv('DATABASE_URL'):
        errors.append("DATABASE_URL is required")
    
    if not os.getenv('SECRET_KEY'):
        errors.append("SECRET_KEY is required")
    elif len(os.getenv('SECRET_KEY', '')) < 32:
        errors.append("SECRET_KEY must be at least 32 characters")
    
    return errors

# Fail fast on startup
errors = validate_config(config)
if errors:
    for error in errors:
        print(f"Configuration error: {error}", file=sys.stderr)
    sys.exit(1)
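
Practices 1 and 3 combine naturally: read only the prefixed variables, parse them into typed values, and collect every problem before exiting. A minimal sketch, assuming a MYAPP_ prefix; the read_env helper and the SPEC table are hypothetical names, not part of the code above.

```python
import os
from typing import Callable, Mapping, Optional


def parse_bool(raw: str) -> bool:
    if raw.lower() in ('true', '1', 'yes'):
        return True
    if raw.lower() in ('false', '0', 'no'):
        return False
    raise ValueError(f'not a boolean: {raw!r}')


# Hypothetical spec: variable name (without prefix) -> (parser, required)
SPEC: dict[str, tuple[Callable[[str], object], bool]] = {
    'DATABASE_URL': (str, True),
    'CACHE_TTL': (int, False),
    'DEBUG': (parse_bool, False),
}


def read_env(prefix: str = 'MYAPP_',
             environ: Optional[Mapping[str, str]] = None):
    """Return (values, errors) instead of raising on the first problem."""
    environ = os.environ if environ is None else environ
    values, errors = {}, []
    for name, (parser, required) in SPEC.items():
        raw = environ.get(prefix + name)
        if raw is None or raw == '':
            if required:
                errors.append(f'{prefix}{name} is required')
            continue
        try:
            values[name] = parser(raw)
        except ValueError as exc:
            errors.append(f'{prefix}{name} is invalid: {exc}')
    return values, errors
```

Reporting all errors at once saves a round trip per mistake: one failed deploy shows every missing or malformed variable.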

Secrets Are Not Config

Secrets need special handling:

# BAD: Secrets in config files (gets committed to git)
config = {
    "database_password": "hunter2"  # NO
}

# BAD: Secrets in plain environment variables
# (inherited by child processes, readable via /proc, easy to leak into logs and crash dumps)
$ tr '\0' '\n' < /proc/<pid>/environ
# Shows DATABASE_URL=postgres://user:password@...

# BETTER: Secrets from files
DATABASE_PASSWORD_FILE=/run/secrets/db_password

def get_secret(name: str) -> str:
    file_path = os.getenv(f'{name}_FILE')
    if file_path and os.path.exists(file_path):
        with open(file_path) as f:
            return f.read().strip()
    # Fallback to direct env var for development
    return os.getenv(name, '')

For production: Use a secrets manager (Vault, AWS Secrets Manager, etc.) with short-lived credentials.
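
Short-lived credentials have to be re-fetched before they expire. One way to sketch that, independent of any particular secrets manager, is a small cache with a time-to-live. The fetch callable stands in for whatever client call your secrets manager provides; it and the CachedSecret class are hypothetical names, not a real Vault or AWS API.

```python
import time
from typing import Callable, Optional


class CachedSecret:
    """Hold a secret in memory and re-fetch it once it is older than ttl_seconds."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # Hypothetical: calls your secrets manager
        self._ttl = ttl_seconds
        self._clock = clock            # Injectable so tests can control time
        self._value: Optional[str] = None
        self._fetched_at = float('-inf')

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value

    def __repr__(self) -> str:
        # Never let the value leak into logs or tracebacks
        return 'CachedSecret(***)'
```

The TTL should be comfortably shorter than the credential's real lifetime, so a request never picks up a value that expires mid-flight.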

Config Files: Structured and Validated

For complex config, files beat environment variables:

# config.yaml
server:
  host: 0.0.0.0
  port: 8080
  workers: 4

database:
  pool_size: 20
  timeout_seconds: 30

features:
  new_checkout: false
  dark_mode: true

Always validate:

import yaml
# Pydantic v1-style API; in Pydantic v2, validator is deprecated in favor of field_validator
from pydantic import BaseModel, validator

class ServerConfig(BaseModel):
    host: str = "0.0.0.0"
    port: int = 8080
    workers: int = 4
    
    @validator('port')
    def port_must_be_valid(cls, v):
        if not (1 <= v <= 65535):
            raise ValueError('port must be 1-65535')
        return v
    
    @validator('workers')
    def workers_must_be_positive(cls, v):
        if v < 1:
            raise ValueError('workers must be at least 1')
        return v

class DatabaseConfig(BaseModel):
    pool_size: int = 10
    timeout_seconds: int = 30

class AppConfig(BaseModel):
    server: ServerConfig = ServerConfig()
    database: DatabaseConfig = DatabaseConfig()
    
# Pydantic validates on load
with open('config.yaml') as f:
    config = AppConfig(**(yaml.safe_load(f) or {}))  # An empty file loads as None
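
If adding Pydantic is not an option, the same fail-on-load idea can be approximated with standard-library dataclasses. This is a rough sketch, not a substitute for Pydantic: it checks only the fields shown and does no type coercion. The from_dict helper is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    host: str = '0.0.0.0'
    port: int = 8080
    workers: int = 4

    def __post_init__(self):
        # Runs after the generated __init__, so bad values never escape
        if not isinstance(self.port, int) or not (1 <= self.port <= 65535):
            raise ValueError(f'server.port must be 1-65535, got {self.port!r}')
        if not isinstance(self.workers, int) or self.workers < 1:
            raise ValueError(f'server.workers must be at least 1, got {self.workers!r}')

    @classmethod
    def from_dict(cls, data: dict) -> 'ServerConfig':
        unknown = set(data) - set(cls.__dataclass_fields__)
        if unknown:
            # Reject typos such as "prot" instead of silently ignoring them
            raise ValueError(f'unknown server keys: {sorted(unknown)}')
        return cls(**data)
```

Rejecting unknown keys is a design choice worth copying either way: a misspelled option that is silently ignored behaves exactly like a missing one.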

Per-Environment Config

Avoid if environment == 'production' scattered through your code:

config/
├── base.yaml          # Shared defaults
├── development.yaml   # Dev overrides
├── staging.yaml       # Staging overrides
└── production.yaml    # Production overrides

import yaml
from pathlib import Path

def load_config(environment: str) -> dict:
    config_dir = Path('config')
    
    # Start with base (an empty YAML file loads as None, so fall back to {})
    config = yaml.safe_load((config_dir / 'base.yaml').read_text()) or {}
    
    # Merge environment-specific
    env_file = config_dir / f'{environment}.yaml'
    if env_file.exists():
        env_config = yaml.safe_load(env_file.read_text()) or {}
        deep_merge(config, env_config)
    
    return config

def deep_merge(base: dict, override: dict) -> dict:
    for key, value in override.items():
        if key in base and isinstance(base[key], dict) and isinstance(value, dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base
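
To make the merge semantics concrete, here is a small worked example that uses in-memory dicts in place of the YAML files. deep_merge is repeated so the snippet runs on its own, and the sample values are invented for illustration. Because deep_merge mutates its first argument, the example copies the base first.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    # Same logic as above: recurse into nested dicts, otherwise replace
    for key, value in override.items():
        if key in base and isinstance(base[key], dict) and isinstance(value, dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base


# Invented sample data standing in for base.yaml and production.yaml
base = {
    'server': {'host': '0.0.0.0', 'port': 8080, 'workers': 4},
    'features': {'new_checkout': False},
}
production = {
    'server': {'workers': 16},
    'features': {'new_checkout': True},
}

merged = deep_merge(copy.deepcopy(base), production)

print(merged['server'])   # {'host': '0.0.0.0', 'port': 8080, 'workers': 16}
print(base['server'])     # {'host': '0.0.0.0', 'port': 8080, 'workers': 4}
```

Only nested dicts are merged key by key. Lists and scalars in the override replace the base value wholesale, which is usually what you want for things like allowed hosts.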

Feature Flags: Config That Changes

Some config needs to change without redeploying:

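import time

class FeatureFlags:
    def __init__(self, redis_client, cache_ttl: float = 60.0):
        self.redis = redis_client
        self.cache: dict[str, bool] = {}
        self.cache_ttl = cache_ttl  # Refresh at most once a minute by default
        self.last_refresh = float('-inf')  # Force a refresh on first use
    
    def is_enabled(self, flag: str, default: bool = False) -> bool:
        self._maybe_refresh()
        return self.cache.get(flag, default)
    
    def _maybe_refresh(self):
        now = time.monotonic()
        if now - self.last_refresh > self.cache_ttl:
            flags = self.redis.hgetall('feature_flags')
            # redis-py returns bytes unless the client was created with
            # decode_responses=True, so normalize keys and values to str.
            self.cache = {_text(k): _text(v) == 'true' for k, v in flags.items()}
            self.last_refresh = now

def _text(value) -> str:
    return value.decode() if isinstance(value, bytes) else str(value)

# Usage
flags = FeatureFlags(redis_client)

def checkout(cart):
    if flags.is_enabled('new_checkout'):
        return new_checkout_flow(cart)
    return legacy_checkout(cart)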

When to use feature flags:

  • Gradual rollouts
  • A/B testing
  • Kill switches for risky features
  • Enabling features for specific users
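
Gradual rollouts and per-user targeting need more than an on/off value. A common approach is to hash the flag name and user ID into a stable bucket from 0 to 99 and compare it with a rollout percentage. The sketch below uses assumed names (rollout_bucket, in_rollout, allowed_users); the FeatureFlags class above does not implement any of this.

```python
import hashlib


def rollout_bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f'{flag}:{user_id}'.encode()).digest()
    return int.from_bytes(digest[:8], 'big') % 100


def in_rollout(flag: str, user_id: str, percent: int,
               allowed_users: frozenset = frozenset()) -> bool:
    # Explicit allow-list first (e.g. internal testers), then the percentage
    if user_id in allowed_users:
        return True
    return rollout_bucket(flag, user_id) < percent
```

Because the bucket depends only on the flag and the user, a given user gets the same answer on every request, and raising the percentage only ever adds users. Including the flag name in the hash keeps the same users from always being first in line for every feature.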

Config Drift Detection

Production config drifts from what you intended. Detect it:

import hashlib
import json
import logging

logger = logging.getLogger(__name__)

def config_fingerprint(config: dict) -> str:
    """Generate a hash of the current config."""
    serialized = json.dumps(config, sort_keys=True)
    return hashlib.sha256(serialized.encode()).hexdigest()[:12]

# Log on startup
logger.info(f"Starting with config fingerprint: {config_fingerprint(config)}")

# Store expected fingerprints per environment
EXPECTED_FINGERPRINTS = {
    'production': 'a1b2c3d4e5f6',
    'staging': 'f6e5d4c3b2a1',
}

fingerprint = config_fingerprint(config)
expected = EXPECTED_FINGERPRINTS.get(environment)

if expected and fingerprint != expected:
    logger.warning(
        f"Config drift detected! Expected {expected}, got {fingerprint}"
    )
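
A fingerprint tells you that something drifted, not what. If the expected config is available as a dict, a flattened key-by-key comparison pinpoints the difference. A sketch with two hypothetical helpers, flatten and config_diff:

```python
def flatten(config: dict, prefix: str = '') -> dict:
    """Turn {'a': {'b': 1}} into {'a.b': 1} so nested keys compare easily."""
    flat = {}
    for key, value in config.items():
        path = f'{prefix}{key}'
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f'{path}.'))
        else:
            flat[path] = value
    return flat


_MISSING = object()  # Sentinel: distinguishes "absent" from an explicit None


def config_diff(expected: dict, actual: dict) -> dict:
    """Return {path: (expected_value, actual_value)} for every differing key."""
    exp, act = flatten(expected), flatten(actual)
    diff = {}
    for path in sorted(exp.keys() | act.keys()):
        e, a = exp.get(path, _MISSING), act.get(path, _MISSING)
        if e != a:
            diff[path] = ('<missing>' if e is _MISSING else e,
                          '<missing>' if a is _MISSING else a)
    return diff
```

If the config contains secrets, mask their values before logging the diff, and consider leaving them out of the fingerprint input as well.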

The Golden Rules

  1. Fail fast. Validate all config on startup. Don’t discover missing config at 3 AM when that code path finally runs.

  2. Document everything. If a config option exists, explain what it does and what values are valid.

  3. Secrets aren’t config. They need encryption at rest, rotation, and audit logs.

  4. Immutable deployments. Config should be baked in at deploy time, not fetched at runtime (except feature flags).

  5. Test your config. Unit test your config loading. Integration test each environment’s config file.
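
Rule 5 in practice: a unit test can patch the environment and assert on the loaded values. The sketch below uses the standard unittest module; load_pool_size is a stand-in for your real loader, and the variable name follows the earlier DATABASE_POOL_SIZE example.

```python
import os
import unittest
from unittest import mock


def load_pool_size(default: int = 10) -> int:
    # Stand-in loader: default first, then the environment override
    raw = os.getenv('DATABASE_POOL_SIZE')
    return int(raw) if raw else default


class ConfigLoadingTest(unittest.TestCase):
    def test_default_when_unset(self):
        # clear=True empties os.environ for the duration of the block
        with mock.patch.dict(os.environ, {}, clear=True):
            self.assertEqual(load_pool_size(), 10)

    def test_env_overrides_default(self):
        with mock.patch.dict(os.environ, {'DATABASE_POOL_SIZE': '25'}):
            self.assertEqual(load_pool_size(), 25)

    def test_garbage_fails_loudly(self):
        with mock.patch.dict(os.environ, {'DATABASE_POOL_SIZE': 'lots'}):
            with self.assertRaises(ValueError):
                load_pool_size()
```

Run it with python -m unittest. mock.patch.dict restores the original environment when each block exits, so tests cannot leak settings into one another.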

Configuration done well is invisible. Your app starts, reads its config, validates it, and runs. No surprises. That’s the goal.