What if you could deploy with a safety net? Blue-green deployments give you exactly that: two identical production environments, one serving traffic while the other waits in the wings.

Deploy to the idle environment, test it, then switch traffic instantly. If something breaks, switch back. No rollback procedure—just flip.

The Core Concept

You maintain two identical environments:

  • Blue: Currently serving production traffic
  • Green: Idle, ready for the next release

Deployment flow:

  1. Deploy new version to Green
  2. Run smoke tests against Green
  3. Switch load balancer to point to Green
  4. Green becomes the new production
  5. Blue becomes the rollback target

Users ──► Load Balancer ──► Blue  (v1.0)  [ACTIVE]
                            Green (v1.1)  [IDLE]

Implementation Patterns

DNS-Based Switching

The simplest approach—update DNS to point to the new environment:

# Deploy to green
./deploy.sh green v1.1

# Test green
curl https://green.example.com/health

# Switch DNS
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.example.com",
        "Type": "A",
        "AliasTarget": {
          "DNSName": "green-lb.example.com",
          "HostedZoneId": "Z456"
        }
      }
    }]
  }'

Pros: Simple, works with any infrastructure. Cons: DNS TTL means slow propagation. Some clients cache longer than specified.
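One consequence of stale caches: don't tear down the old environment the moment DNS flips. A rough heuristic (an assumption to tune, not a standard) is to wait some multiple of the record's TTL before treating the cutover as settled:

```javascript
// True once enough time has passed that most resolvers should have
// re-queried the record. `safetyFactor` pads for clients that cache
// longer than the advertised TTL (heuristic, not a guarantee).
function dnsCutoverSettled(ttlSeconds, elapsedSeconds, safetyFactor = 3) {
  return elapsedSeconds >= ttlSeconds * safetyFactor;
}
```

With a 300-second TTL and the default factor, you'd keep the old environment alive for at least 15 minutes after the switch.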

Load Balancer Switching

Update the load balancer’s target group:

# AWS ALB example
aws elbv2 modify-listener \
  --listener-arn arn:aws:elasticloadbalancing:...:listener/app/my-lb/... \
  --default-actions Type=forward,TargetGroupArn=arn:aws:...:targetgroup/green/...

Pros: Instant switch, no DNS propagation. Cons: More infrastructure to manage.

Kubernetes Service Switching

Use label selectors to switch traffic:

# Blue deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
---
# Green deployment  
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
---
# Service (switch by changing selector)
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # Change to 'green' to switch
  ports:
    - port: 80

Switch with one command:

kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'
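Anywhere this flip gets scripted, the target is just "whichever color isn't live." A tiny helper (a sketch; names are illustrative) keeps that logic in one place instead of hard-coding a color:

```javascript
// Given the live color, return the idle one — the target for the next deploy.
function idleColor(live) {
  if (live !== 'blue' && live !== 'green') {
    throw new Error(`unknown environment color: ${live}`);
  }
  return live === 'blue' ? 'green' : 'blue';
}
```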

Automation Script

A complete blue-green deployment script:

#!/bin/bash
set -e

VERSION=${1:?Usage: $0 <version>}   # version tag, passed as the first argument
CURRENT=$(kubectl get svc myapp -o jsonpath='{.spec.selector.version}')
TARGET=$([ "$CURRENT" = "blue" ] && echo "green" || echo "blue")

echo "Current: $CURRENT, Deploying to: $TARGET"

# Deploy new version to the idle environment
kubectl set image deployment/app-$TARGET app=myapp:$VERSION

# Wait for rollout
kubectl rollout status deployment/app-$TARGET --timeout=300s

# Run smoke tests inside the `if` so `set -e` doesn't abort before we can react
echo "Running smoke tests against $TARGET..."
if ./smoke-tests.sh "http://app-$TARGET.internal"; then
  echo "Smoke tests passed. Switching traffic..."
  kubectl patch service myapp -p "{\"spec\":{\"selector\":{\"version\":\"$TARGET\"}}}"
  echo "Traffic switched to $TARGET"
else
  echo "Smoke tests failed. Aborting."
  exit 1
fi

Pre-Switch Validation

Never switch without testing:

async function validateGreenEnvironment(expectedVersion) {
  const checks = [
    // Health check
    fetch('https://green.internal/health')
      .then(r => r.ok ? 'pass' : 'fail'),

    // API smoke test
    fetch('https://green.internal/api/status')
      .then(r => r.json())
      .then(d => d.version === expectedVersion ? 'pass' : 'fail'),

    // Database connectivity
    fetch('https://green.internal/api/db-check')
      .then(r => r.ok ? 'pass' : 'fail'),

    // Critical endpoint
    fetch('https://green.internal/api/users/test-user')
      .then(r => r.status === 200 ? 'pass' : 'fail'),
  ].map(p => p.catch(() => 'fail')); // a network error counts as a failure

  const results = await Promise.all(checks);
  const allPassed = results.every(r => r === 'pass');

  console.log('Validation results:', results);
  return allPassed;
}

Database Considerations

Blue-green works best with stateless applications. Databases complicate things:

Option 1: Shared Database
Both environments connect to the same database. Schema changes must be backward compatible.

Blue (v1.0)  ──┐
               ├──► Database
Green (v1.1) ──┘

Option 2: Database per Environment
More isolation, but requires data sync:

Blue (v1.0)  ──► DB-Blue
                    │  (replication)
Green (v1.1) ──► DB-Green

Option 3: Expand-Contract Migrations
Use the expand-contract pattern from database migrations—new code works with old schema, old code works with new schema.
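As a concrete sketch of expand-contract (the table and column names here are illustrative, not from a real schema), a rename becomes three backward-compatible steps instead of one breaking change:

```sql
-- Goal: rename users.name to users.full_name without breaking either color.

-- Expand: run before deploying new code. Old code keeps using name.
ALTER TABLE users ADD COLUMN full_name TEXT;
UPDATE users SET full_name = name WHERE full_name IS NULL;

-- (New code reads/writes full_name; old code still works against name.)

-- Contract: only after both colors run the new code and the old one is retired.
ALTER TABLE users DROP COLUMN name;
```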

Rollback Procedure

The beauty of blue-green: rollback is just another switch.

# Something went wrong with green? Switch back to blue
kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'

No redeployment. No waiting. Just flip.

Set up automated rollback on error detection:

// Small promise-based sleep helper used below
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function monitorAndRollback(activeEnv, previousEnv) {
  const errorThreshold = 0.05; // 5% error rate
  
  for (let i = 0; i < 10; i++) {
    await sleep(60000); // Check every minute
    
    const errorRate = await getErrorRate(activeEnv);
    console.log(`Error rate: ${errorRate}`);
    
    if (errorRate > errorThreshold) {
      console.log('Error threshold exceeded, rolling back!');
      await switchTraffic(previousEnv);
      await alert('Automatic rollback triggered');
      return;
    }
  }
  
  console.log('Deployment stable after 10 minutes');
}
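getErrorRate is left abstract above; whatever the metrics backend, it usually reduces to a ratio of counters. A minimal sketch (the counter shape is an assumption—adapt it to your metrics API):

```javascript
// Reduce raw request counters to an error rate. Returns 0 for an idle
// window so a no-traffic minute never looks like a 100% failure.
function computeErrorRate({ total, errors }) {
  if (total === 0) return 0;
  return errors / total;
}
```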

Cost Considerations

Blue-green doubles your infrastructure, at least temporarily.

Mitigation strategies:

  • Scale down idle environment to minimum
  • Use spot instances for the idle environment
  • Share components that sit outside the blue/green pair (load balancers, databases)
  • Terminate idle environment after successful deployment window
# After successful deployment, scale down old environment
kubectl scale deployment app-blue --replicas=1

# After 24 hours with no issues, you could remove it entirely
# (but keep the deployment manifest for quick spin-up)

Blue-Green vs Canary

Aspect            Blue-Green                    Canary
Traffic split     All or nothing                Gradual (1% → 10% → 100%)
Rollback speed    Instant                       Fast (stop rollout)
Risk exposure     Full exposure after switch    Limited exposure during rollout
Infrastructure    2x during deploy              1x + small canary
Complexity        Simpler                       More complex

Use blue-green when you need instant rollback and can afford the infrastructure cost. Use canary when you want to limit blast radius.

Common Mistakes

No smoke tests before switching: You’re gambling that the deployment worked.

Switching during peak traffic: Test under load, but switch during low-traffic windows if possible.

Forgetting sticky sessions: If users have session affinity to old servers, switching breaks their sessions.

Database schema mismatch: New code expecting new columns while old code still runs = errors.

Not monitoring after switch: The first 10 minutes after switch are critical.

The Mental Model

Think of blue-green like having two stages at a concert:

  • One stage is live, audience watching
  • Other stage is set up with the next act
  • When ready, lights dim on one stage, come up on the other
  • If the new act bombs, you can switch back instantly

The audience (users) experiences a seamless transition. The production crew (you) has full control and a safety net.

That’s the promise of blue-green: deploy with confidence, because rollback is one command away.