The scariest moment in software delivery used to be clicking “deploy.” Will it work? Will it break? Will you be debugging at 2 AM?

Blue-green deployments eliminate most of that fear. Instead of updating your production environment in place, you deploy to an identical standby environment and switch traffic over. If something’s wrong, you switch back. Done.

The Core Concept

You maintain two identical production environments:

  • Blue: Currently serving live traffic
  • Green: Idle, ready for the next release

To deploy:

  1. Deploy new version to Green
  2. Test Green thoroughly
  3. Switch traffic from Blue to Green
  4. Green is now live; Blue becomes the standby

If the new version has problems, switch traffic back to Blue. Rollback takes seconds, not minutes.

[Diagram: Before deployment, users reach Blue (v1.0, live) while Green (v1.1) sits standby. After deployment, users reach Green (v1.1, live) and Blue (v1.0) becomes the standby, ready for rollback.]

Implementation Patterns

DNS-Based Switching

The simplest approach: point your DNS record at whichever environment is active.

# Blue is live
dig +short app.example.com
# Returns: 10.0.1.100 (blue)

# Deploy to green, test it
# Switch DNS to green
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [{"Value": "10.0.2.100"}]
      }
    }]
  }'

Pros: Simple, works with any infrastructure.
Cons: DNS TTLs mean a gradual switchover (clients cache the old IP), not an instant cutover.

For faster switching, use a low TTL (60 seconds or less); for a controlled rollout, combine it with weighted routing to shift traffic gradually.
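As a sketch of what weighted routing looks like, the helper below builds the change batch that Route 53 expects for two weighted A records (the same JSON shape the CLI command above uses, plus the SetIdentifier and Weight fields that weighted routing requires). The IPs, record name, and weights are illustrative; the resulting dict would be passed to boto3's change_resource_record_sets.

```python
# Sketch: build a Route 53 change batch that splits DNS responses
# between blue and green using weighted A records. Values are
# illustrative; pass the result to change_resource_record_sets.

def weighted_change_batch(name, blue_ip, green_ip, green_weight):
    """Send `green_weight` out of 100 DNS answers to green, rest to blue."""
    def record(identifier, ip, weight):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "SetIdentifier": identifier,  # required for weighted routing
                "Weight": weight,             # relative weight, 0-255
                "TTL": 60,                    # keep low so shifts take effect fast
                "ResourceRecords": [{"Value": ip}],
            },
        }
    return {"Changes": [
        record("blue", blue_ip, 100 - green_weight),
        record("green", green_ip, green_weight),
    ]}

# 90% of responses to green, 10% to blue
batch = weighted_change_batch("app.example.com", "10.0.1.100", "10.0.2.100", 90)
```

Because weights are relative, you can walk green from 10 to 50 to 100 over several calls and watch error rates between steps.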

Load Balancer Switching

Better: switch at the load balancer level for instant cutover.

# AWS ALB: swap target groups
aws elbv2 modify-listener \
  --listener-arn arn:aws:elasticloadbalancing:...:listener/app/my-alb/... \
  --default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:...:targetgroup/green/...

# Or with weighted targets for gradual rollout
aws elbv2 modify-listener \
  --listener-arn $LISTENER_ARN \
  --default-actions '[
    {
      "Type": "forward",
      "ForwardConfig": {
        "TargetGroups": [
          {"TargetGroupArn": "'$BLUE_TG'", "Weight": 10},
          {"TargetGroupArn": "'$GREEN_TG'", "Weight": 90}
        ]
      }
    }
  ]'

Pros: Instant switchover, supports gradual rollout.
Cons: Requires load balancer infrastructure.

Kubernetes Services

In Kubernetes, switch by updating the Service selector:

# Service pointing to blue
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # ← Change to 'green' to switch
  ports:
    - port: 80

Or use an Ingress with traffic splitting:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "100"  # 100% to green
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-green
                port:
                  number: 80

Database Considerations

The hard part of blue-green isn’t the application servers — it’s the database.

Option 1: Shared Database

Both blue and green connect to the same database. This works if:

  • Schema changes are backward-compatible
  • You can run old and new code against the same schema
[Diagram: Blue (v1.0) and Green (v1.1) both connect to a single shared database.]

Pattern: Expand-contract migrations

  1. Expand: Add new columns/tables (old code ignores them)
  2. Deploy: Switch to new code
  3. Contract: Remove old columns/tables (after verifying success)
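The expand phase is easy to demonstrate end to end. The sketch below uses an in-memory SQLite database with an illustrative users table: the new nullable column is added, old code (which knows nothing about it) keeps working, and new code backfills it. The contract phase would only run after green is verified.

```python
import sqlite3

# Expand-contract sketch against SQLite. Table and column names
# are illustrative, not from any real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('ada')")

# Expand: add a nullable column. Existing rows get NULL; old code's
# INSERTs and SELECTs are untouched because they never name the column.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

# Old code path, still deployed on blue, continues to work:
conn.execute("INSERT INTO users (name) VALUES ('grace')")

# New code path on green backfills and uses the column:
conn.execute(
    "UPDATE users SET email = name || '@example.com' WHERE email IS NULL"
)

rows = conn.execute("SELECT name, email FROM users ORDER BY id").fetchall()
# Contract (dropping superseded columns) is a later, separate migration,
# run only once you're confident you won't roll back.
```

The key property: at every point in the sequence, both the old and the new binary can run against the current schema.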

Option 2: Database Per Environment

Each environment has its own database. Requires data synchronization:

[Diagram: Blue (v1.0) writes to its own Blue DB, Green (v1.1) to its own Green DB, with replication between them.]

This is complex. You need:

  • Real-time replication from live to standby
  • Promotion of standby to primary during switch
  • Handling of writes that happened during switchover

Usually overkill. Shared database with backward-compatible migrations is simpler.

Option 3: Feature Flags

Decouple deployment from release. Deploy code that supports both old and new behavior, controlled by feature flags:

if feature_flags.is_enabled("new_checkout_flow"):
    return new_checkout()
else:
    return old_checkout()

Deploy to both environments, then flip the flag. Database schema? Same strategy — deploy code that handles both schemas, migrate data, flip flag, clean up old code path.
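The snippet above assumes a feature_flags object exists. A minimal in-memory sketch of one is below; in practice this would be backed by a flag service or config store so the flag flips at runtime without a redeploy, which is the whole point.

```python
class FeatureFlags:
    """Minimal in-memory flag store (illustrative). Real deployments
    back this with a config service so flags change without a deploy."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name):
        # Unknown flags default to off: safe for gradual rollout.
        return self._flags.get(name, False)

    def set(self, name, value):
        self._flags[name] = value


feature_flags = FeatureFlags({"new_checkout_flow": False})

def checkout():
    # Same gate as in the article's snippet.
    if feature_flags.is_enabled("new_checkout_flow"):
        return "new checkout"
    return "old checkout"
```

Both environments ship this code; the release happens when someone flips new_checkout_flow, and the rollback is flipping it back.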

Pre-Deployment Testing

The point of blue-green is catching problems before users see them. Use that staging time:

Smoke Tests

# Basic health check
curl -f https://green.internal.example.com/health

# Critical path tests
./run-smoke-tests.sh --target https://green.internal.example.com

Synthetic Traffic

Replay production traffic against the green environment:

# Capture production requests (sanitized)
goreplay --input-raw :80 --output-file requests.gor

# Replay against green
goreplay --input-file requests.gor --output-http "http://green.internal:80"

Shadow Traffic

Route a copy of live traffic to green (responses discarded) to verify behavior under real load:

location / {
    proxy_pass http://blue;
    mirror /mirror;
    mirror_request_body on;
}

location /mirror {
    internal;
    proxy_pass http://green$request_uri;
}

Rollback Strategy

The beauty of blue-green: rollback is just another switch.

# Oh no, green is broken
# Switch back to blue
aws elbv2 modify-listener \
  --listener-arn $LISTENER_ARN \
  --default-actions Type=forward,TargetGroupArn=$BLUE_TG

# Users are back on blue immediately
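The switch-then-watch-then-revert loop above is simple enough to automate. Here is a sketch where switch_to and healthy are stand-ins for the load-balancer API call and a post-switch health probe; neither name comes from a real library.

```python
import time

def deploy_with_rollback(switch_to, healthy, checks=3, interval=0):
    """Switch traffic to green, watch health checks, and fall back to
    blue on the first failure. `switch_to(env)` and `healthy()` are
    placeholders for the real LB call and health probe.
    Returns whichever environment is left serving traffic."""
    switch_to("green")
    for _ in range(checks):
        if not healthy():
            switch_to("blue")  # rollback is just another switch
            return "blue"
        time.sleep(interval)
    return "green"
```

Wiring this into the deploy pipeline means the on-call human only gets paged after the system has already put users back on blue.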

Key requirements for safe rollback:

  1. Blue environment is still running (don’t tear it down immediately)
  2. Database changes are backward-compatible (blue can still read/write)
  3. No irreversible side effects (emails sent, external API calls made)

For side effects, consider:

  • Queuing external actions, processing after switchover is confirmed
  • Idempotency keys so repeated actions are safe
  • Feature flags to disable side effects during testing
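Idempotency keys deserve a concrete sketch: the wrapper below dedupes a side effect (say, a confirmation email) by key, so replaying the action after a rollback-and-retry is a no-op. The class and key format are illustrative; a real system would persist seen keys in Redis or the database rather than in memory.

```python
class IdempotentSender:
    """Illustrative sketch: dedupe side effects by idempotency key so a
    replayed action after rollback-and-retry does not repeat. Real
    systems persist seen keys (Redis, a DB table) with an expiry."""

    def __init__(self, send):
        self._send = send   # the real side effect, e.g. send_email
        self._seen = set()  # in-memory only; persist this in production

    def perform(self, key, payload):
        if key in self._seen:
            return "skipped"  # already done on a previous attempt
        self._seen.add(key)
        self._send(payload)
        return "sent"
```

Deriving the key from the business event (order ID plus action) rather than the request makes retries safe across environments.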

Cost Optimization

Two production environments = twice the cost, right? Not necessarily:

Scale Down Standby

The standby environment doesn’t need full capacity:

# Green (standby) - minimal resources
replicas: 2  # Just enough to test

# After switch, scale up green
replicas: 10

# Scale down blue (new standby)
replicas: 2

Spot/Preemptible Instances

Use cheaper instances for the standby environment. If they get terminated, you just re-provision before the next deployment.

Time-Limited Rollback Window

Don’t keep the old environment running forever:

# After 24 hours of stable green deployment
# Tear down blue entirely
# Next deployment: provision fresh blue environment

Modern infrastructure makes provisioning fast enough that you don’t need a permanent standby.

Blue-Green vs. Rolling vs. Canary

Strategy     Description                         Rollback Speed             Resource Cost
Blue-Green   Full environment switch             Instant                    2x during deploy
Rolling      Gradual instance replacement        Slow (reverse the roll)    1x + small buffer
Canary       Small % of traffic to new version   Fast (just remove canary)  1x + canary instances

Use blue-green when:

  • You need instant, reliable rollback
  • You want to test the full environment before any user exposure
  • Your infrastructure supports easy traffic switching

Consider alternatives when:

  • Cost is critical (rolling uses fewer resources)
  • You want gradual exposure with real user feedback (canary)
  • Deployments are very frequent (blue-green has more overhead)

Practical Checklist

Before your first blue-green deployment:

  • Can you provision an identical environment automatically?
  • Is your database migration strategy backward-compatible?
  • Do you have health checks that verify the new environment works?
  • Can you switch traffic in seconds?
  • Can you switch back just as fast?
  • Do you have monitoring to detect problems post-switch?
  • Is there a clear decision process for when to roll back?

If yes to all: you’re ready. If not: fix the gaps first.


Blue-green deployments aren’t magic. They’re just good separation of concerns: deploy and test separately from release. That separation gives you confidence, speed, and the ability to say “oops, let’s undo that” without anyone noticing.