Blue-Green Deployments: Zero-Downtime Releases

Deploying shouldn’t mean downtime. Blue-green deployment lets you release new versions instantly and roll back just as fast.

The Concept

You maintain two identical production environments:

Blue: Currently serving live traffic
Green: Idle, ready for the next version

To deploy:

Deploy new version to Green
Test Green thoroughly
Switch traffic from Blue to Green
Green is now live; Blue becomes idle
Next deploy: repeat with roles reversed

Implementation with Nginx

Simple traffic switching with upstream blocks:

# /etc/nginx/conf.d/app.conf

upstream blue {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}

upstream green {
    server 10.0.2.10:8080;
    server 10.0.2.11:8080;
}

# Point to active environment
upstream active {
    server 10.0.1.10:8080;  # Blue is active
    server 10.0.1.11:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://active;
    }
}

Switch by updating the active upstream and reloading:

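#!/bin/bash
# switch-to-green.sh
set -e

# Rewrite only the "upstream active" block; a global s/10.0.1/10.0.2/g
# would also rewrite the blue upstream (and "." matches any character)
sed -i '/^upstream active {/,/^}/ s/10\.0\.1\./10.0.2./' /etc/nginx/conf.d/app.conf

nginx -t          # validate the config before reloading
nginx -s reload   # graceful: in-flight requests finish on the old workers
echo "Switched to Green"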
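The script above only switches one way. Below is a minimal sketch of a reusable version that switches in either direction. It assumes the app.conf layout shown earlier (blue on 10.0.1.x, green on 10.0.2.x, plus an upstream active block) and GNU sed; switch_nginx is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: repoint the "upstream active" block at blue or green.
# Assumes blue uses 10.0.1.x and green uses 10.0.2.x, as in app.conf above.
switch_nginx() {
  local target=$1 conf=$2 from to
  case "$target" in
    blue)  from='10\.0\.2\.'; to='10.0.1.' ;;
    green) from='10\.0\.1\.'; to='10.0.2.' ;;
    *) echo "unknown environment: $target" >&2; return 1 ;;
  esac
  # Only lines between "upstream active {" and the next "}" are rewritten,
  # so the blue and green upstream blocks keep their own addresses
  sed -i "/^upstream active {/,/^}/ s/$from/$to/" "$conf"
}

# Usage:
#   switch_nginx green /etc/nginx/conf.d/app.conf && nginx -t && nginx -s reload
# Rollback is the same call with "blue".
```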

Implementation with AWS ALB

Use target groups for instant switching:

# Terraform

resource "aws_lb_target_group" "blue" {
  name     = "app-blue"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id
  
  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}

resource "aws_lb_target_group" "green" {
  name     = "app-green"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id
  
  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}

resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = var.certificate_arn  # HTTPS listeners require a certificate
  
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.blue.arn  # Current active
  }
}

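Switch with a single API call. This bypasses Terraform, so also update target_group_arn in the listener config; otherwise the next terraform apply moves traffic back to the old target group: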

# Switch to green
aws elbv2 modify-listener \
  --listener-arn $LISTENER_ARN \
  --default-actions Type=forward,TargetGroupArn=$GREEN_TG_ARN
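Before calling modify-listener, it is worth confirming that every target in the green group passes its health check. Here is a minimal sketch, assuming the AWS CLI is configured and GREEN_TG_ARN and LISTENER_ARN are set as above. The all_targets_healthy function is hypothetical; it reads the tab-separated target states that describe-target-health prints with --output text:

```shell
#!/bin/bash
# Hypothetical gate: succeed only if at least one target was reported
# and every reported state is exactly "healthy". Reads states from stdin.
all_targets_healthy() {
  local states s count=0
  states=$(cat)
  for s in $states; do
    [ "$s" = "healthy" ] || return 1
    count=$((count + 1))
  done
  [ "$count" -gt 0 ]   # an empty target group is not "healthy"
}

# Usage: only switch when green is fully healthy
#   aws elbv2 describe-target-health --target-group-arn "$GREEN_TG_ARN" \
#     --query 'TargetHealthDescriptions[].TargetHealth.State' --output text \
#     | all_targets_healthy \
#     && aws elbv2 modify-listener --listener-arn "$LISTENER_ARN" \
#          --default-actions "Type=forward,TargetGroupArn=$GREEN_TG_ARN"
```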

Implementation with Kubernetes

Use service selectors:

# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: app
        image: myapp:1.0.0

---
# green-deployment.yaml  
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: app
        image: myapp:1.1.0

Service points to active version:

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # Switch by changing this
  ports:
  - port: 80
    targetPort: 8080

Switch with:

kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'
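One risk with the selector approach is that a typo such as "gren" produces a selector that matches no pods. That takes the service down rather than switching it. A small guard helps. This is a sketch: selector_patch is a hypothetical name, and it assumes the version: blue and version: green labels from the manifests above:

```shell
#!/bin/bash
# Hypothetical guard: only emit a selector patch for a known version label.
selector_patch() {
  case "$1" in
    blue|green) printf '{"spec":{"selector":{"version":"%s"}}}' "$1" ;;
    *) echo "refusing to patch selector to unknown version: $1" >&2; return 1 ;;
  esac
}

# Usage: wait until the green pods are rolled out, then switch
#   kubectl rollout status deployment/app-green --timeout=120s \
#     && kubectl patch service myapp -p "$(selector_patch green)"
```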

Pre-Switch Validation

Never switch without validation:

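#!/bin/bash
# deploy.sh
set -euo pipefail

HEALTH_URL="http://green.internal/health"
SMOKE_URL="http://green.internal/api/v1/status"

# 1. Deploy to green (deploy_to_green is a placeholder for your deploy step)
deploy_to_green

# 2. Poll health (30 attempts, 2s apart); abort if green never becomes healthy
healthy=false
for i in {1..30}; do
  if curl -sf -m 5 -o /dev/null "$HEALTH_URL"; then
    echo "Green is healthy"
    healthy=true
    break
  fi
  sleep 2
done

if [ "$healthy" != true ]; then
  echo "Green never became healthy, aborting"
  exit 1
fi

# 3. Run smoke tests
if ! curl -sf -m 5 "$SMOKE_URL" | jq -e '.status == "ok"' > /dev/null; then
  echo "Smoke tests failed, aborting"
  exit 1
fi

# 4. Switch traffic (switch_to_green is a placeholder for one of the methods above)
switch_to_green

echo "Deploy complete"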
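The wait loop in deploy.sh generalizes into a helper that retries any check a fixed number of times and fails loudly when it runs out. The same pattern can then gate the smoke test too. This is a sketch; retry is a hypothetical function name, not a standard command:

```shell
#!/bin/bash
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "giving up after $attempts attempts: $*" >&2
  return 1
}

# Usage, with HEALTH_URL as in deploy.sh:
#   retry 30 2 curl -sf -m 5 -o /dev/null "$HEALTH_URL" || exit 1
```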

Instant Rollback

The killer feature: rollback is just switching back.

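#!/bin/bash
# rollback.sh
set -euo pipefail

PREVIOUS_ENV=$(cat /var/state/previous-env)  # "blue" or "green"
CURRENT_ENV=$(cat /var/state/current-env)

# switch_to is a placeholder for whichever switching method you use
switch_to "$PREVIOUS_ENV"

# Swap the recorded state so the files match what is actually live
echo "$CURRENT_ENV" > /var/state/previous-env
echo "$PREVIOUS_ENV" > /var/state/current-env

echo "Rolled back to $PREVIOUS_ENV"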

No redeployment. No waiting. Seconds, not minutes.

The Database Problem

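Blue-green works cleanly for stateless services. Databases complicate things.

Problem: Both environments usually share one database, so a schema change made for the new version can break the old one, which stays live until the switch and is your rollback target afterwards.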

Solutions:

1. Backward-Compatible Migrations

Never ship a schema change that the older version, still live or still your rollback target, can't handle:

-- Bad: breaks old version immediately
ALTER TABLE users DROP COLUMN legacy_field;

-- Good: add new, keep old
ALTER TABLE users ADD COLUMN new_field VARCHAR(255);
-- Later, after all code migrated:
ALTER TABLE users DROP COLUMN legacy_field;

2. Expand-Contract Pattern

Three-phase migration:

-- Phase 1: Expand
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT false;

-- Phase 2: Migrate data
UPDATE users SET email_verified = (verification_date IS NOT NULL);

-- Deploy new version that uses email_verified
-- (and keeps writing verification_date, so blue still works after a rollback)
-- Switch traffic to green

-- Phase 3: Contract (after blue retired)
ALTER TABLE users DROP COLUMN verification_date;

3. Database Per Environment

For major changes, give each environment its own database. That requires a strategy for keeping data in sync between them: complex, but sometimes necessary.

Cost Considerations

Blue-green needs up to twice your production infrastructure, either permanently or only during deploys:

Always running both:

2x compute cost
Simple, instant switching
Good for critical, high-traffic apps

Spin up green on demand:

Deploy: provision green → deploy → switch → destroy blue
Lower cost, slower deploys
Good for less critical apps

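# On-demand green environment (deploy_app and switch_traffic are placeholders)
set -e  # stop before switching or destroying if any step fails
# One Terraform workspace per environment, so destroying blue can't touch green
terraform workspace select -or-create green   # -or-create needs Terraform 1.4+
terraform apply -var="environment=green"
deploy_app green
switch_traffic green
terraform workspace select blue
terraform destroy -var="environment=blue"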

Blue-Green vs Canary vs Rolling

Strategy	Rollback Speed	Risk	Complexity
Blue-Green	Instant	All-or-nothing	Medium
Canary	Fast	Gradual exposure	High
Rolling	Slow	Mixed versions	Low

Blue-Green: Best for confident releases with easy rollback
Canary: Best for testing with real traffic before full rollout
Rolling: Best for simple apps where mixed versions are OK

Automation Example

Complete deployment script:

#!/bin/bash
set -euo pipefail

: "${VERSION:?VERSION must be set}"

CURRENT=$(get_active_environment)  # "blue" or "green"
TARGET=$([ "$CURRENT" = "blue" ] && echo "green" || echo "blue")

echo "Deploying to $TARGET (current: $CURRENT)"

# Deploy
docker pull "myapp:$VERSION"
ansible-playbook deploy.yml -e "env=$TARGET version=$VERSION"

# Validate (set -e aborts before the switch if either check fails)
./scripts/health-check.sh "$TARGET"
./scripts/smoke-test.sh "$TARGET"

# Switch
./scripts/switch-traffic.sh "$TARGET"

# Record state
echo "$CURRENT" > /var/state/previous-env
echo "$TARGET" > /var/state/current-env

# Notify
curl -X POST -H 'Content-Type: application/json' "$SLACK_WEBHOOK" \
  -d "{\"text\":\"Deployed $VERSION to $TARGET\"}"

echo "Deploy complete"
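The TARGET line picks blue whenever get_active_environment returns anything other than exactly "blue". A value like "Blue" or an empty string would therefore send the deploy straight onto the live environment. A stricter sketch that refuses unexpected values; other_env is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: print the idle environment for a given live one,
# and fail on anything that isn't exactly "blue" or "green".
other_env() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *) echo "unexpected active environment: '$1'" >&2; return 1 ;;
  esac
}

# Usage inside the script above (set -e aborts if the lookup is bad):
#   TARGET=$(other_env "$CURRENT")
```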

When Not to Use Blue-Green

Stateful applications: sessions held in memory on the old environment are lost at the switch
Long-running connections: WebSockets need graceful drain
Tight budgets: 2x infrastructure may not be feasible
Massive data migrations: Database sync becomes impractical

For these cases, consider canary deployments or feature flags instead.

Blue-green deployment trades infrastructure cost for deployment confidence. When a bad release means lost revenue, that trade-off makes sense.
