Deploying shouldn’t mean downtime. Blue-green deployment lets you release new versions instantly and roll back just as fast.
The Concept#
You maintain two identical production environments:
- Blue: Currently serving live traffic
- Green: Idle, ready for the next version
To deploy:
1. Deploy new version to Green
2. Test Green thoroughly
3. Switch traffic from Blue to Green
4. Green is now live; Blue becomes idle
5. Next deploy: repeat with roles reversed
Before deploy:
Users → Load Balancer → [Blue v1.0]  ✓ LIVE
                        [Green idle]
After switch:
Users → Load Balancer → [Blue v1.0]  idle
                        [Green v1.1] ✓ LIVE
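The role reversal at the end of the deploy steps can be captured in a tiny helper. This is a minimal sketch, assuming environments are identified by the strings "blue" and "green"; the function name `idle_env` is hypothetical and not part of any tool.

```shell
#!/bin/bash
# Hypothetical helper: given the environment that is currently live,
# print the idle one, which becomes the target of the next deploy.
idle_env() {
  case "$1" in
    blue)  echo "green" ;;
    green) echo "blue" ;;
    *)     echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Example: blue is live, so the next deploy goes to green.
# idle_env blue    # prints "green"
```

Rejecting anything other than the two known names keeps a typo from silently picking a deploy target.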
Implementation with Nginx#
Simple traffic switching with upstream blocks:
# /etc/nginx/conf.d/app.conf
upstream blue {
    server 10.0.1.10:8080;
    server 10.0.1.11:8080;
}
upstream green {
    server 10.0.2.10:8080;
    server 10.0.2.11:8080;
}
# Point to active environment
upstream active {
    server 10.0.1.10:8080;  # Blue is active
    server 10.0.1.11:8080;
}
server {
    listen 80;
    location / {
        proxy_pass http://active;
    }
}
Switch by updating the active upstream and reloading:
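#!/bin/bash
# switch-to-green.sh
set -euo pipefail
# Rewrite only the "active" upstream block. A global replace would also
# rewrite the blue upstream and leave nothing to roll back to.
sed -i '/^upstream active {/,/^}/ { s/10\.0\.1\./10.0.2./; s/Blue is active/Green is active/; }' /etc/nginx/conf.d/app.conf
# Validate the config before reloading
nginx -t
nginx -s reload
echo "Switched to Green"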
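Before switching, it can help to confirm which environment the `active` upstream currently points at. The following is a minimal sketch, assuming the config layout shown earlier (blue on 10.0.1.x, green on 10.0.2.x, and the `upstream active {` line and its closing brace starting at column 0); `active_env` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: report which environment the "active" upstream
# points at, based on the subnet used by its server lines.
# Assumes blue = 10.0.1.x and green = 10.0.2.x.
active_env() {
  local conf="$1" block
  # Extract only the "upstream active { ... }" block.
  block=$(sed -n '/^upstream active {/,/^}/p' "$conf")
  case "$block" in
    *10.0.1.*) echo "blue" ;;
    *10.0.2.*) echo "green" ;;
    *)         echo "unknown"; return 1 ;;
  esac
}

# Usage:
# active_env /etc/nginx/conf.d/app.conf
```

Running it before and after the switch script gives a quick check that the edit touched the intended block.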
Implementation with AWS ALB#
Use target groups for instant switching:
# Terraform
resource "aws_lb_target_group" "blue" {
  name     = "app-blue"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id
  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}
resource "aws_lb_target_group" "green" {
  name     = "app-green"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id
  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}
resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  # HTTPS listeners require a certificate; var.certificate_arn is an assumed variable
  certificate_arn   = var.certificate_arn
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.blue.arn # Current active
  }
}
Switch with a single API call:
# Switch to green
aws elbv2 modify-listener \
    --listener-arn "$LISTENER_ARN" \
    --default-actions "Type=forward,TargetGroupArn=$GREEN_TG_ARN"
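The same call can be wrapped so that either target group can be activated and the result read back. This is a sketch, assuming the AWS CLI is installed and configured; the function names `forward_action` and `switch_listener` are hypothetical, while `modify-listener` and `describe-listeners` are existing `aws elbv2` subcommands.

```shell
#!/bin/bash
# Hypothetical helper: build the --default-actions value for a target group.
forward_action() {
  printf 'Type=forward,TargetGroupArn=%s' "$1"
}

# Hypothetical wrapper: point the listener at the given target group,
# then print the target group the listener now forwards to.
switch_listener() {
  local listener_arn="$1" target_group_arn="$2"
  aws elbv2 modify-listener \
    --listener-arn "$listener_arn" \
    --default-actions "$(forward_action "$target_group_arn")" > /dev/null
  aws elbv2 describe-listeners \
    --listener-arns "$listener_arn" \
    --query 'Listeners[0].DefaultActions[0].TargetGroupArn' \
    --output text
}

# Usage:
# switch_listener "$LISTENER_ARN" "$GREEN_TG_ARN"
```

If the listener is also managed by Terraform, a change made through the CLI will likely show up as drift on the next plan, so the Terraform definition should be updated to match.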
Implementation with Kubernetes#
Use service selectors:
# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: app
          image: myapp:1.0.0
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
        - name: app
          image: myapp:1.1.0
Service points to active version:
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # Switch by changing this
  ports:
    - port: 80
      targetPort: 8080
Switch with:
kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'
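The patch can be generated per color and guarded by a rollout check, so the Service is only repointed once the target Deployment is ready. This is a sketch, assuming the `app-blue`/`app-green` Deployments and the `myapp` Service defined earlier; the function names and the 120-second timeout are hypothetical choices.

```shell
#!/bin/bash
# Hypothetical helper: build the selector patch for a given color.
selector_patch() {
  case "$1" in
    blue|green) printf '{"spec":{"selector":{"version":"%s"}}}' "$1" ;;
    *) echo "expected blue or green, got: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: wait for the target Deployment to finish rolling
# out, then repoint the Service at it.
switch_service() {
  local color="$1" patch
  patch=$(selector_patch "$color") || return 1
  kubectl rollout status "deployment/app-$color" --timeout=120s || return 1
  kubectl patch service myapp -p "$patch"
}

# Usage:
# switch_service green
```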
Pre-Switch Validation#
Never switch without validation:
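#!/bin/bash
# deploy.sh
# deploy_to_green and switch_to_green are placeholders for your own
# deployment and traffic-switch commands.
HEALTH_URL="http://green.internal/health"
SMOKE_URL="http://green.internal/api/v1/status"
# 1. Deploy to green
deploy_to_green
# 2. Wait for healthy (up to 30 attempts, 2 seconds apart)
healthy=false
for i in {1..30}; do
    if curl -sf "$HEALTH_URL" > /dev/null; then
        echo "Green is healthy"
        healthy=true
        break
    fi
    sleep 2
done
# Abort if the loop ran out without a successful health check
if [ "$healthy" != true ]; then
    echo "Green never became healthy, aborting"
    exit 1
fi
# 3. Run smoke tests
if ! curl -sf "$SMOKE_URL" | jq -e '.status == "ok"' > /dev/null; then
    echo "Smoke tests failed, aborting"
    exit 1
fi
# 4. Switch traffic
switch_to_green
echo "Deploy complete"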
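The wait-for-healthy step generalizes to a small retry helper that fails loudly when the attempts run out. This is a minimal sketch; `wait_until` is a hypothetical name, and the attempt count and delay are passed in rather than hard-coded.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 on the first success and 1 if every
# attempt fails.
wait_until() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage, mirroring a 30 x 2s health loop:
# wait_until 30 2 curl -sf "http://green.internal/health" || exit 1
```

Returning a non-zero status on exhaustion means the caller cannot accidentally continue to the traffic switch after a timeout.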
Instant Rollback#
The killer feature: rollback is just switching back.
#!/bin/bash
# rollback.sh
# switch_to is a placeholder for your traffic-switch command.
PREVIOUS_ENV=$(cat /var/state/previous-env)  # "blue" or "green"
switch_to "$PREVIOUS_ENV"
echo "Rolled back to $PREVIOUS_ENV"
No redeployment. No waiting. Seconds, not minutes.
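The rollback script relies on `/var/state/previous-env` being kept up to date on every switch. One way to do that is sketched below, assuming a state directory containing `current-env` and `previous-env` files; `record_switch` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: record a traffic switch so a later rollback knows
# where to go. STATE_DIR holds two files, current-env and previous-env.
record_switch() {
  local state_dir="$1" new_env="$2" old_env=""
  if [ -f "$state_dir/current-env" ]; then
    old_env=$(cat "$state_dir/current-env")
  fi
  # Only overwrite previous-env when there was a known current environment.
  if [ -n "$old_env" ]; then
    echo "$old_env" > "$state_dir/previous-env"
  fi
  echo "$new_env" > "$state_dir/current-env"
}

# Usage:
# record_switch /var/state green
```

Calling it after a rollback as well keeps the two files consistent, so a second rollback returns to the version that was just abandoned.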
The Database Problem#
Blue-green works perfectly for stateless services. Databases complicate things.
Problem: Both environments share the database. Schema changes can break the old version.
Solutions:
1. Backward-Compatible Migrations#
Never make breaking changes:
-- Bad: breaks old version immediately
ALTER TABLE users DROP COLUMN legacy_field;
-- Good: add new, keep old
ALTER TABLE users ADD COLUMN new_field VARCHAR(255);
-- Later, after all code migrated:
ALTER TABLE users DROP COLUMN legacy_field;
2. Expand-Contract Pattern#
Three-phase migration:
Phase 1 (Expand):   Add new column, both versions work
Phase 2 (Migrate):  Move data, deploy new code
Phase 3 (Contract): Remove old column after old version retired
-- Phase 1: Expand
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT false;
-- Phase 2: Migrate data
UPDATE users SET email_verified = (verification_date IS NOT NULL);
-- Deploy new version that uses email_verified
-- Switch traffic to green
-- Phase 3: Contract (after blue retired)
ALTER TABLE users DROP COLUMN verification_date;
3. Database Per Environment#
For major changes, use separate databases:
Blue  → Database A (v1 schema)
Green → Database B (v2 schema)
This requires a data sync strategy between the two databases, which is complex but sometimes necessary.
Cost Considerations#
Blue-green doubles your infrastructure (temporarily):
Always running both:
- 2x compute cost
- Simple, instant switching
- Good for critical, high-traffic apps
Spin up green on demand:
- Deploy: provision green → deploy → switch → destroy blue
- Lower cost, slower deploys
- Good for less critical apps
# On-demand green environment
terraform apply -var="environment=green"
deploy_app green
switch_traffic green
terraform destroy -var="environment=blue"
Blue-Green vs Canary vs Rolling#
Strategy     Rollback Speed   Risk               Complexity
Blue-Green   Instant          All-or-nothing     Medium
Canary       Fast             Gradual exposure   High
Rolling      Slow             Mixed versions     Low
Blue-Green: Best for confident releases with easy rollback
Canary: Best for testing with real traffic before full rollout
Rolling: Best for simple apps where mixed versions are OK
Automation Example#
Complete deployment script:
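#!/bin/bash
# Expects VERSION and SLACK_WEBHOOK in the environment.
set -euo pipefail
# Assumption: the active environment is read from the state file this
# script writes below, defaulting to "blue" on the first run.
get_active_environment() { cat /var/state/current-env 2>/dev/null || echo "blue"; }
CURRENT=$(get_active_environment)  # "blue" or "green"
TARGET=$([ "$CURRENT" = "blue" ] && echo "green" || echo "blue")
echo "Deploying to $TARGET (current: $CURRENT)"
# Deploy
docker pull "myapp:$VERSION"
ansible-playbook deploy.yml -e "env=$TARGET version=$VERSION"
# Validate
./scripts/health-check.sh "$TARGET"
./scripts/smoke-test.sh "$TARGET"
# Switch
./scripts/switch-traffic.sh "$TARGET"
# Record state
echo "$CURRENT" > /var/state/previous-env
echo "$TARGET" > /var/state/current-env
# Notify
curl -X POST "$SLACK_WEBHOOK" \
    -H 'Content-Type: application/json' \
    -d "{\"text\":\"Deployed $VERSION to $TARGET\"}"
echo "Deploy complete"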
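The script validates before switching, but a release can still misbehave once it takes real traffic. A post-switch check with automatic rollback could look like the sketch below; `switch_with_rollback` is a hypothetical helper, and the switch and verify commands are passed in as arguments so any script or function that accepts an environment name will do.

```shell
#!/bin/bash
# Hypothetical helper: switch to TARGET, verify it, and switch back to
# PREVIOUS if verification fails. SWITCH_CMD and VERIFY_CMD are commands
# (scripts or functions) that take the environment name as their argument.
switch_with_rollback() {
  local target="$1" previous="$2" switch_cmd="$3" verify_cmd="$4"
  "$switch_cmd" "$target" || return 1
  if "$verify_cmd" "$target"; then
    return 0
  fi
  echo "Verification failed on $target, rolling back to $previous" >&2
  "$switch_cmd" "$previous"
  return 1
}

# Usage with scripts like the ones in the deployment example:
# switch_with_rollback "$TARGET" "$CURRENT" \
#   ./scripts/switch-traffic.sh ./scripts/smoke-test.sh
```

The function returns non-zero whenever the target did not end up live, so a caller running under `set -e` stops before recording state or sending a success notification.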
When Not to Use Blue-Green#
- Stateful applications: Session affinity issues
- Long-running connections: WebSockets need graceful drain
- Tight budgets: 2x infrastructure may not be feasible
- Massive data migrations: Database sync becomes impractical
For these cases, consider canary deployments or feature flags instead.
Blue-green deployment trades infrastructure cost for deployment confidence. When a bad release means lost revenue, that trade-off makes sense.