Load balancers are invisible until they break. Then they’re the only thing anyone talks about. Here’s how to get them right.
## Algorithms That Matter

### Round Robin

Requests go to each server in sequence: A, B, C, A, B, C…
```nginx
upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}
```
Good for: Homogeneous servers, stateless apps
Bad for: Servers with different capacities, long-running requests
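In application code, the same rotation is a one-liner with `itertools.cycle` (the server addresses here are illustrative):

```python
# Minimal round-robin sketch: rotate through the pool in strict order.
from itertools import cycle

servers = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
pool = cycle(servers)

def pick():
    """Return the next server: A, B, C, A, B, C, ..."""
    return next(pool)

# Six picks = two full passes over the pool, in order.
picks = [pick() for _ in range(6)]
```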
### Weighted Round Robin

Some servers get more traffic:
```nginx
upstream backend {
    server 10.0.0.1:8080 weight=5;  # 50% of traffic
    server 10.0.0.2:8080 weight=3;  # 30% of traffic
    server 10.0.0.3:8080 weight=2;  # 20% of traffic
}
```
Good for: Mixed server capacities, gradual migrations
Bad for: Dynamic capacity changes
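nginx implements this as "smooth" weighted round robin, which interleaves picks instead of sending five requests to the same server back to back. A sketch of that algorithm (addresses and weights match the config above):

```python
# Smooth weighted round robin: on each pick, every server's running
# score grows by its weight; the highest score wins and pays back the
# total weight. Over 10 picks the 5/3/2 split comes out exact.
weights = {"10.0.0.1:8080": 5, "10.0.0.2:8080": 3, "10.0.0.3:8080": 2}
current = {server: 0 for server in weights}
total = sum(weights.values())

def pick():
    for server in current:
        current[server] += weights[server]
    best = max(current, key=current.get)
    current[best] -= total
    return best

picks = [pick() for _ in range(10)]
```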
### Least Connections

Send to the server with the fewest active connections:
```nginx
upstream backend {
    least_conn;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}
```
Good for: Varying request durations, WebSockets
Bad for: Very short requests (overhead of counting)
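The selection itself is just a minimum over in-flight connection counts; the bookkeeping (increment on dispatch, decrement on completion) is what the load balancer maintains. A sketch with made-up counts:

```python
# Least connections: pick the server with the fewest in-flight requests.
active = {"10.0.0.1:8080": 4, "10.0.0.2:8080": 1, "10.0.0.3:8080": 7}

def pick_least_conn(counts):
    return min(counts, key=counts.get)

server = pick_least_conn(active)
active[server] += 1  # dispatch: one more in-flight request
# ... proxy the request ...
active[server] -= 1  # completion: release the slot
```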
### IP Hash

Same client IP always goes to same server:
```nginx
upstream backend {
    ip_hash;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}
```
Good for: Session affinity without sticky cookies
Bad for: Clients behind NAT (uneven distribution)
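The core idea is a stable hash of the client address modulo the pool size (note that nginx's `ip_hash` hashes only the first three octets of an IPv4 address, so a whole /24 lands on one server). A sketch of the general pattern:

```python
import hashlib

servers = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

def pick_by_ip(client_ip):
    # md5 is used as a stable, well-distributed hash here, not for crypto.
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same client IP always maps to the same server.
```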
### Consistent Hashing

Better distribution that survives server changes:
```nginx
upstream backend {
    hash $request_uri consistent;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}
```
When a server is added or removed, only about 1/n of requests move to a different server; with a plain modulo hash, nearly all of them would.
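A sketch of the underlying hash ring: each server is hashed onto the ring at many points (virtual nodes), and a key maps to the first server point clockwise from the key's own hash. Removing a server only remaps the keys that were on it:

```python
import hashlib
from bisect import bisect

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Place each server on the ring at `vnodes` pseudo-random points.
        points = []
        for server in servers:
            for i in range(vnodes):
                digest = hashlib.md5(f"{server}#{i}".encode()).hexdigest()
                points.append((int(digest, 16), server))
        points.sort()
        self.hashes = [h for h, _ in points]
        self.owners = [s for _, s in points]

    def pick(self, key):
        digest = hashlib.md5(key.encode()).hexdigest()
        idx = bisect(self.hashes, int(digest, 16)) % len(self.hashes)
        return self.owners[idx]

ring = HashRing(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
smaller = HashRing(["10.0.0.1:8080", "10.0.0.2:8080"])
# Only keys that lived on the removed server change owners.
moved = sum(ring.pick(f"/item/{i}") != smaller.pick(f"/item/{i}")
            for i in range(1000))
```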
## Health Checks

Don't send traffic to dead servers.
### Passive Health Checks (Nginx)

```nginx
upstream backend {
    server 10.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:8080 max_fails=3 fail_timeout=30s;
}
```
After 3 failures, the server is marked down for 30 seconds.
### Active Health Checks (Nginx Plus / HAProxy)

```nginx
# Nginx Plus (health_check goes in the proxying location, not the upstream)
upstream backend {
    zone backend 64k;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

server {
    location / {
        proxy_pass http://backend;
        health_check uri=/health interval=5s passes=2 fails=3;
    }
}
```
```
# HAProxy
backend servers
    option httpchk GET /health
    http-check expect status 200
    server srv1 10.0.0.1:8080 check inter 5s fall 3 rise 2
    server srv2 10.0.0.2:8080 check inter 5s fall 3 rise 2
```
Active checks probe servers continuously, not just when requests fail.
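The `fall 3 rise 2` thresholds are hysteresis counters: consecutive failures before marking a server down, consecutive successes before marking it back up. A sketch of that state machine, with the probe transport omitted:

```python
class HealthState:
    """Track one backend's health with fall/rise hysteresis."""

    def __init__(self, fall=3, rise=2):
        self.fall = fall      # consecutive failures to mark down
        self.rise = rise      # consecutive successes to mark back up
        self.healthy = True
        self.streak = 0       # consecutive results against current state

    def record(self, probe_ok):
        if probe_ok == self.healthy:
            self.streak = 0   # result agrees with current state
        else:
            self.streak += 1
            needed = self.fall if self.healthy else self.rise
            if self.streak >= needed:
                self.healthy = not self.healthy
                self.streak = 0
        return self.healthy
```

One flapping probe result resets the streak, so a server has to fail (or recover) consistently before its state changes.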
### Custom Health Endpoints

```python
# Flask health endpoint (check_* helpers are app-specific)
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health')
def health():
    checks = {
        'database': check_database(),
        'redis': check_redis(),
        'disk': check_disk_space(),
    }
    all_healthy = all(checks.values())
    status_code = 200 if all_healthy else 503
    return jsonify(checks), status_code
```
Return 503 when unhealthy. Load balancer removes the server.
## SSL/TLS Termination

### At the Load Balancer

```nginx
server {
    listen 443 ssl;
    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;

    location / {
        proxy_pass http://backend;  # HTTP to backends
    }
}
```
Pros: Simpler cert management, offloads crypto from app servers
Cons: Traffic unencrypted internally
### End-to-End Encryption

```nginx
upstream backend {
    server 10.0.0.1:443;
    server 10.0.0.2:443;
}

server {
    listen 443 ssl;
    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;

    location / {
        proxy_pass https://backend;
        proxy_ssl_verify on;
        proxy_ssl_trusted_certificate /etc/ssl/ca.pem;
    }
}
```
Pros: Encrypted end-to-end
Cons: More complex, more CPU usage
## Connection Handling

### Keep-Alive to Backends

```nginx
upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    keepalive 32;  # Connection pool size
}

server {
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
```
Reuse connections to backends. Reduces latency and resource usage.
### Connection Limits

```nginx
upstream backend {
    server 10.0.0.1:8080 max_conns=100;
    server 10.0.0.2:8080 max_conns=100;
    queue 100 timeout=30s;  # queue directive requires Nginx Plus
}
```
Prevents overwhelming backends. Queues excess requests.
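The same max_conns-plus-queue behavior can be sketched with a semaphore: each backend has a fixed number of slots, and a request waits up to the queue timeout for one (addresses, limits, and return values here are illustrative):

```python
import threading

class LimitedBackend:
    """Cap concurrent requests per backend; queue the overflow."""

    def __init__(self, addr, max_conns=100):
        self.addr = addr
        self.slots = threading.BoundedSemaphore(max_conns)

    def handle(self, request, queue_timeout=30):
        # Wait up to `queue_timeout` seconds for a free slot, mirroring
        # nginx's `max_conns` + `queue 100 timeout=30s`.
        if not self.slots.acquire(timeout=queue_timeout):
            return (503, "queue timeout")
        try:
            return (200, f"served {request}")
        finally:
            self.slots.release()

backend = LimitedBackend("10.0.0.1:8080", max_conns=2)
```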
### Timeouts

```nginx
proxy_connect_timeout 5s;   # Time to establish connection
proxy_send_timeout    60s;  # Time between two successive writes of the request
proxy_read_timeout    60s;  # Time between two successive reads of the response
```
Set appropriate timeouts. Too short = errors. Too long = resource exhaustion.
## Session Persistence

### Cookie-Based Sticky Sessions

```nginx
# Nginx Plus (sticky is a commercial feature)
upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    sticky cookie srv_id expires=1h domain=.example.com path=/;
}
```

```
# HAProxy
backend servers
    cookie SERVERID insert indirect nocache
    server srv1 10.0.0.1:8080 cookie s1
    server srv2 10.0.0.2:8080 cookie s2
```
Client gets a cookie, returns to same server.
### When to Avoid Sticky Sessions

Sticky sessions complicate:

- Scaling down (stuck users)
- Server failures (session loss)
- Deployment (draining)

Better approach: externalized sessions (Redis, database).
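Externalizing means the session lives in a shared store keyed by an opaque ID, so any server can load it. A sketch of the pattern, with a plain dict standing in for Redis (with redis-py you would swap the dict for `SET`/`GET` with an expiry):

```python
import json
import uuid

store = {}  # stands in for Redis; every server would see the same data

def create_session(data, ttl_seconds=3600):
    sid = str(uuid.uuid4())
    # Redis equivalent (ttl_seconds used only there):
    #   r.set(f"session:{sid}", json.dumps(data), ex=ttl_seconds)
    store[sid] = json.dumps(data)
    return sid

def load_session(sid):
    # Redis equivalent: r.get(f"session:{sid}")
    raw = store.get(sid)
    return json.loads(raw) if raw is not None else None

sid = create_session({"user": "alice", "cart": [42]})
```

Because no server holds the session in memory, scaling down, failures, and deployments stop being session-loss events.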
## Graceful Shutdown

### Connection Draining

```nginx
# Mark server as draining (drain parameter requires Nginx Plus)
upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080 drain;  # No new connections
}
```

```bash
# HAProxy runtime API
echo "set server backend/srv2 state drain" | socat stdio /var/run/haproxy.sock
```
Existing connections complete. No new connections accepted.
### Application-Level Graceful Shutdown

```python
# Python signal handler (server.stop_accepting / server.shutdown are
# placeholders for whatever your framework provides)
import signal
import sys

def graceful_shutdown(signum, frame):
    # Stop accepting new requests
    server.stop_accepting()
    # Wait for existing requests (max 30s)
    server.shutdown(timeout=30)
    sys.exit(0)

signal.signal(signal.SIGTERM, graceful_shutdown)
```
Your app should handle SIGTERM gracefully.
## High Availability

### Active-Passive with Keepalived

```
          VIP 10.0.0.100 (VRRP)
                  │
        ┌─────────┴─────────┐
  LB1 (Active)        LB2 (Passive)
        │
        ▼
     Backends
```
```
# /etc/keepalived/keepalived.conf on LB1
vrrp_script check_nginx {
    script "/usr/bin/pgrep nginx"
    interval 2
    weight -20
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        10.0.0.100
    }
    track_script {
        check_nginx
    }
}
```
VIP moves to standby if primary fails.
### Active-Active with DNS

```
           DNS Round Robin
         ┌────────┴────────┐
  lb1.example.com   lb2.example.com
         └────────┬────────┘
                  ▼
              Backends
```
Both load balancers active. DNS distributes clients.
## Cloud Load Balancers

### AWS ALB

```hcl
resource "aws_lb" "main" {
  name               = "app-lb"
  internal           = false
  load_balancer_type = "application"
  subnets            = var.public_subnets
  security_groups    = [aws_security_group.lb.id]
}

resource "aws_lb_target_group" "main" {
  name     = "app-tg"
  port     = 8080
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    interval            = 30
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = var.cert_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.main.arn
  }
}
```
### Kubernetes Ingress

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com
      secretName: app-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app-service
                port:
                  number: 80
```
## Monitoring

### Key Metrics

```
# Prometheus queries

# Request rate
sum(rate(nginx_http_requests_total[5m])) by (status)

# Error rate
sum(rate(nginx_http_requests_total{status=~"5.."}[5m]))
  / sum(rate(nginx_http_requests_total[5m]))

# Latency percentiles
histogram_quantile(0.99,
  sum(rate(nginx_http_request_duration_seconds_bucket[5m])) by (le))

# Active connections
nginx_connections_active

# Backend health
sum(nginx_upstream_peers{state="up"}) by (upstream)
```
Alerts# 1
2
3
4
5
6
7
8
9
10
11
12
13
- alert : HighErrorRate
expr : |
sum(rate(nginx_http_requests_total{status=~"5.."}[5m]))
/ sum(rate(nginx_http_requests_total[5m])) > 0.05
for : 2m
annotations :
summary : "Error rate above 5%"
- alert : BackendDown
expr : nginx_upstream_peers{state="down"} > 0
for : 1m
annotations :
summary : "Backend server is down"
## The Checklist

### Start Here

- Today: Verify health checks are working
- This week: Enable connection keep-alive
- This month: Add HA (eliminate single point of failure)
- This quarter: Implement comprehensive monitoring

Load balancers should be boring. When they're working right, nobody notices them.
The best load balancer is one that distributes traffic so evenly, you forget you have multiple servers.