Round robin is the default. It’s also often wrong. Here’s how to choose load balancing strategies that actually match your workload.

The Strategies

Round Robin

Each request goes to the next server in rotation.

upstream backend {
    server 10.0.0.1;
    server 10.0.0.2;
    server 10.0.0.3;
}

Good for: Stateless services, similar server capacity
Bad for: Long-running connections, mixed server specs, sticky sessions
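The rotation itself is trivial. A minimal Python sketch of the same behavior (server addresses are placeholders):

```python
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rotation = cycle(servers)  # endless iterator over the pool, in order

def next_server():
    # Each call hands back the next server, wrapping to the start.
    return next(rotation)
```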

Weighted Round Robin

Same rotation, but some servers get more traffic.

upstream backend {
    server 10.0.0.1 weight=5;  # 5x traffic
    server 10.0.0.2 weight=3;  # 3x traffic
    server 10.0.0.3 weight=1;  # 1x traffic
}

Good for: Mixed hardware, gradual rollouts
Bad for: Dynamic load patterns
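NGINX's weighted rotation is "smooth": instead of sending five requests in a row to the weight-5 server, it interleaves traffic. A Python sketch of that idea (the class name is mine):

```python
class SmoothWeightedRR:
    """Sketch of smooth weighted round robin: each pick boosts every
    server's running score by its weight, takes the highest scorer,
    then penalizes the winner by the total weight. The result is
    weight-proportional but interleaved traffic."""

    def __init__(self, weights):
        self.weights = dict(weights)             # configured weight per server
        self.current = {s: 0 for s in weights}   # running score

    def next_server(self):
        total = sum(self.weights.values())
        for s in self.current:
            self.current[s] += self.weights[s]
        best = max(self.current, key=self.current.get)
        self.current[best] -= total
        return best
```

Over any window of (total weight) picks, each server appears exactly its weight's worth of times.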

Least Connections

Send to the server with fewest active connections.

upstream backend {
    least_conn;
    server 10.0.0.1;
    server 10.0.0.2;
}

Good for: Variable request duration, long-running requests
Bad for: Quick requests (overhead of tracking connections)
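The bookkeeping behind this strategy fits in a few lines of Python (names are illustrative):

```python
class LeastConnections:
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # open connections per server

    def acquire(self):
        # Route to the server with the fewest active connections.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Called when the connection closes.
        self.active[server] -= 1
```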

IP Hash (Source Hashing)

Same client always hits same server.

upstream backend {
    ip_hash;
    server 10.0.0.1;
    server 10.0.0.2;
}

Good for: Session affinity without cookies
Bad for: Uneven client distribution, NAT (many clients share IP)
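Conceptually this is just a stable hash of the client address modulo the pool size. A Python sketch (md5 here for cross-process stability, since Python's built-in hash() is salted; NGINX's ip_hash differs in detail, e.g. it hashes only the first three octets of an IPv4 address):

```python
import hashlib

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def pick(client_ip):
    # A stable hash of the client IP maps it to a fixed server slot,
    # so the same client always lands on the same server.
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]
```

The modulo is also the weakness: adding or removing a server reshuffles almost every client, which is what consistent hashing (below) fixes.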

Consistent Hashing

Hash-based routing that minimizes disruption when servers change.

upstream backend {
    hash $request_uri consistent;
    server 10.0.0.1;
    server 10.0.0.2;
}

Good for: Caching layers, stateful services
Bad for: Highly dynamic server pools
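A minimal ring sketch shows why disruption stays small: each server owns many virtual points on the ring, and removing a server only remaps the keys that hashed to its points (vnode count and hash choice here are illustrative):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring. Each key is routed to the first
    server point at or after its hash position, wrapping around."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (point, server)
        for server in servers:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get(self, key):
        point = self._hash(key)
        # First ring point >= the key's position, wrapping past the end.
        idx = bisect.bisect(self.ring, (point,)) % len(self.ring)
        return self.ring[idx][1]
```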

Least Time (Response-Based)

Route to fastest responding server.

upstream backend {
    least_time header;  # or 'last_byte'
    server 10.0.0.1;
    server 10.0.0.2;
}

Good for: Heterogeneous backends, geographic distribution
Bad for: Open-source NGINX deployments (least_time is an NGINX Plus directive; HAProxy offers comparable response-time-aware options)

Random with Two Choices

Pick two servers randomly, send to less loaded one.

backend servers
    balance random(2)
    server s1 10.0.0.1:80
    server s2 10.0.0.2:80

Good for: Large server pools, avoiding hotspots
Bad for: Small pools (overhead isn’t worth it)
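The "power of two choices" algorithm is short enough to sketch directly. This Python version tracks in-flight requests per server (simplified: finish() must be called on completion, which real balancers do automatically):

```python
import random

class TwoChoices:
    def __init__(self, servers):
        self.load = {s: 0 for s in servers}  # in-flight requests per server

    def pick(self):
        # Sample two distinct servers at random, send to the less loaded.
        a, b = random.sample(list(self.load), 2)
        chosen = a if self.load[a] <= self.load[b] else b
        self.load[chosen] += 1
        return chosen

    def finish(self, server):
        # Called when a request completes.
        self.load[server] -= 1
```

Comparing two random servers instead of scanning all of them gets you most of the balance of least-connections at a fraction of the coordination cost, which is why it shines in large pools.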

Health Checks

A load balancer is only as good as its health checks.

Passive Health Checks

Track failures from real traffic:

upstream backend {
    server 10.0.0.1 max_fails=3 fail_timeout=30s;
    server 10.0.0.2 max_fails=3 fail_timeout=30s;
}

3 failures → server marked down for 30 seconds.

Problem: Slow detection. Users hit errors before server is removed.
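The max_fails / fail_timeout bookkeeping looks roughly like this in Python (simplified: NGINX counts failures within a sliding fail_timeout window, while this sketch just counts consecutive failures):

```python
import time

class PassiveHealth:
    """After max_fails errors, skip the server until fail_timeout
    seconds have passed, then let traffic probe it again."""

    def __init__(self, max_fails=3, fail_timeout=30.0):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.fails = 0
        self.down_until = 0.0

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.fails += 1
        if self.fails >= self.max_fails:
            self.down_until = now + self.fail_timeout
            self.fails = 0  # reset so recovery gets a clean slate

    def is_up(self, now=None):
        now = time.monotonic() if now is None else now
        return now >= self.down_until
```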

Active Health Checks

Proactively test servers:

# NGINX Plus / OpenResty
upstream backend {
    server 10.0.0.1;
    server 10.0.0.2;
    
    health_check interval=5s fails=2 passes=3 uri=/health;
}

Better: Catches issues before users do.

Deep Health Checks

Check more than “is it responding”:

# FastAPI example; test_db_connection / test_redis / check_disk_space
# are app-specific helpers.
from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/health")
def health():
    checks = {
        "database": test_db_connection(),
        "cache": test_redis(),
        "disk": check_disk_space() > 10,  # >10% free
    }
    
    if all(checks.values()):
        return {"status": "healthy"}
    else:
        return Response(status_code=503)

Best: Removes servers that are up but not useful.

Session Persistence

Sometimes you need requests to stick to a server.

Sticky Cookies

Load balancer sets a cookie:

upstream backend {
    server 10.0.0.1;
    server 10.0.0.2;
    
    sticky cookie srv_id expires=1h;  # NGINX Plus directive
}

Pros: Works through NAT, survives IP changes
Cons: Requires cookie support

Source IP

Hash client IP:

upstream backend {
    ip_hash;
    server 10.0.0.1;
    server 10.0.0.2;
}

Pros: No cookies needed
Cons: NAT breaks it, mobile clients change IPs

Application-Level

Let the app handle it:

# Store session in Redis, not server memory
session = redis.get(f"session:{session_id}")

Best: Stateless servers, sessions in shared store. No sticky sessions needed.

Draining and Graceful Shutdown

Don’t kill connections mid-request.

Connection Draining

Stop new connections, let existing ones finish:

upstream backend {
    server 10.0.0.1;
    server 10.0.0.2 down;  # Marked down, won't get new traffic
}

Or in HAProxy:

echo "set server backend/s2 state drain" | socat stdio /var/run/haproxy.sock

Graceful Shutdown

Application signals readiness to stop:

import signal
import sys
import time

shutting_down = False

@app.get("/health")
def health():
    if shutting_down:
        return Response(status_code=503)
    return {"status": "healthy"}

def shutdown_handler(signum, frame):
    global shutting_down
    shutting_down = True
    time.sleep(30)  # Let in-flight requests complete
    sys.exit(0)

signal.signal(signal.SIGTERM, shutdown_handler)

Layer 4 vs Layer 7

Layer 4 (TCP/UDP)

Routes based on IP and port. Doesn’t inspect content.

stream {
    upstream backend {
        server 10.0.0.1:5432;
        server 10.0.0.2:5432;
    }
    
    server {
        listen 5432;
        proxy_pass backend;
    }
}

Good for: Databases, non-HTTP protocols, raw performance
Bad for: HTTP features (path routing, headers, cookies)

Layer 7 (HTTP)

Inspects HTTP content. Routes based on path, headers, etc.

upstream api {
    server 10.0.0.1:8080;
}

upstream web {
    server 10.0.0.2:80;
}

server {
    location /api/ {
        proxy_pass http://api;
    }
    
    location / {
        proxy_pass http://web;
    }
}

Good for: HTTP services, content-based routing
Bad for: Raw TCP, protocols that aren’t HTTP

Common Mistakes

1. Ignoring Connection Limits

# Bad: No limits
upstream backend {
    server 10.0.0.1;
}

# Good: Protect backend
upstream backend {
    server 10.0.0.1 max_conns=100;
}

2. No Timeouts

# Bad: Wait forever
proxy_pass http://backend;

# Good: Fail fast
proxy_connect_timeout 5s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;

3. Missing X-Forwarded Headers

# Backend sees load balancer IP, not client IP
proxy_pass http://backend;

# Fixed: Pass real client info
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Host $host;

4. One Massive Pool

# Bad: Everything in one pool
upstream everything {
    server api-1;
    server api-2;
    server web-1;
    server worker-1;
}

# Good: Separate pools per service
upstream api {
    server api-1;
    server api-2;
}

upstream web {
    server web-1;
}

Choosing a Strategy

Workload                        Strategy
Stateless, similar servers      Round robin
Mixed server specs              Weighted round robin
Variable request duration       Least connections
Need session affinity           IP hash or sticky cookies
Caching layer                   Consistent hash
Large pool, want fairness       Random with two choices
Geographic distribution         Least time (response-based)

Start with round robin. Move to least connections if you have variable request times. Always add health checks.


Load balancing is the art of spreading work without spreading pain. Choose wisely, monitor constantly.