You’ve scaled horizontally — multiple servers ready to handle requests. Now you need something to decide which server handles each request. That’s load balancing, and the strategy you choose affects latency, reliability, and resource utilization.

Round Robin: The Default

Each server gets requests in rotation: Server 1, Server 2, Server 3, Server 1, Server 2…

upstream backend {
    server app1:8080;
    server app2:8080;
    server app3:8080;
}

Pros:

  • Simple to understand and implement
  • Even distribution over time
  • No state to maintain

Cons:

  • Ignores server capacity differences
  • Ignores current server load
  • Long-running requests can pile up on one server

Good for: Homogeneous servers with similar request patterns.
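
In application code, round robin is just a rotating index. A minimal sketch — the `RoundRobin` class and its API are illustrative, not any particular load balancer's implementation:

```python
from itertools import count

class RoundRobin:
    """Minimal round-robin picker (illustrative sketch, not thread-safe)."""
    def __init__(self, servers):
        self.servers = list(servers)
        self._counter = count()  # monotonically increasing request counter

    def next_server(self):
        # Rotate through servers in order, wrapping around at the end.
        return self.servers[next(self._counter) % len(self.servers)]
```

Note the only state is a single counter — that simplicity is exactly why round robin is the default.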

Weighted Round Robin

Some servers are more capable than others:

upstream backend {
    server app1:8080 weight=3;  # Gets 3x traffic
    server app2:8080 weight=2;  # Gets 2x traffic
    server app3:8080 weight=1;  # Gets 1x traffic
}

Use when you have mixed hardware or want to gradually shift traffic during deployments.
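
A naive weighted rotation sends bursts (three requests to app1, then two to app2…). The "smooth" variant — which nginx also uses — interleaves picks while keeping the same long-run proportions. A sketch, with a made-up class name and a plain dict of weights:

```python
class SmoothWeightedRR:
    """Smooth weighted round robin (illustrative sketch).

    Each pick: bump every server's current score by its weight,
    choose the highest score, then subtract the total weight from it.
    """
    def __init__(self, weights):  # e.g. {"app1": 3, "app2": 2, "app3": 1}
        self.weights = dict(weights)
        self.current = {s: 0 for s in weights}
        self.total = sum(weights.values())

    def next_server(self):
        for s in self.current:
            self.current[s] += self.weights[s]
        best = max(self.current, key=self.current.get)
        self.current[best] -= self.total
        return best
```

Over any window of `sum(weights)` picks, each server appears exactly `weight` times, but never in a solid burst.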

Least Connections

Send to the server with fewest active connections:

upstream backend {
    least_conn;
    server app1:8080;
    server app2:8080;
    server app3:8080;
}

Pros:

  • Adapts to varying request durations
  • Naturally handles slow servers (they accumulate connections)

Cons:

  • Requires tracking connection state
  • New servers get hammered initially (zero connections)

Good for: Requests with variable processing times.

Weighted Least Connections

Combine least connections with capacity weighting:

upstream backend {
    least_conn;
    server app1:8080 weight=3;
    server app2:8080 weight=1;
}

Server 1 is considered “least loaded” until it has 3x the connections of Server 2.
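
The selection rule normalizes load by capacity: pick the server with the lowest connections-per-weight ratio. A hypothetical sketch (real balancers also break ties and skip unhealthy servers):

```python
def pick_weighted_least_conn(servers):
    """servers: list of (name, active_connections, weight) tuples.
    Choose the server with the lowest connections/weight ratio."""
    return min(servers, key=lambda s: s[1] / s[2])[0]
```

With weights 3 and 1, app1 holding 5 connections (ratio 1.67) still beats app2 holding 2 (ratio 2.0) — matching the "3x the connections" rule above.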

IP Hash (Session Affinity)

Same client always goes to same server:

upstream backend {
    ip_hash;
    server app1:8080;
    server app2:8080;
}

Pros:

  • Client sessions stay on one server
  • Works with server-side session storage

Cons:

  • Uneven distribution if traffic comes from few IPs (NAT, proxies)
  • Server removal disrupts all its clients

Good for: Legacy apps with server-side sessions. But consider: why not use shared session storage instead?
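
The mechanics are just a stable hash of the client address modulo the server count. An illustrative sketch only — nginx's actual `ip_hash` hashes the leading octets of the address, not an MD5 of the whole string:

```python
import hashlib

def server_for_ip(client_ip, servers):
    """Map a client IP to a fixed server via a stable hash (sketch)."""
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]
```

The same IP always yields the same index — which is both the feature (sticky sessions) and the flaw (change `len(servers)` and nearly every client gets remapped).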

Consistent Hashing

Like IP hash, but smarter about server changes:

upstream backend {
    hash $request_uri consistent;
    server app1:8080;
    server app2:8080;
    server app3:8080;
}

When a server is added/removed, only requests that would go to that server are redistributed. Others stay put.

Good for: Caching layers where you want cache locality.
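
Under the hood is a hash ring: each server owns many small arcs (virtual nodes), and a key maps to the first server point clockwise from its hash. A sketch with illustrative class and parameter names:

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""
    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self.ring = []  # sorted list of (point, server)
        for s in servers:
            self.add(s)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server):
        # Place `replicas` virtual points for this server on the ring.
        for i in range(self.replicas):
            self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    def remove(self, server):
        self.ring = [(p, s) for p, s in self.ring if s != server]

    def get(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        points = [p for p, _ in self.ring]
        idx = bisect.bisect(points, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]
```

Removing a server deletes only its own points; every key that mapped elsewhere keeps its server — exactly the "others stay put" property.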

Random with Two Choices

Pick two servers randomly, send to the one with fewer connections:

import random

def choose_server(servers):
    s1, s2 = random.sample(servers, 2)  # pick two distinct servers at random
    return s1 if s1.connections < s2.connections else s2

Surprisingly effective — this is the well-studied “power of two choices” technique. It avoids the thundering herd toward the single least-loaded server while still adapting to load.

Layer 4 vs Layer 7

Layer 4 (TCP/UDP):

  • Routes based on IP and port
  • Very fast, minimal overhead
  • Can’t inspect HTTP content
Client → L4 LB → Server  (TCP connection forwarded)

Layer 7 (HTTP):

  • Routes based on HTTP content (path, headers, cookies)
  • Can modify requests, add headers, terminate SSL
  • More overhead, more features
Client → L7 LB → Server  (HTTP request inspected, possibly modified)

Use L4 for raw throughput. Use L7 for smart routing, SSL termination, and HTTP features.

Health Checks

A load balancer is only useful if it knows which servers are healthy:

upstream backend {
    server app1:8080;
    server app2:8080;
    
    # Active health checks (nginx plus / other LBs)
    health_check interval=5s fails=3 passes=2;
}

Passive checks: Track response codes from real traffic. Mark unhealthy after N failures.

Active checks: Periodically ping a health endpoint. More proactive but adds traffic.

Most production setups use both:

  • Active checks catch servers that are up but not receiving traffic
  • Passive checks catch issues active checks might miss
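
A passive checker can be as simple as a per-server consecutive-failure counter with a cooldown. An illustrative sketch, not any particular load balancer's logic:

```python
import time

class PassiveHealth:
    """Mark a backend unhealthy after N consecutive failures,
    retry it after a cooldown (illustrative sketch)."""
    def __init__(self, max_fails=3, cooldown=10.0):
        self.max_fails = max_fails
        self.cooldown = cooldown
        self.fails = {}       # server -> consecutive failure count
        self.down_until = {}  # server -> monotonic time it may be retried

    def record(self, server, ok):
        if ok:
            self.fails[server] = 0
            self.down_until.pop(server, None)  # a success clears the penalty
        else:
            self.fails[server] = self.fails.get(server, 0) + 1
            if self.fails[server] >= self.max_fails:
                self.down_until[server] = time.monotonic() + self.cooldown

    def healthy(self, server):
        return time.monotonic() >= self.down_until.get(server, 0.0)
```

The cooldown doubles as the "retry" mechanism: after it expires, one real request probes the server again — which is why passive checks pair well with active ones.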

Connection Draining

When removing a server, don’t drop existing connections:

# Pseudo-code for graceful removal
def remove_server(server, drain_timeout=30):
    server.accepting_new = False  # Stop new connections

    deadline = time.time() + drain_timeout
    while server.active_connections > 0 and time.time() < deadline:
        time.sleep(1)  # Wait for in-flight requests to complete

    server.shutdown()  # Force-close anything still open after the timeout

Most load balancers support this natively:

upstream backend {
    server app1:8080;
    server app2:8080 down;  # No new requests; in-flight ones finish
                            # (nginx Plus adds a dedicated "drain" state)
}

SSL Termination

Terminate SSL at the load balancer to:

  • Offload CPU-intensive crypto from app servers
  • Centralize certificate management
  • Enable HTTP-level inspection
server {
    listen 443 ssl;
    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;
    
    location / {
        proxy_pass http://backend;  # Plain HTTP to backends
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Backend servers need to know the original protocol (for redirect URLs, secure cookies):

1
2
# App server reads X-Forwarded-Proto
is_secure = request.headers.get('X-Forwarded-Proto') == 'https'

Geographic Load Balancing

Route users to the nearest datacenter:

DNS-based geo routing:
  US users   → lb.us-east.example.com  (Virginia)
  EU users   → lb.eu.example.com       (Frankfurt)
  APAC users → lb.ap.example.com       (Singapore)

Usually done via DNS (GeoDNS, Route53 latency routing) or anycast.

Common Pitfalls

Ignoring connection limits:

# Bad: backend can only handle 100 connections
upstream backend {
    server app1:8080;  # No limit, LB will overload it
}

# Good: respect backend capacity
upstream backend {
    server app1:8080 max_conns=100;
}

Not preserving client IP:

# Backend sees load balancer IP, not client
proxy_pass http://backend;

# Backend sees real client IP
proxy_pass http://backend;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

Single load balancer = single point of failure:

Bad:  Client → LB → Servers
Good: Client → GeoDNS → LB1 / LB2 (redundant load balancers) → Servers

Use multiple LBs with DNS failover, VRRP, or cloud provider redundancy.

Monitoring

Track these metrics:

  • Request rate per backend
  • Response time per backend
  • Error rate per backend
  • Connection count per backend
  • Health check status

Uneven metrics indicate:

  • Misconfigured weights
  • Health check issues
  • Application problems on specific servers
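
A quick heuristic for “uneven”: compare the busiest backend's request count to the quietest's. For equal weights the ratio should hover near 1 (illustrative helper; the alert threshold is yours to pick, and weighted pools should compare against the configured weight ratio instead):

```python
def imbalance_ratio(request_counts):
    """Ratio of busiest to least-busy backend, e.g. from per-backend
    request-rate metrics. Values far above the expected weight ratio
    suggest misconfiguration or a sick server (illustrative heuristic)."""
    counts = list(request_counts.values())
    return max(counts) / max(min(counts), 1)  # guard against zero
```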

Load balancing is the traffic cop of your infrastructure. Round robin works for simple cases. Least connections adapts to varying workloads. Consistent hashing preserves locality. Layer 7 enables smart routing.

Choose the simplest strategy that meets your needs. Add complexity only when metrics show you need it. And always, always have redundant load balancers — the thing distributing your traffic shouldn’t be a single point of failure.