You’ve scaled horizontally — multiple servers ready to handle requests. Now you need something to decide which server handles each request. That’s load balancing, and the strategy you choose affects latency, reliability, and resource utilization.
Round Robin: The Default
Each server gets requests in rotation: Server 1, Server 2, Server 3, Server 1, Server 2…
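A minimal sketch of the rotation in Python (server names are placeholders):

```python
from itertools import cycle

# Rotate through the pool: server1, server2, server3, server1, ...
servers = ["server1", "server2", "server3"]
rr = cycle(servers)

picks = [next(rr) for _ in range(5)]
# → ['server1', 'server2', 'server3', 'server1', 'server2']
```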
Pros:
- Simple to understand and implement
- Even distribution over time
- No state to maintain
Cons:
- Ignores server capacity differences
- Ignores current server load
- Long-running requests can pile up on one server
Good for: Homogeneous servers with similar request patterns.
Weighted Round Robin
Some servers are more capable than others:
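One simple way to implement this is to expand the rotation in proportion to each server's weight. This is a sketch, not how production balancers do it (nginx, for instance, uses a "smooth" variant that interleaves picks), and the names and weights are made up:

```python
def weighted_round_robin(weights):
    """Yield servers in proportion to their weights (naive expansion)."""
    expanded = [s for s, w in weights.items() for _ in range(w)]
    while True:
        yield from expanded

# Hypothetical pool: "big" has twice the capacity of "small".
gen = weighted_round_robin({"big": 2, "small": 1})
picks = [next(gen) for _ in range(6)]
# → ['big', 'big', 'small', 'big', 'big', 'small']
```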
Use when you have mixed hardware or want to gradually shift traffic during deployments.
Least Connections
Send to the server with fewest active connections:
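The selection itself is one line, assuming the balancer tracks a live count of active connections per server (counts below are illustrative):

```python
def least_connections(active):
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)

active = {"s1": 12, "s2": 3, "s3": 7}
least_connections(active)  # → 's2'
```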
Pros:
- Adapts to varying request durations
- Naturally handles slow servers (they accumulate connections)
Cons:
- Requires tracking connection state
- New servers get hammered initially (zero connections)
Good for: Requests with variable processing times.
Weighted Least Connections
Combine least connections with capacity weighting:
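A common formulation is to normalize each server's connection count by its weight and pick the lowest score; the weights here (3 and 1) are an example, not a recommendation:

```python
def weighted_least_connections(active, weights):
    # Normalize load by capacity: connections divided by weight.
    return min(active, key=lambda s: active[s] / weights[s])

weights = {"s1": 3, "s2": 1}   # s1 has 3x the capacity of s2
active = {"s1": 5, "s2": 2}
weighted_least_connections(active, weights)  # → 's1' (5/3 < 2/1)
```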
With weights of 3 and 1, Server 1 is considered “least loaded” until it has 3x the connections of Server 2.
IP Hash (Session Affinity)
Same client always goes to same server:
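A sketch of the idea: hash the client IP to a stable index, so the same IP always maps to the same server (the hash function choice here is arbitrary):

```python
import hashlib

def ip_hash(client_ip, servers):
    # Hash the client IP to a stable index; same IP → same server.
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

servers = ["s1", "s2", "s3"]
ip_hash("203.0.113.7", servers)  # always the same server for this IP
```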
Pros:
- Client sessions stay on one server
- Works with server-side session storage
Cons:
- Uneven distribution if traffic comes from few IPs (NAT, proxies)
- Server removal disrupts all its clients
Good for: Legacy apps with server-side sessions. But consider: why not use shared session storage instead?
Consistent Hashing
Like IP hash, but smarter about server changes:
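A minimal hash ring with virtual nodes, as a sketch (the vnode count and hash function are arbitrary choices, not tuned values):

```python
import hashlib
from bisect import bisect, insort

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (hash, server)
        for s in servers:
            self.add(s, vnodes)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server, vnodes=100):
        # Place `vnodes` points on the ring per server for smoother spread.
        for i in range(vnodes):
            insort(self.ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def get(self, key):
        # First virtual node clockwise from the key's position.
        i = bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[i][1]
```

The payoff is the redistribution property described above: removing a server only remaps the keys that were on that server.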
When a server is added/removed, only requests that would go to that server are redistributed. Others stay put.
Good for: Caching layers where you want cache locality.
Random with Two Choices
Pick two servers randomly, send to the one with fewer connections:
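The whole algorithm fits in a few lines, assuming per-server connection counts are available (counts below are illustrative):

```python
import random

def two_choices(active):
    """Power of two choices: sample two servers, pick the less loaded."""
    a, b = random.sample(list(active), 2)
    return a if active[a] <= active[b] else b

active = {"s1": 9, "s2": 2, "s3": 5}
two_choices(active)  # one of the three, biased toward lower counts
```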
Surprisingly effective. Avoids the “thundering herd to least loaded” problem while still adapting to load.
Layer 4 vs Layer 7
Layer 4 (TCP/UDP):
- Routes based on IP and port
- Very fast, minimal overhead
- Can’t inspect HTTP content
Layer 7 (HTTP):
- Routes based on HTTP content (path, headers, cookies)
- Can modify requests, add headers, terminate SSL
- More overhead, more features
Use L4 for raw throughput. Use L7 for smart routing, SSL termination, and HTTP features.
Health Checks
A load balancer is only useful if it knows which servers are healthy:
Passive checks: Track response codes from real traffic. Mark unhealthy after N failures.
Active checks: Periodically ping a health endpoint. More proactive but adds traffic.
Most production setups use both:
- Active checks catch servers that are up but not receiving traffic
- Passive checks catch issues active checks might miss
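A toy tracker combining both styles, to make the mechanics concrete. The failure threshold, the 5xx criterion, and the probe callback are illustrative assumptions, not any real load balancer's API:

```python
class HealthTracker:
    """Toy health state: passive failure counting plus active probes."""

    def __init__(self, servers, fail_threshold=3):
        self.failures = {s: 0 for s in servers}
        self.healthy = {s: True for s in servers}
        self.fail_threshold = fail_threshold

    def record_response(self, server, status):
        # Passive check: count consecutive 5xx responses from real traffic.
        if status >= 500:
            self.failures[server] += 1
            if self.failures[server] >= self.fail_threshold:
                self.healthy[server] = False
        else:
            self.failures[server] = 0

    def active_probe(self, server, probe):
        # Active check: `probe` hits e.g. a /healthz endpoint, returns bool.
        ok = probe(server)
        self.healthy[server] = ok
        if ok:
            self.failures[server] = 0
```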
Connection Draining
When removing a server, don’t drop existing connections. Drain it instead: stop routing new requests to the server, let in-flight requests finish, then take it out of the pool.
Most load balancers support this natively — look for “connection draining” or “deregistration delay” in your balancer’s settings.
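The drain logic can be sketched like this (the `Backend` class and timeout are illustrative, not a real LB interface):

```python
import threading

class Backend:
    """Sketch of graceful drain: refuse new requests, wait for in-flight."""

    def __init__(self, name):
        self.name = name
        self.draining = False
        self.in_flight = 0
        self.idle = threading.Event()
        self.idle.set()  # no requests yet → idle

    def start_request(self):
        if self.draining:
            raise RuntimeError("backend is draining; pick another server")
        self.in_flight += 1
        self.idle.clear()

    def finish_request(self):
        self.in_flight -= 1
        if self.in_flight == 0:
            self.idle.set()

    def drain(self, timeout=30):
        # Stop accepting new work, then wait for active requests to finish.
        self.draining = True
        return self.idle.wait(timeout)
```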
SSL Termination
Terminate SSL at the load balancer to:
- Offload CPU-intensive crypto from app servers
- Centralize certificate management
- Enable HTTP-level inspection
Backend servers need to know the original protocol (for redirect URLs, secure cookies). The usual convention is for the load balancer to set an X-Forwarded-Proto header that backends trust.
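On the backend, recovering the scheme is a header lookup. The helper name and default are mine; the X-Forwarded-Proto convention is widely used but not universal, so check what your load balancer actually sends:

```python
def original_scheme(headers, default="http"):
    """Recover the client's original protocol behind a TLS-terminating LB."""
    return headers.get("X-Forwarded-Proto", default)

headers = {"X-Forwarded-Proto": "https", "X-Forwarded-For": "203.0.113.7"}
original_scheme(headers)  # → 'https'
```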
Geographic Load Balancing
Route users to the nearest datacenter:
Usually done via DNS (GeoDNS, Route53 latency routing) or anycast.
Common Pitfalls
Ignoring connection limits: Each backend can handle only so many concurrent connections. Configure per-backend limits so the balancer queues or sheds excess load instead of overwhelming a server.
Not preserving client IP: At Layer 7, backends see the load balancer’s IP unless the original address is forwarded, typically via the X-Forwarded-For header. At Layer 4, the PROXY protocol serves the same purpose.
Single load balancer = single point of failure:
Use multiple LBs with DNS failover, VRRP, or cloud provider redundancy.
Monitoring
Track these metrics:
- Request rate per backend
- Response time per backend
- Error rate per backend
- Connection count per backend
- Health check status
Uneven metrics indicate:
- Misconfigured weights
- Health check issues
- Application problems on specific servers
Load balancing is the traffic cop of your infrastructure. Round robin works for simple cases. Least connections adapts to varying workloads. Consistent hashing preserves locality. Layer 7 enables smart routing.
Choose the simplest strategy that meets your needs. Add complexity only when metrics show you need it. And always, always have redundant load balancers — the thing distributing your traffic shouldn’t be a single point of failure.