Hardcoded IPs are a maintenance nightmare. Here’s how to let services find each other dynamically.
The Problem#
# Bad: Hardcoded
api_url = "http://192.168.1.50:8080"
# What happens when:
# - IP changes?
# - Service moves to new host?
# - You add a second instance?
Service discovery solves this: services register themselves, and clients look them up by name.
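The core idea fits in a few lines. A toy in-memory registry (illustrative only — real registries add health checks, TTLs, and replication on top of this) looks like:

```python
class Registry:
    """Toy sketch of a service registry: register by name, look up by name."""

    def __init__(self):
        self._services = {}  # name -> list of "host:port" strings

    def register(self, name, address):
        self._services.setdefault(name, []).append(address)

    def lookup(self, name):
        instances = self._services.get(name, [])
        if not instances:
            raise LookupError(f"no instances registered for {name!r}")
        return instances

registry = Registry()
registry.register("api", "192.168.1.50:8080")
registry.register("api", "192.168.1.51:8080")
print(registry.lookup("api"))  # ['192.168.1.50:8080', '192.168.1.51:8080']
```

Every tool below — DNS, Kubernetes, Consul — is a production-grade version of this name-to-addresses mapping.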
DNS-Based Discovery# The simplest approach: use DNS.
Internal DNS#
# /etc/hosts or internal DNS server
192.168.1.50 api.internal
192.168.1.51 database.internal
# Code uses names
api_url = "http://api.internal:8080"
Pros: Simple, works everywhere
Cons: Manual updates, no health checking, caching issues
DNS with Round-Robin#
api.internal. 60 IN A 192.168.1.50
api.internal. 60 IN A 192.168.1.51
api.internal. 60 IN A 192.168.1.52
DNS returns all IPs, client picks one. Low TTL (60s) allows faster updates.
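Client-side, this is a one-liner in most languages: ask the resolver for all A records and pick one. A sketch in Python (using `localhost` as a stand-in for `api.internal` so the snippet runs anywhere):

```python
import random
import socket

def resolve_all(hostname):
    # getaddrinfo returns every A record the resolver currently sees
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

# "localhost" stands in for api.internal here
ips = resolve_all("localhost")
instance = random.choice(ips)  # naive client-side load balancing
```

Beware that many runtimes and resolvers cache lookups, so the low TTL only helps if clients actually re-resolve.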
Kubernetes Service Discovery# Built-in and automatic.
ClusterIP Service#
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: production
spec:
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080
Services are discoverable via DNS:
api.production.svc.cluster.local
From Within Pods#
import requests

# Same namespace - just use service name
response = requests.get("http://api/users")

# Different namespace - use FQDN
response = requests.get("http://api.other-namespace.svc.cluster.local/users")
Headless Services# For direct pod access (databases, stateful workloads):
apiVersion: v1
kind: Service
metadata:
  name: database
spec:
  clusterIP: None  # Headless
  selector:
    app: postgres
  ports:
  - port: 5432
DNS returns individual pod IPs instead of a virtual IP.
Consul# HashiCorp’s service mesh and discovery tool.
Register a Service#
{
  "service": {
    "name": "api",
    "port": 8080,
    "check": {
      "http": "http://localhost:8080/health",
      "interval": "10s"
    }
  }
}
curl -X PUT -d @service.json http://localhost:8500/v1/agent/service/register
Query Services#
# DNS interface
dig @127.0.0.1 -p 8600 api.service.consul
# HTTP API
curl "http://localhost:8500/v1/health/service/api?passing=true"
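The HTTP API returns JSON; pulling endpoints out of it is straightforward. A sketch against a hard-coded sample payload (the real response carries Node, Checks, and many more fields per entry):

```python
import json

# Simplified shape of Consul's /v1/health/service/<name> response
SAMPLE = """
[
  {"Service": {"Address": "192.168.1.50", "Port": 8080}},
  {"Service": {"Address": "192.168.1.51", "Port": 8080}}
]
"""

def extract_endpoints(payload):
    # Pull "host:port" pairs out of a health-endpoint response body
    return [f"{e['Service']['Address']}:{e['Service']['Port']}"
            for e in json.loads(payload)]

print(extract_endpoints(SAMPLE))  # ['192.168.1.50:8080', '192.168.1.51:8080']
```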
Consul Template# Auto-update config files when services change:
upstream api {
  {{range service "api"}}
  server {{.Address}}:{{.Port}};
  {{end}}
}
1
consul-template -template "nginx.ctmpl:nginx.conf:nginx -s reload"
Client-Side Discovery# Client queries registry, picks an instance.
import random

import consul
import requests

c = consul.Consul()

def get_api_url():
    _, services = c.health.service('api', passing=True)
    if not services:
        raise Exception("No healthy api instances")
    # Simple random selection
    service = random.choice(services)
    return f"http://{service['Service']['Address']}:{service['Service']['Port']}"

response = requests.get(f"{get_api_url()}/users")
Pros: Client has full control over load balancing
Cons: Every client needs discovery logic
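Random selection is the simplest policy; a round-robin picker (a hypothetical helper, not part of any discovery library) spreads requests more evenly across instances:

```python
import itertools

class RoundRobinPicker:
    # Cycles through discovered instances in order instead of picking randomly
    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def pick(self):
        return next(self._cycle)

picker = RoundRobinPicker(["10.0.0.1:8080", "10.0.0.2:8080"])
print([picker.pick() for _ in range(4)])
# ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.1:8080', '10.0.0.2:8080']
```

In practice the instance list changes as discovery results refresh, so the picker has to be rebuilt (or synchronized) on each update.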
Server-Side Discovery# Load balancer queries registry, routes traffic.
Client → Load Balancer → Service Instance
              ↓
           Registry
# nginx with consul-template
upstream api {
    server 192.168.1.50:8080;  # Auto-updated
    server 192.168.1.51:8080;
}

server {
    location /api/ {
        proxy_pass http://api;
    }
}
Pros: Clients stay simple
Cons: Extra hop, load balancer becomes critical
Health Checking# Discovery without health checks serves dead instances.
Passive Health Checks# Track failures from real traffic:
import random
from collections import defaultdict

import requests

class ServiceClient:
    def __init__(self):
        self.failures = defaultdict(int)

    def call(self, service_name):
        instances = discover(service_name)
        # Skip instances that have failed repeatedly; fall back to all if none left
        healthy = [i for i in instances if self.failures[i] < 3]
        instance = random.choice(healthy or instances)
        try:
            response = requests.get(instance, timeout=5)
            self.failures[instance] = 0
            return response
        except requests.RequestException:
            self.failures[instance] += 1
            raise
Active Health Checks# Proactively test instances:
# Consul health check
check:
  http: "http://localhost:8080/health"
  interval: "10s"
  timeout: "2s"
  deregister_critical_service_after: "1m"
Unhealthy instances are removed from discovery results.
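What the checker does each interval is simple: hit the health endpoint and mark the instance down on any error or timeout. A minimal probe sketch in Python:

```python
import urllib.request

def probe(url, timeout=2.0):
    # One active check: healthy iff the endpoint answers 2xx within the timeout
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers connection refused, DNS failure, and socket timeouts
        return False
```

A real checker runs this in a loop per instance and only deregisters after several consecutive failures, to avoid flapping.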
Common Patterns#
Retry with Different Instance#
import random

import requests
from requests.exceptions import RequestException

def call_with_retry(service_name, path, max_retries=3):
    instances = list(discover(service_name))
    random.shuffle(instances)
    for instance in instances[:max_retries]:
        try:
            return requests.get(f"{instance}{path}", timeout=5)
        except RequestException:
            continue
    raise Exception("All instances failed")
Circuit Breaker#
import requests
from circuitbreaker import circuit

@circuit(failure_threshold=5, recovery_timeout=30)
def call_api(path):
    url = get_api_url()  # From discovery
    return requests.get(url + path)
Caching Discovery Results#
from functools import lru_cache
from time import time

@lru_cache(maxsize=100)
def discover_cached(service_name, ttl_bucket):
    # ttl_bucket only varies the cache key, so entries expire each bucket
    return discover(service_name)

def get_instances(service_name, ttl=60):
    bucket = int(time() / ttl)
    return discover_cached(service_name, bucket)
The Discovery Checklist#
When to Use What#

| Scenario | Solution |
|---|---|
| Kubernetes | Built-in Services |
| Simple/static | DNS |
| Dynamic/multi-DC | Consul |
| AWS | ECS Service Discovery, Cloud Map |
| Need service mesh | Consul Connect, Istio |
Start simple. DNS works for most cases. Add complexity when you actually need dynamic discovery, health checking, or cross-datacenter routing.
The best service discovery is the one your developers don’t notice — services just find each other, and failures route around automatically.