Hardcoded IPs are a maintenance nightmare. Here’s how to let services find each other dynamically.

The Problem

# Bad: Hardcoded
api_url = "http://192.168.1.50:8080"

# What happens when:
# - IP changes?
# - Service moves to new host?
# - You add a second instance?

Service discovery solves this: services register themselves, and clients look them up by name.

DNS-Based Discovery

The simplest approach: use DNS.

Internal DNS

# /etc/hosts or internal DNS server
192.168.1.50  api.internal
192.168.1.51  database.internal
# Code uses names
api_url = "http://api.internal:8080"

Pros: Simple, works everywhere.
Cons: Manual updates, no health checking, caching issues.

DNS with Round-Robin

api.internal.  60  IN  A  192.168.1.50
api.internal.  60  IN  A  192.168.1.51
api.internal.  60  IN  A  192.168.1.52

DNS returns all IPs, client picks one. Low TTL (60s) allows faster updates.
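A client can implement that selection itself: resolve every A record, then pick one at random. A minimal sketch in Python (`api.internal` is a placeholder for your own record):

```python
import random
import socket

def resolve_all(name, port):
    # getaddrinfo returns one entry per address DNS handed back
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

def pick_instance(name, port):
    return random.choice(resolve_all(name, port))

# pick_instance("api.internal", 8080) would return one of the three IPs above
```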

Kubernetes Service Discovery

Built-in and automatic.

ClusterIP Service

apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: production
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080

Services are discoverable via DNS:

api.production.svc.cluster.local

From Within Pods

import requests

# Same namespace - just use service name
response = requests.get("http://api/users")

# Different namespace - use FQDN
response = requests.get("http://api.other-namespace.svc.cluster.local/users")

Headless Services

For direct pod access (databases, stateful workloads):

apiVersion: v1
kind: Service
metadata:
  name: database
spec:
  clusterIP: None  # Headless
  selector:
    app: postgres
  ports:
    - port: 5432

DNS returns individual pod IPs instead of a virtual IP.
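From inside the cluster this is still plain DNS — a hypothetical sketch that resolves a headless service name to its pod IPs (`pod_addresses` and its arguments are illustrative, and the lookup only succeeds against cluster DNS):

```python
import socket

def service_fqdn(service, namespace):
    # Kubernetes DNS naming convention for services
    return f"{service}.{namespace}.svc.cluster.local"

def pod_addresses(service, namespace, port):
    # Against a headless service this yields one IP per ready pod
    # instead of a single virtual ClusterIP
    infos = socket.getaddrinfo(service_fqdn(service, namespace), port,
                               proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})
```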

Consul

HashiCorp’s service mesh and discovery tool.

Register a Service

{
  "service": {
    "name": "api",
    "port": 8080,
    "check": {
      "http": "http://localhost:8080/health",
      "interval": "10s"
    }
  }
}
curl -X PUT -d @service.json http://localhost:8500/v1/agent/service/register

Query Services

# DNS interface
dig @127.0.0.1 -p 8600 api.service.consul

# HTTP API
curl http://localhost:8500/v1/health/service/api?passing=true

Consul Template

Auto-update config files when services change:

upstream api {
{{range service "api"}}
  server {{.Address}}:{{.Port}};
{{end}}
}
consul-template -template "nginx.ctmpl:nginx.conf:nginx -s reload"

Client-Side Discovery

Client queries registry, picks an instance.

import random

import consul
import requests

c = consul.Consul()

def get_api_url():
    _, services = c.health.service('api', passing=True)
    if not services:
        raise Exception("No healthy api instances")
    
    # Simple random selection
    service = random.choice(services)
    return f"http://{service['Service']['Address']}:{service['Service']['Port']}"

response = requests.get(f"{get_api_url()}/users")

Pros: Client has full control over load balancing.
Cons: Every client needs discovery logic.

Server-Side Discovery

Load balancer queries registry, routes traffic.

Client → Load Balancer (consults Registry) → Service Instance
# nginx with consul-template
upstream api {
    server 192.168.1.50:8080;  # Auto-updated
    server 192.168.1.51:8080;
}

server {
    location /api/ {
        proxy_pass http://api;
    }
}

Pros: Clients stay simple.
Cons: Extra hop, load balancer becomes critical.

Health Checking

Discovery without health checks serves dead instances.

Passive Health Checks

Track failures from real traffic:

import random
from collections import defaultdict

import requests

class ServiceClient:
    def __init__(self):
        self.failures = defaultdict(int)

    def call(self, service_name):
        instances = discover(service_name)
        healthy = [i for i in instances if self.failures[i] < 3]

        # Fall back to all instances if everything looks unhealthy
        instance = random.choice(healthy or instances)
        try:
            response = requests.get(instance, timeout=5)
            self.failures[instance] = 0  # success resets the counter
            return response
        except requests.RequestException:
            self.failures[instance] += 1
            raise

Active Health Checks

Proactively test instances:

# Consul health check
check:
  http: "http://localhost:8080/health"
  interval: "10s"
  timeout: "2s"
  deregister_critical_service_after: "1m"

Unhealthy instances are removed from discovery results.
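The `/health` endpoint the check polls doesn't need a framework — a stdlib sketch, with the path and port matching the Consul config above:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

healthy = True  # flip to False when a dependency breaks or during shutdown

def health_status():
    # 200 keeps the instance in discovery; 503 marks it critical
    return (200, b"ok") if healthy else (503, b"unhealthy")

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_response(404)
            self.end_headers()
            return
        code, body = health_status()
        self.send_response(code)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("", 8080), HealthHandler).serve_forever()
```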

Common Patterns

Retry with Different Instance

import random

import requests
from requests.exceptions import RequestException

def call_with_retry(service_name, path, max_retries=3):
    instances = list(discover(service_name))
    random.shuffle(instances)

    for instance in instances[:max_retries]:
        try:
            return requests.get(f"{instance}{path}", timeout=5)
        except RequestException:
            continue

    raise Exception("All instances failed")

Circuit Breaker

import requests
from circuitbreaker import circuit

@circuit(failure_threshold=5, recovery_timeout=30)
def call_api(path):
    url = get_api_url()  # From discovery
    return requests.get(url + path)

Caching Discovery Results

from functools import lru_cache
from time import time

@lru_cache(maxsize=100)
def discover_cached(service_name, ttl_bucket):
    # ttl_bucket changes every `ttl` seconds, so stale entries miss the cache
    return discover(service_name)

def get_instances(service_name, ttl=60):
    bucket = int(time() / ttl)
    return discover_cached(service_name, bucket)

The Discovery Checklist

  • Services register on startup
  • Services deregister on shutdown
  • Health checks configured
  • Clients handle missing instances
  • Retry logic with different instances
  • Reasonable caching (not too long)
  • Monitoring for registration failures
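The first two items can be a few lines against Consul's agent HTTP API — a hypothetical sketch (service name, ID, and port are placeholders, and a local agent on :8500 is assumed):

```python
import requests

CONSUL = "http://localhost:8500"  # local Consul agent (assumption)
SERVICE_ID = "api-1"              # must be unique per instance

def registration_payload(service_id, name, port):
    # Body for PUT /v1/agent/service/register
    return {
        "ID": service_id,
        "Name": name,
        "Port": port,
        "Check": {"HTTP": f"http://localhost:{port}/health", "Interval": "10s"},
    }

def register():
    body = registration_payload(SERVICE_ID, "api", 8080)
    requests.put(f"{CONSUL}/v1/agent/service/register", json=body).raise_for_status()

def deregister():
    # Best-effort cleanup so the registry stops serving this instance
    requests.put(f"{CONSUL}/v1/agent/service/deregister/{SERVICE_ID}")

# On startup: register(); on shutdown: deregister() (e.g. via atexit or a signal handler)
```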

When to Use What

Scenario            Solution
Kubernetes          Built-in Services
Simple/static       DNS
Dynamic/multi-DC    Consul
AWS                 ECS Service Discovery, Cloud Map
Need service mesh   Consul Connect, Istio

Start simple. DNS works for most cases. Add complexity when you actually need dynamic discovery, health checking, or cross-datacenter routing.


The best service discovery is the one your developers don’t notice — services just find each other, and failures route around automatically.