Every request to your microservices should pass through a single front door. That door is your API gateway—and getting it right determines whether your architecture scales gracefully or collapses under complexity.

Why API Gateways?

Without a gateway, clients must:

  • Know the location of every service
  • Handle authentication with each service
  • Implement retry logic, timeouts, and circuit breaking
  • Deal with different protocols and response formats

An API gateway centralizes these concerns:

ClientAPIRALTGouirautmatthineitswnifagnoygrmSSSeeerrrvvviiiccceeeABC

Pattern 1: Path-Based Routing

Route requests to different services based on URL path:

# nginx.conf
upstream users_service {
    server users:8080;
}

upstream orders_service {
    server orders:8080;
}

upstream products_service {
    server products:8080;
}

server {
    listen 80;
    
    location /api/users {
        proxy_pass http://users_service;
    }
    
    location /api/orders {
        proxy_pass http://orders_service;
    }
    
    location /api/products {
        proxy_pass http://products_service;
    }
}

With Kong or AWS API Gateway, this becomes declarative:

# kong.yml
services:
  - name: users-service
    url: http://users:8080
    routes:
      - name: users-route
        paths:
          - /api/users
        strip_path: false
        
  - name: orders-service
    url: http://orders:8080
    routes:
      - name: orders-route
        paths:
          - /api/orders
        strip_path: false  # Kong strips the matched path by default

Pattern 2: Authentication Gateway

Validate tokens once at the gateway, pass identity downstream:

# FastAPI gateway with JWT validation
from fastapi import FastAPI, Request, Response, HTTPException, Depends
from fastapi.security import HTTPBearer
import httpx
import jwt

app = FastAPI()
security = HTTPBearer()

JWT_SECRET = "your-secret-key"  # load from a secret store in production
SERVICES = {
    "users": "http://users-service:8080",
    "orders": "http://orders-service:8080",
}

async def verify_token(credentials = Depends(security)):
    try:
        payload = jwt.decode(
            credentials.credentials, 
            JWT_SECRET, 
            algorithms=["HS256"]
        )
        return payload
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")

@app.api_route(
    "/{service}/{path:path}", 
    methods=["GET", "POST", "PUT", "DELETE"]
)
async def gateway(
    service: str, 
    path: str, 
    request: Request,
    user: dict = Depends(verify_token)
):
    if service not in SERVICES:
        raise HTTPException(status_code=404, detail="Service not found")
    
    # Forward request with user identity header
    async with httpx.AsyncClient() as client:
        response = await client.request(
            method=request.method,
            url=f"{SERVICES[service]}/{path}",
            headers={
                "X-User-ID": user["sub"],
                "X-User-Email": user.get("email", ""),
            },
            content=await request.body(),
        )
    
    return Response(
        content=response.content,
        status_code=response.status_code,
        headers=dict(response.headers)
    )
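On the receiving end, services can trust the identity headers the gateway injects instead of re-validating the JWT themselves. A minimal sketch of that downstream side — the helper name `identity_from_headers` is hypothetical; in a FastAPI service you would pass it `dict(request.headers)`:

```python
# Hypothetical helper for downstream services: read the identity headers
# the gateway injected rather than decoding the JWT again.

def identity_from_headers(headers: dict) -> dict:
    """Extract the gateway-supplied identity from request headers.

    Raises ValueError if X-User-ID is missing, which usually means the
    request bypassed the gateway (and should be rejected).
    """
    # Header lookup is case-insensitive in most frameworks; normalize here.
    normalized = {k.lower(): v for k, v in headers.items()}
    user_id = normalized.get("x-user-id")
    if not user_id:
        raise ValueError("Missing X-User-ID: request did not pass through the gateway")
    return {
        "user_id": user_id,
        "email": normalized.get("x-user-email", ""),
    }
```

This only works safely if the services are not reachable except through the gateway; otherwise a client could forge the headers.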

Pattern 3: Rate Limiting

Protect services from overload with token bucket or sliding window:

# Redis-based rate limiting
import time

import redis
from fastapi.responses import JSONResponse

redis_client = redis.Redis(host='localhost', port=6379)

def rate_limit(key: str, limit: int, window: int) -> bool:
    """
    Sliding window rate limiter.
    Returns True if request is allowed, False if rate limited.
    """
    now = time.time()
    window_start = now - window
    
    pipe = redis_client.pipeline()
    
    # Remove old entries
    pipe.zremrangebyscore(key, 0, window_start)
    
    # Count requests in window
    pipe.zcard(key)
    
    # Add current request
    pipe.zadd(key, {str(now): now})
    
    # Set expiry
    pipe.expire(key, window)
    
    results = pipe.execute()
    request_count = results[1]
    
    return request_count < limit

# Usage in gateway
@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    client_ip = request.client.host
    key = f"rate_limit:{client_ip}"
    
    if not rate_limit(key, limit=100, window=60):
        return JSONResponse(
            status_code=429,
            content={"error": "Rate limit exceeded"},
            headers={"Retry-After": "60"}
        )
    
    return await call_next(request)
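The example above implements the sliding window; the token bucket mentioned alongside it can be sketched in a few lines. This version keeps state in process memory (the class name and defaults are illustrative) — for a gateway running multiple instances you would keep the bucket state in Redis, as above:

```python
# In-memory token bucket: allows bursts up to `capacity` while enforcing
# an average of `rate` requests per second. Per-process only.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a gateway you would typically keep one bucket per client key (for example, per IP) in a dictionary and call `allow()` from the middleware.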

Pattern 4: Request/Response Transformation

Adapt between client expectations and service implementations:

# Transform legacy API responses to new format
@app.get("/api/v2/users/{user_id}")
async def get_user_v2(user_id: str):
    # Call legacy v1 service
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"http://legacy-users:8080/users/{user_id}"
        )
        legacy_data = response.json()
    
    # Transform to v2 format
    return {
        "id": legacy_data["userId"],
        "email": legacy_data["emailAddress"],
        "name": {
            "first": legacy_data["firstName"],
            "last": legacy_data["lastName"],
        },
        "created_at": legacy_data["createdDate"],
        "metadata": {
            "source": "legacy-transform",
            "version": "2.0"
        }
    }

Pattern 5: Backend for Frontend (BFF)

Create specialized gateways for different clients:

# Mobile BFF - optimized for bandwidth
import asyncio
@app.get("/mobile/dashboard")
async def mobile_dashboard(user: dict = Depends(verify_token)):
    async with httpx.AsyncClient() as client:
        # Parallel requests to multiple services
        user_task = client.get(f"{SERVICES['users']}/users/{user['sub']}")
        orders_task = client.get(
            f"{SERVICES['orders']}/users/{user['sub']}/recent?limit=5"
        )
        
        user_resp, orders_resp = await asyncio.gather(
            user_task, orders_task
        )
    
    # Aggregate and minimize payload
    return {
        "user": {
            "name": user_resp.json()["name"],
            "avatar": user_resp.json()["avatarUrl"],
        },
        "recentOrders": [
            {"id": o["id"], "total": o["total"], "status": o["status"]}
            for o in orders_resp.json()["orders"]
        ]
    }

# Web BFF - richer data
@app.get("/web/dashboard")
async def web_dashboard(user: dict = Depends(verify_token)):
    # Return full data for web clients
    ...

Production Checklist

Before deploying your gateway:

  • Health checks — Gateway should verify backend health
  • Timeouts — Set reasonable timeouts per service
  • Circuit breakers — Fail fast when services are down
  • Logging — Structured logs with request IDs
  • Metrics — Latency, error rates, throughput per route
  • TLS termination — Handle HTTPS at the gateway
  • CORS — Configure cross-origin policies centrally
  • Request validation — Validate before forwarding
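
The circuit-breaker item above can be sketched in a few lines. This is a minimal in-process version to illustrate the fail-fast idea — the class name and thresholds are illustrative, and production gateways usually rely on a battle-tested implementation rather than rolling their own:

```python
# Minimal circuit breaker: after `failure_threshold` consecutive failures
# the circuit opens and requests fail fast; after `reset_timeout` seconds
# a probe request is allowed through (the "half-open" state).
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # After the reset timeout, let a probe request through
        return time.monotonic() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()
```

The gateway keeps one breaker per backend service, checks `allow_request()` before forwarding, and returns 503 immediately when the circuit is open instead of waiting on a dead service.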

Conclusion

An API gateway isn’t just a reverse proxy—it’s the control plane for your entire API surface. Get routing, authentication, and rate limiting right at the gateway, and your services can focus on business logic instead of infrastructure concerns.

Start simple with path-based routing and authentication. Add rate limiting when you need protection. Introduce BFF patterns when client needs diverge significantly.

Your gateway is the first thing clients see. Make it reliable.