Architecture

LLM API Integration Patterns for Production Applications

Integrating LLMs into production applications is deceptively simple. Call an API, get text back. But building reliable, cost-effective systems requires more thought. Here are patterns that work at scale. The Basic Call Every LLM integration starts here: 1 2 3 4 5 6 7 8 import openai def complete(prompt: str) -> str: response = openai.chat.completions.create( model="gpt-4", messages=[{"role": "user", "content": prompt}] ) return response.choices[0].message.content This works for prototypes. Production needs more. Retry with Exponential Backoff LLM APIs have rate limits and occasional failures: ...

Idempotency: Making APIs Safe to Retry

Networks fail. Clients timeout. Users double-click. If your API creates duplicate orders or charges cards twice when this happens, you have a problem. Idempotency is the solution—making operations safe to retry without side effects. What Is Idempotency? An operation is idempotent if performing it multiple times has the same effect as performing it once. # G P D # P P E U E O O I T T L N S S d E O T T e / / T T m u u E p s s i u o e e / d r s t r r u e d e e s s s m e r n / / e p r s t 1 1 r o s / : 2 2 s t 1 3 3 / e 2 S 1 n 3 a { 2 t / m . 3 : c e . h . D a r } i r e f g s f e u e l r t # # # e # # n e A A U t C C v l l s r h e w w e r e a r a a r e a r y y y s t g s s 1 u e e t 2 l s s i r s 3 t m e e a t e t t i e h u s s a N e r c E n u g h W c s s o a e n t o r u r e i r d s m d e 1 ( e e a r 2 a r g 3 l a 1 r e i 2 t e a n 3 o a c d h e t y a h t c i g i h s o m n e t s e i t m a = e t e s t i l l g o n e ) The Problem Client sends request → Server processes it → Response lost in transit: ...

API Pagination Patterns: Offset, Cursor, and Keyset

Every API that returns lists needs pagination. Without it, a request for “all users” could return millions of rows, crushing your database and timing out the client. But pagination has tradeoffs—and choosing wrong can hurt performance or cause data inconsistencies. Offset Pagination The classic approach. Simple to implement, simple to understand: G G G E E E T T T / / / u u u s s s e e e r r r s s s ? ? ? l l l i i i m m m i i i t t t = = = 2 2 2 0 0 0 & & & o o o f f f f f f s s s e e e t t t = = = 0 2 4 0 0 # # # F S T i e h r c i s o r t n d d p p a p a g a g e g e e 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 @app.get("/users") def list_users(limit: int = 20, offset: int = 0): users = db.query( "SELECT * FROM users ORDER BY id LIMIT %s OFFSET %s", (limit, offset) ) total = db.query("SELECT COUNT(*) FROM users")[0][0] return { "data": users, "pagination": { "limit": limit, "offset": offset, "total": total } } Pros: ...

API Error Handling That Helps Instead of Frustrates

Bad error handling wastes everyone’s time. A cryptic “Error 500” sends developers on a debugging odyssey. A well-designed error response tells them exactly what went wrong and how to fix it. Here’s how to build the latter. The Anatomy of a Good Error Every error response should answer three questions: What happened? (error code/type) Why? (human-readable message) How do I fix it? (actionable guidance) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 { "error": { "code": "VALIDATION_ERROR", "message": "Request validation failed", "details": [ { "field": "email", "message": "Invalid email format", "received": "not-an-email" }, { "field": "age", "message": "Must be a positive integer", "received": "-5" } ], "documentation_url": "https://api.example.com/docs/errors#VALIDATION_ERROR" }, "request_id": "req_abc123" } Always include: ...

Rate Limiting Strategies That Protect Without Frustrating

Rate limiting is the bouncer at your API’s door. Too strict, and legitimate users get frustrated. Too loose, and one bad actor can take down your service. Here’s how to find the balance. Why Rate Limit? Without limits, a single client can: Exhaust your database connections Burn through your third-party API quotas Inflate your cloud bill Deny service to everyone else Rate limiting isn’t about being restrictive—it’s about being fair. ...

Background Job Patterns That Actually Scale

Every production system eventually needs background jobs. Email notifications, report generation, data syncing, webhook processing—the work that can’t (or shouldn’t) happen during a user request. Here’s what I’ve learned about making them reliable. The Naive Approach (And Why It Breaks) Most developers start with something like this: 1 2 3 4 5 @app.route('/signup') def signup(): user = create_user(request.form) send_welcome_email(user) # Blocks the response return redirect('/dashboard') This works until it doesn’t. The email service has a 5-second timeout. Now your signup page feels broken. Or the email service is down, and signups fail entirely. ...

Circuit Breakers: Fail Fast, Recover Gracefully

When a downstream service is failing, continuing to call it makes everything worse. Circuit breakers stop the cascade. The Pattern Three states: Closed: Normal operation, requests pass through Open: Service is failing, requests fail immediately Half-Open: Testing if service recovered [ C L ┌ │ ▼ O ▲ │ └ ─ S ─ ─ E ─ ─ D ─ ─ ] ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ f ─ ─ a ─ ─ i ─ ─ l s ─ u u ─ r c ─ e c ─ e ─ t s ─ h s ─ r ─ ─ e ─ ─ s ─ ─ h ─ ─ o ─ ─ l ─ ─ d ─ ─ ─ ─ ─ ─ ─ ─ ▶ ─ ─ ─ ─ [ ─ ─ O ─ ─ P │ │ ┴ ─ E ─ ─ N ─ ─ ] f ─ a ─ ─ i ─ ─ l ─ t u ─ i r ─ m e ─ e ─ ┐ │ │ o ─ u ─ t ─ ─ ─ ─ │ │ ┘ ▶ [ H A L F - O P E N ] Basic Implementation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 import time from enum import Enum from threading import Lock class State(Enum): CLOSED = "closed" OPEN = "open" HALF_OPEN = "half_open" class CircuitBreaker: def __init__( self, failure_threshold: int = 5, recovery_timeout: int = 30, half_open_max_calls: int = 3 ): self.failure_threshold = failure_threshold self.recovery_timeout = recovery_timeout self.half_open_max_calls = half_open_max_calls self.state = State.CLOSED self.failure_count = 0 self.success_count = 0 self.last_failure_time = None self.lock = Lock() def can_execute(self) -> bool: with self.lock: if self.state == State.CLOSED: return True if self.state == State.OPEN: if time.time() - self.last_failure_time > self.recovery_timeout: self.state = State.HALF_OPEN self.success_count = 0 return True return False if self.state == State.HALF_OPEN: return self.success_count < self.half_open_max_calls return False def record_success(self): with self.lock: if self.state == State.HALF_OPEN: self.success_count += 1 if self.success_count >= self.half_open_max_calls: self.state = State.CLOSED self.failure_count = 0 else: self.failure_count = 0 def record_failure(self): with self.lock: self.failure_count += 1 self.last_failure_time = time.time() if self.state == State.HALF_OPEN: self.state = State.OPEN elif self.failure_count >= self.failure_threshold: self.state = State.OPEN Using the Circuit Breaker 1 2 3 4 5 6 7 8 9 10 11 12 13 payment_breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=60) def process_payment(order): if not payment_breaker.can_execute(): raise ServiceUnavailable("Payment service circuit open") try: result = payment_service.charge(order) payment_breaker.record_success() return result except Exception as e: payment_breaker.record_failure() raise Decorator Pattern 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 from functools import wraps def circuit_breaker(breaker: CircuitBreaker): def decorator(func): @wraps(func) def wrapper(*args, **kwargs): if not breaker.can_execute(): raise CircuitOpenError(f"Circuit breaker open for {func.__name__}") try: result = func(*args, **kwargs) breaker.record_success() return result except Exception as e: breaker.record_failure() raise return wrapper return decorator # Usage payment_cb = CircuitBreaker() @circuit_breaker(payment_cb) def charge_customer(customer_id, amount): return payment_api.charge(customer_id, amount) With Fallback 1 2 3 4 5 6 7 8 9 10 11 12 def get_user_recommendations(user_id): if not recommendations_breaker.can_execute(): # Fallback to cached or default recommendations return get_cached_recommendations(user_id) or DEFAULT_RECOMMENDATIONS try: result = recommendations_service.get(user_id) recommendations_breaker.record_success() return result except Exception: recommendations_breaker.record_failure() return get_cached_recommendations(user_id) or DEFAULT_RECOMMENDATIONS Library: pybreaker 1 2 3 4 5 6 7 8 9 10 11 12 13 import pybreaker db_breaker = pybreaker.CircuitBreaker( fail_max=5, reset_timeout=30 ) @db_breaker def query_database(sql): return db.execute(sql) # Check state print(db_breaker.current_state) # 'closed', 'open', or 'half-open' Library: tenacity (with circuit breaker) 1 2 3 4 5 6 7 8 from tenacity import retry, stop_after_attempt, CircuitBreaker cb = CircuitBreaker(failure_threshold=3, recovery_time=60) @retry(stop=stop_after_attempt(3)) @cb def call_external_api(): return requests.get("https://api.example.com/data") Per-Service Breakers 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 class ServiceRegistry: def __init__(self): self.breakers = {} def get_breaker(self, service_name: str) -> CircuitBreaker: if service_name not in self.breakers: self.breakers[service_name] = CircuitBreaker() return self.breakers[service_name] registry = ServiceRegistry() def call_service(service_name: str, endpoint: str): breaker = registry.get_breaker(service_name) if not breaker.can_execute(): raise ServiceUnavailable(f"{service_name} circuit is open") try: result = http_client.get(f"http://{service_name}/{endpoint}") breaker.record_success() return result except Exception: breaker.record_failure() raise Monitoring 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 from prometheus_client import Counter, Gauge circuit_state = Gauge( 'circuit_breaker_state', 'Circuit breaker state (0=closed, 1=open, 2=half-open)', ['service'] ) circuit_failures = Counter( 'circuit_breaker_failures_total', 'Circuit breaker failure count', ['service'] ) circuit_rejections = Counter( 'circuit_breaker_rejections_total', 'Requests rejected by open circuit', ['service'] ) # Update metrics in circuit breaker def record_failure(self, service_name): circuit_failures.labels(service=service_name).inc() # ... rest of failure logic circuit_state.labels(service=service_name).set(self.state.value) Configuration Guidelines Scenario Threshold Timeout Critical service, fast recovery 3-5 failures 15-30s Non-critical, can wait 5-10 failures 60-120s Flaky external API 3 failures 30-60s Database 5 failures 30s Anti-Patterns 1. Single global breaker ...

Caching Strategies: What to Cache and When to Invalidate

Cache invalidation is one of the two hard problems in computer science. Here’s how to make it less painful. The Caching Patterns Cache-Aside (Lazy Loading) 1 2 3 4 5 6 7 8 9 10 11 12 13 def get_user(user_id: str) -> dict: # Check cache first cached = redis.get(f"user:{user_id}") if cached: return json.loads(cached) # Cache miss: fetch from database user = db.query("SELECT * FROM users WHERE id = %s", user_id) # Store in cache for next time redis.setex(f"user:{user_id}", 3600, json.dumps(user)) return user Pros: Only caches what’s actually used Cons: First request always slow (cache miss) ...

Feature Flags: Deploy Doesn't Mean Release

Separating deployment from release is one of the best things you can do for your team’s sanity. Feature flags make this possible. The Core Idea 1 2 3 4 5 6 7 8 9 10 11 12 # Without flags: deploy = release def checkout(): process_payment() send_confirmation() # With flags: deploy != release def checkout(): process_payment() if feature_enabled("new_confirmation_email"): send_new_confirmation() # Deployed but not released else: send_confirmation() Code ships to production. Flag decides if users see it. ...

The Twelve-Factor App: What Actually Matters

The Twelve-Factor methodology is from 2011 but remains relevant. Here’s what matters in practice, and what’s become outdated. The Factors, Ranked by Impact Critical (Ignore at Your Peril) III. Config in Environment 1 2 3 4 5 # Bad DATABASE_URL = "postgres://localhost/myapp" # hardcoded # Good DATABASE_URL = os.environ["DATABASE_URL"] Config includes credentials, per-environment values, and feature flags. Environment variables work everywhere: containers, serverless, bare metal. VI. Stateless Processes 1 2 3 4 5 6 7 8 9 10 11 # Bad: storing session in memory sessions = {} @app.post("/login") def login(user): sessions[user.id] = {"logged_in": True} # Dies with process # Good: external session store @app.post("/login") def login(user): redis.set(f"session:{user.id}", {"logged_in": True}) If your process dies, can another pick up the work? Statelessness enables horizontal scaling, rolling deploys, and crash recovery. ...