LLM API Integration Patterns: Building Reliable AI-Powered Features

Integrating LLM APIs into production systems is harder than the tutorials suggest. The API call works in development. Then you hit rate limits, latency spikes, context limits, and costs that scale faster than your revenue. Here’s how to build LLM integrations that actually work. The Basics Nobody Mentions Always Stream Non-streaming API calls block until complete. For a 500-token response, that’s 5-15 seconds of your user staring at nothing. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 # Bad: User waits forever response = client.chat.completions.create( model="gpt-4", messages=[{"role": "user", "content": prompt}] ) print(response.choices[0].message.content) # Good: Tokens appear as generated stream = client.chat.completions.create( model="gpt-4", messages=[{"role": "user", "content": prompt}], stream=True ) for chunk in stream: if chunk.choices[0].delta.content: print(chunk.choices[0].delta.content, end="", flush=True) Streaming also lets you abort early if the response is going off-rails. ...

March 12, 2026 · 7 min · 1350 words · Rob Washington

Practical LLM Integration Patterns

You want to add LLM capabilities to your application. Not build a chatbot — actually integrate AI into your product. Here are the patterns that work. The Naive Approach (And Why It Fails) 1 2 3 4 5 6 def process_user_input(text): response = openai.chat.completions.create( model="gpt-4", messages=[{"role": "user", "content": text}] ) return response.choices[0].message.content Problems: No error handling No rate limiting No caching No fallbacks No cost control Prompt injection vulnerable Let’s fix each one. Pattern 1: The Robust Client Wrap your LLM calls in a proper client: ...

March 11, 2026 · 6 min · 1179 words · Rob Washington

Webhook Patterns: Receiving Events Reliably

Webhooks flip the API model: instead of polling, the service calls you. Here’s how to handle them without losing events. Basic Webhook Handler 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 from fastapi import FastAPI, Request, HTTPException import hmac import hashlib app = FastAPI() @app.post("/webhooks/stripe") async def stripe_webhook(request: Request): payload = await request.body() sig_header = request.headers.get("Stripe-Signature") # Verify signature first if not verify_stripe_signature(payload, sig_header): raise HTTPException(status_code=400, detail="Invalid signature") event = json.loads(payload) # Process event if event["type"] == "payment_intent.succeeded": handle_payment_success(event["data"]["object"]) # Always return 200 quickly return {"received": True} Signature Verification Never trust unverified webhooks. ...

February 28, 2026 · 5 min · 889 words · Rob Washington

Webhook Design Patterns: Building Reliable Event-Driven Integrations

Webhooks flip the API model: instead of polling for changes, services push events to you. GitHub notifies you of commits. Stripe tells you about payments. Twilio delivers SMS receipts. The concept is simple. Production-grade implementation requires care. Basic Webhook Receiver 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 from fastapi import FastAPI, Request, HTTPException app = FastAPI() @app.post("/webhooks/stripe") async def stripe_webhook(request: Request): payload = await request.json() event_type = payload.get("type") if event_type == "payment_intent.succeeded": handle_payment_success(payload["data"]["object"]) elif event_type == "payment_intent.failed": handle_payment_failure(payload["data"]["object"]) return {"status": "received"} This works for demos. Production needs more. ...

February 23, 2026 · 6 min · 1136 words · Rob Washington

LLM API Integration Patterns: Building Resilient AI Applications

Integrating Large Language Model APIs into production applications requires more than just calling an endpoint. Here are battle-tested patterns for building resilient, cost-effective LLM integrations. The Retry Cascade LLM APIs are notorious for rate limits and transient failures. A simple exponential backoff isn’t enough — you need a cascade strategy: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 import asyncio from dataclasses import dataclass from typing import Optional @dataclass class LLMResponse: content: str model: str tokens_used: int class LLMCascade: def __init__(self): self.providers = [ ("anthropic", "claude-sonnet-4-20250514", 3), ("openai", "gpt-4o", 2), ("anthropic", "claude-3-haiku-20240307", 5), ] async def complete(self, prompt: str) -> Optional[LLMResponse]: for provider, model, max_retries in self.providers: for attempt in range(max_retries): try: return await self._call_provider(provider, model, prompt) except RateLimitError: await asyncio.sleep(2 ** attempt) except ProviderError: break # Try next provider return None The cascade falls through primary to fallback models, attempting retries at each level before moving on. ...

February 17, 2026 · 5 min · 898 words · Rob Washington

Webhook Patterns: Building Reliable Event-Driven Integrations

Webhooks seem simple: receive an HTTP POST, do something. In practice, they’re a minefield of security issues, reliability problems, and edge cases. Let’s build webhooks that actually work in production. Receiving Webhooks Basic Receiver 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 from fastapi import FastAPI, Request, HTTPException, Header from typing import Optional import hmac import hashlib app = FastAPI() WEBHOOK_SECRET = "your-webhook-secret" @app.post("/webhooks/stripe") async def stripe_webhook( request: Request, stripe_signature: Optional[str] = Header(None, alias="Stripe-Signature") ): payload = await request.body() # Verify signature if not verify_stripe_signature(payload, stripe_signature, WEBHOOK_SECRET): raise HTTPException(status_code=401, detail="Invalid signature") event = await request.json() # Process event event_type = event.get("type") if event_type == "payment_intent.succeeded": await handle_payment_success(event["data"]["object"]) elif event_type == "customer.subscription.deleted": await handle_subscription_cancelled(event["data"]["object"]) return {"received": True} def verify_stripe_signature(payload: bytes, signature: str, secret: str) -> bool: """Verify Stripe webhook signature.""" if not signature: return False # Parse signature header elements = dict(item.split("=") for item in signature.split(",")) timestamp = elements.get("t") expected_sig = elements.get("v1") # Compute expected signature signed_payload = f"{timestamp}.{payload.decode()}" computed_sig = hmac.new( secret.encode(), signed_payload.encode(), hashlib.sha256 ).hexdigest() return hmac.compare_digest(computed_sig, expected_sig) Idempotent Processing Webhooks can be delivered multiple times. Make your handlers idempotent: ...

February 11, 2026 · 7 min · 1378 words · Rob Washington