Feature Flags for AI Features: Rolling Out the Unpredictable

Traditional feature flags are straightforward: flip a boolean, show a button. AI features are messier. The output varies. Costs scale non-linearly. User expectations are unclear. And when it breaks, it doesn’t throw a clean error—it confidently gives wrong answers. Here’s how to think about feature flags when the feature itself is probabilistic.

The Problem With Standard Rollouts

When you ship a new checkout button, you can test it. Click, observe, done. If 5% of users get the new button and it breaks, you know immediately. ...
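The rollout half of the problem still looks like a classic percentage rollout. A minimal sketch, with a stable hash so a user's cohort doesn't flip between requests (the `in_rollout` helper and the flag name are illustrative, not from the post):

```python
import hashlib

def in_rollout(user_id: str, flag: str, percent: float) -> bool:
    # Stable bucketing: hash (flag, user) so each user gets a
    # consistent decision and different flags bucket independently.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100  # deterministic value in 0..99
    return bucket < percent
```

Deterministic bucketing matters more for AI features than for a checkout button: if a user bounces in and out of the cohort, they see inconsistent model behavior, which is much harder to reason about than an inconsistent UI.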

March 9, 2026 · 5 min · 1039 words · Rob Washington

LLM API Patterns for Production Systems

Building toy demos with LLM APIs is easy. Building production systems that handle real traffic, fail gracefully, and don’t bankrupt you? That’s where it gets interesting.

The Reality of Production LLM Integration

Most tutorials show you how to curl an API and celebrate. Real systems need to handle:

- API rate limits and throttling
- Transient failures and retries
- Cost explosion from runaway loops
- Latency variance (100ms to 30s responses)
- Model version changes breaking prompts
- Token limits exceeding input size

Let’s look at patterns that actually work. ...
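For the token-limit item in the list above, a cheap guard is to estimate a prompt's token count before sending it. A sketch under stated assumptions: the ~4 characters per token figure is a rough English-text heuristic, and `check_budget` is a hypothetical helper, not part of any SDK:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per English token.
    # Use a real tokenizer when you need exact counts.
    return len(text) // 4 + 1

def check_budget(prompt: str, max_input_tokens: int = 8000) -> str:
    # Reject oversized prompts before paying for the API call.
    tokens = estimate_tokens(prompt)
    if tokens > max_input_tokens:
        raise ValueError(
            f"Prompt is ~{tokens} tokens, over the {max_input_tokens} budget"
        )
    return prompt
```

Failing fast here is cheaper than letting the API reject the request, and it also caps the blast radius of a runaway loop that keeps appending to the same prompt.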

March 6, 2026 · 5 min · 1000 words · Rob Washington

Building Reliable LLM-Powered Features in Production

Adding an LLM to your application is easy. Making it reliable enough for production is another story. API timeouts, rate limits, hallucinations, and surprise $500 invoices await the unprepared. Here’s how to build LLM features that actually work.

The Basics: Robust API Calls

Never call an LLM API without proper error handling:

```python
import time

import anthropic
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential

client = anthropic.Anthropic()

def _is_retryable(exc: BaseException) -> bool:
    # Retry rate limits (429) and server errors (5xx);
    # don't retry other client errors (4xx).
    if isinstance(exc, anthropic.RateLimitError):
        return True
    if isinstance(exc, anthropic.APIStatusError):
        return exc.status_code >= 500
    return isinstance(exc, anthropic.APIConnectionError)

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    retry=retry_if_exception(_is_retryable),
    reraise=True,
)
def call_llm(prompt: str, max_tokens: int = 1024) -> str:
    try:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text
    except anthropic.RateLimitError:
        time.sleep(60)  # Extra back-off on rate limits before tenacity retries
        raise
```

Timeouts Are Non-Negotiable

LLM calls can hang. Always set timeouts: ...
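One generic way to enforce a deadline on any blocking call is to run it in a worker thread and abandon it on timeout. A sketch (the `call_with_timeout` helper and its `timeout_s` parameter are mine, not the post's; most LLM SDKs, including the Anthropic Python SDK, also accept a client-level timeout setting, which is usually the better first choice):

```python
from concurrent.futures import ThreadPoolExecutor

def call_with_timeout(fn, *args, timeout_s: float = 30.0, **kwargs):
    # Run fn in a worker thread; raise TimeoutError if it takes too long.
    # Caveat: the worker thread is abandoned, not killed, so fn should
    # be side-effect-safe if it eventually completes in the background.
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, *args, **kwargs).result(timeout=timeout_s)
    finally:
        pool.shutdown(wait=False)
```

This pattern is a last line of defense: prefer the SDK's own timeout support where it exists, since that cancels the underlying HTTP request instead of orphaning a thread.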

March 4, 2026 · 6 min · 1111 words · Rob Washington

Debugging Production Issues Without Breaking Things

Production is sacred. When something breaks, you need to investigate without making it worse. Here’s how.

Rule Zero: Don’t Make It Worse

Before touching anything:

- Don’t restart services until you understand the problem
- Don’t deploy fixes without knowing the root cause
- Don’t clear logs you might need for investigation
- Don’t scale down what might be handling load

Stabilize first, investigate second, fix third.

Start With Observability

Check Dashboards

Before SSH-ing anywhere: ...

February 28, 2026 · 6 min · 1168 words · Rob Washington