LLM API Integration Patterns: Building Reliable AI-Powered Features

Integrating LLM APIs into production systems is harder than the tutorials suggest. The API call works in development. Then you hit rate limits, latency spikes, context limits, and costs that scale faster than your revenue. Here’s how to build LLM integrations that actually work.

The Basics Nobody Mentions

Always Stream

Non-streaming API calls block until complete. For a 500-token response, that’s 5-15 seconds of your user staring at nothing.

    # Bad: User waits forever
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    print(response.choices[0].message.content)

    # Good: Tokens appear as generated
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

Streaming also lets you abort early if the response is going off the rails. ...
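To make the early-abort idea concrete, here is a minimal sketch rather than the post's own implementation. It assumes the stream has already been reduced to plain text chunks; `stream_with_abort`, `fake_stream`, `max_chars`, and `banned_phrases` are hypothetical names chosen for illustration.

```python
from typing import Iterable, Iterator


def stream_with_abort(
    chunks: Iterable[str],
    max_chars: int = 2000,
    banned_phrases: tuple[str, ...] = (),
) -> Iterator[str]:
    """Yield text chunks, stopping early once a guard condition trips."""
    seen = ""
    for chunk in chunks:
        seen += chunk
        # Check the accumulated text so a phrase split across chunks is still caught.
        if any(phrase in seen for phrase in banned_phrases):
            return
        # Stop once the response grows past the length budget.
        if len(seen) > max_chars:
            return
        yield chunk


def fake_stream() -> Iterator[str]:
    """Stand-in for an SDK stream; a real one would yield chunk objects."""
    yield from ["Sure, ", "here is ", "the answer. ", "IGNORE ALL ", "PREVIOUS..."]


collected = list(stream_with_abort(fake_stream(), banned_phrases=("IGNORE ALL",)))
print("".join(collected))
```

With a real SDK, stopping iteration may not release the connection by itself, so close the stream as well if the client exposes a way to do that. Text already shown to the user cannot be recalled, which is why the guard runs before each chunk is yielded.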

March 12, 2026 · 7 min · 1350 words · Rob Washington

AI Coding Assistants: A Practical Guide to Actually Using Them Well

Everyone has access to AI coding assistants now. Most people use them poorly. Here’s how to actually get value from them.

The Mental Model Shift

Stop thinking of AI assistants as “autocomplete on steroids.” Think of them as a junior developer who:

- Has read every Stack Overflow answer ever written
- Types infinitely fast
- Never gets tired or annoyed
- Has no memory of what you discussed 5 minutes ago
- Will confidently produce plausible-looking nonsense

That last point is crucial. These tools don’t know things. They predict likely tokens. The output often looks right even when it’s wrong. ...

March 12, 2026 · 9 min · 1750 words · Rob Washington

Practical LLM Integration Patterns

You want to add LLM capabilities to your application. Not build a chatbot, but actually integrate AI into your product. Here are the patterns that work.

The Naive Approach (And Why It Fails)

    import openai

    def process_user_input(text):
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": text}]
        )
        return response.choices[0].message.content

Problems:

- No error handling
- No rate limiting
- No caching
- No fallbacks
- No cost control
- Vulnerable to prompt injection

Let’s fix each one.

Pattern 1: The Robust Client

Wrap your LLM calls in a proper client: ...

March 11, 2026 · 6 min · 1179 words · Rob Washington

Feature Flags for AI Features: Rolling Out the Unpredictable

Traditional feature flags are straightforward: flip a boolean, show a button. AI features are messier. The output varies. Costs scale non-linearly. User expectations are unclear. And when it breaks, it doesn’t throw a clean error; it confidently gives wrong answers. Here’s how to think about feature flags when the feature itself is probabilistic.

The Problem With Standard Rollouts

When you ship a new checkout button, you can test it. Click, observe, done. If 5% of users get the new button and it breaks, you know immediately. ...
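For reference, a 5% rollout like the one described is usually built on deterministic bucketing, so the same user always lands on the same side of the flag. The sketch below is one way to do that under stated assumptions, not the approach the post goes on to recommend; `in_rollout` and the `ai-summary` flag name are hypothetical.

```python
import hashlib


def in_rollout(user_id: str, flag_name: str, percent: float) -> bool:
    """Return True if this user falls inside the rollout percentage for the flag."""
    # Hash flag and user together so each flag gets an independent bucketing.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    # Map the first 4 bytes to a bucket in [0, 10000) for 0.01% granularity.
    bucket = int.from_bytes(digest[:4], "big") % 10_000
    return bucket < percent * 100


users = [f"user-{i}" for i in range(10_000)]
enabled = sum(in_rollout(u, "ai-summary", 5.0) for u in users)
print(f"{enabled} of {len(users)} users enabled")
```

Hashing instead of random sampling matters more for AI features than for buttons: a user who sees the feature appear and disappear between requests cannot give you a meaningful quality signal.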

March 9, 2026 · 5 min · 1039 words · Rob Washington

Building Voice AI Assistants with VAPI: From Setup to Production

Voice AI has matured significantly. VAPI makes it straightforward to build voice assistants that can actually do things: not just chat, but call APIs, look up data, and take actions.

Why VAPI?

VAPI handles the hard parts of voice:

- Speech-to-text transcription
- LLM integration (OpenAI, Anthropic, custom)
- Text-to-speech with natural voices (ElevenLabs, etc.)
- Real-time streaming for low latency
- Tool/function calling during conversations

You focus on what your assistant does. VAPI handles how it speaks and listens. ...

March 7, 2026 · 5 min · 1039 words · Rob Washington

LLM API Patterns for Production Systems

Building toy demos with LLM APIs is easy. Building production systems that handle real traffic, fail gracefully, and don’t bankrupt you? That’s where it gets interesting.

The Reality of Production LLM Integration

Most tutorials show you a curl call to an API and celebrate. Real systems need to handle:

- API rate limits and throttling
- Transient failures and retries
- Cost explosion from runaway loops
- Latency variance (responses from 100 ms to 30 s)
- Model version changes breaking prompts
- Inputs that exceed the model’s token limit

Let’s look at patterns that actually work. ...
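As one illustration of the retry and runaway-loop items, here is a sketch of capped exponential backoff with jitter and a hard attempt limit. It is an assumption-laden example, not the pattern from the full post: `TransientError` and `call_with_backoff` are hypothetical, and in real code you would map your SDK's rate-limit, timeout, and 5xx exceptions onto the retryable case.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (429s, 5xx, timeouts)."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying TransientError with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # Hard cap on attempts: no runaway retry loops.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))  # Jitter spreads out retries.
    raise AssertionError("unreachable")


attempts = {"count": 0}


def flaky_call() -> str:
    """Simulated API call that fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated 429")
    return "ok"


result = call_with_backoff(flaky_call, sleep=lambda seconds: None)
print(result, attempts["count"])
```

Passing `sleep` as a parameter keeps the function testable without real delays. Anything that is not a `TransientError` propagates immediately, which is the behavior you want for bad requests and authentication failures.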

March 6, 2026 · 5 min · 1000 words · Rob Washington

Integrating LLM APIs: Practical Patterns for Production

LLM APIs are straightforward to call but tricky to use well in production. Here’s what I’ve learned integrating them into real systems.

Basic API Calls

OpenAI

    import openai

    client = openai.OpenAI(api_key="sk-...")

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain kubernetes in one sentence."}
        ],
        max_tokens=100,
        temperature=0.7
    )

    print(response.choices[0].message.content)

Anthropic (Claude)

    import anthropic

    client = anthropic.Anthropic(api_key="sk-ant-...")

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": "Explain kubernetes in one sentence."}
        ]
    )

    print(response.content[0].text)

curl (Any Provider)

    curl https://api.openai.com/v1/chat/completions \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'

Streaming Responses

For better UX, stream tokens as they arrive: ...

March 5, 2026 · 5 min · 1037 words · Rob Washington

Integrating AI Agents into DevOps Workflows

The line between AI coding assistants and DevOps automation is blurring. What started as autocomplete has evolved into agents that can review PRs, triage alerts, and even execute runbooks. Here’s how teams are integrating AI agents into their workflows, and where the sharp edges still are.

The Spectrum of AI in DevOps

Think of AI integration as a spectrum from passive to active:

Passive (Safe to Start)

- Code suggestions during development
- Documentation generation
- Log summarization

Semi-Active (Human Approval)

...

March 5, 2026 · 5 min · 995 words · Rob Washington

Building Reliable LLM-Powered Features in Production

Adding an LLM to your application is easy. Making it reliable enough for production is another story. API timeouts, rate limits, hallucinations, and surprise $500 invoices await the unprepared. Here’s how to build LLM features that actually work.

The Basics: Robust API Calls

Never call an LLM API without proper error handling:

    import time

    import anthropic
    from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential

    client = anthropic.Anthropic()

    def is_retryable(exc: BaseException) -> bool:
        # Retry rate limits, connection failures, and server errors (5xx).
        # Other client errors (4xx) will not succeed on a retry, so let them fail fast.
        if isinstance(exc, (anthropic.RateLimitError, anthropic.APIConnectionError)):
            return True
        return isinstance(exc, anthropic.APIStatusError) and exc.status_code >= 500

    @retry(
        retry=retry_if_exception(is_retryable),
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=30),
        reraise=True
    )
    def call_llm(prompt: str, max_tokens: int = 1024) -> str:
        try:
            response = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=max_tokens,
                messages=[{"role": "user", "content": prompt}]
            )
            return response.content[0].text
        except anthropic.RateLimitError:
            time.sleep(60)  # Extra back-off on rate limits before tenacity retries
            raise

Timeouts Are Non-Negotiable

LLM calls can hang. Always set timeouts: ...

March 4, 2026 · 6 min · 1111 words · Rob Washington

Production-Ready LLM API Integrations: Patterns That Actually Work

You’ve got your OpenAI or Anthropic API key. The hello-world example works. Now you need to put it in production and suddenly everything is different. LLM APIs have characteristics that break standard integration patterns: high latency, unpredictable response times, token-based billing, and outputs that can vary wildly for the same input. Here’s what actually works.

The Unique Challenges

Traditional API calls return in milliseconds. LLM calls can take 5-30 seconds. Traditional APIs have predictable costs per call. LLM costs depend on input and output length, and you don’t control the output. ...
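The billing point can be made concrete with a little arithmetic. You can't choose how long the output will be, but a `max_tokens` cap on the request does bound the worst case. The sketch below assumes per-million-token pricing; the `Pricing` class and the prices in it are made-up illustrations, not any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per million tokens; check your provider's rates."""
    input_per_mtok: float
    output_per_mtok: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Return the USD cost of one call given token counts."""
    return (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000


def worst_case_cost(input_tokens: int, max_tokens: int, pricing: Pricing) -> float:
    """Upper bound on cost when the request caps output at max_tokens."""
    return estimate_cost(input_tokens, max_tokens, pricing)


pricing = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)  # illustrative numbers only
print(f"${worst_case_cost(2_000, 1_024, pricing):.4f}")
```

Multiplying the worst case by expected daily call volume gives a budget ceiling you can alert on before the invoice arrives.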

March 3, 2026 · 5 min · 999 words · Rob Washington