LLM APIs are deceptively simple: send a prompt, get text back. But building reliable AI features requires handling rate limits, managing costs, structuring outputs, and gracefully degrading when things go wrong.
Here are the patterns that work in production.
The Basic Client Start with a wrapper that handles common concerns:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 import os import time from typing import Optional import anthropic from tenacity import retry, stop_after_attempt, wait_exponential class LLMClient: def __init__(self): self.client = anthropic.Anthropic() self.default_model = "claude-sonnet-4-20250514" self.max_tokens = 4096 @retry( stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=60) ) def complete( self, prompt: str, system: Optional[str] = None, model: Optional[str] = None, max_tokens: Optional[int] = None ) -> str: messages = [{"role": "user", "content": prompt}] response = self.client.messages.create( model=model or self.default_model, max_tokens=max_tokens or self.max_tokens, system=system or "", messages=messages ) return response.content[0].text The tenacity library handles retries with exponential backoff — essential for rate limits and transient errors.
...