API Rate Limiting: Protecting Your Service Without Annoying Your Users

Rate limiting is the immune system of your API. Without it, a single misbehaving client can take down your service for everyone. With poorly designed limits, you’ll frustrate legitimate users while sophisticated attackers route around you. The goal isn’t just protection—it’s fairness: every user gets a reasonable share of your capacity.

The Basic Algorithms

Fixed Window

The simplest approach: count requests per time window, reject when over the limit.

```python
import time
import redis

def is_rate_limited(user_id: str, limit: int = 100, window: int = 60) -> bool:
    """Fixed window: 100 requests per minute."""
    r = redis.Redis()
    # Window key based on the current minute
    window_key = f"ratelimit:{user_id}:{int(time.time() // window)}"
    current = r.incr(window_key)
    if current == 1:
        r.expire(window_key, window)
    return current > limit
```

Problem: bursts at window boundaries. A user can make 100 requests at 0:59 and 100 more at 1:00—200 requests in 2 seconds while technically staying under “100/minute.” ...
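A standard fix for the boundary-burst problem is a sliding window log, which tracks individual request timestamps instead of a per-minute counter. Here is a minimal in-memory sketch of the idea (my own illustration; the post's fixed-window version above is Redis-backed, and a production sliding window would be too):

```python
import time
from collections import deque
from typing import Deque, Dict, Optional

class SlidingWindowLimiter:
    """Sliding-window log: keep a timestamp per request, drop aged-out ones.
    In-memory sketch for illustration only; not shared across processes."""

    def __init__(self, limit: int = 100, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.hits: Dict[str, Deque[float]] = {}

    def is_rate_limited(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(user_id, deque())
        # Drop timestamps that have aged out of the window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return True
        q.append(now)
        return False
```

Because every request's timestamp is retained for a full window, 100 requests at 0:59 leave no headroom at 1:00—the boundary burst disappears, at the cost of storing one entry per request.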

March 10, 2026 · 6 min · 1253 words · Rob Washington

Webhook Security: Beyond 'Just Verify the Signature'

Webhooks are deceptively simple: someone sends you HTTP requests, you process them. What could go wrong? Everything. Webhooks are inbound attack surface, and most implementations have gaps you could drive a truck through.

The Obvious One: Signature Verification

Most webhook providers sign their payloads. Stripe uses HMAC-SHA256. GitHub uses HMAC-SHA1 or SHA256. Slack uses its own signing scheme. You’ve probably implemented this:

```python
import hmac
import hashlib

def verify_stripe_signature(payload: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)
```

Good. But this is table stakes. What else? ...
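One of the gaps beyond bare signature checks is replay protection. Signing a timestamp along with the payload and rejecting stale deliveries mirrors what Stripe and Slack do, though the exact header format below is my own illustration, not any provider's spec:

```python
import hmac
import hashlib
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # reject deliveries older than 5 minutes

def verify_with_timestamp(payload: bytes, timestamp: str, signature: str,
                          secret: str, now: Optional[float] = None) -> bool:
    """Illustrative sketch: timestamp-bound HMAC verification."""
    now = time.time() if now is None else now
    # 1. Reject stale deliveries to blunt replay attacks
    if abs(now - int(timestamp)) > TOLERANCE_SECONDS:
        return False
    # 2. Sign timestamp + payload so the timestamp itself can't be forged
    signed = f"{timestamp}.".encode() + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

An attacker who captures a legitimate request can no longer replay it after the tolerance window, and can't adjust the timestamp without breaking the signature.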

March 10, 2026 · 6 min · 1108 words · Rob Washington

Building Voice AI Assistants with VAPI: From Setup to Production

Voice AI has matured significantly. VAPI makes it straightforward to build voice assistants that can actually do things—not just chat, but call APIs, look up data, and take actions.

Why VAPI?

VAPI handles the hard parts of voice:

- Speech-to-text transcription
- LLM integration (OpenAI, Anthropic, custom)
- Text-to-speech with natural voices (ElevenLabs, etc.)
- Real-time streaming for low latency
- Tool/function calling during conversations

You focus on what your assistant does. VAPI handles how it speaks and listens. ...

March 7, 2026 · 5 min · 1039 words · Rob Washington

LLM API Patterns for Production Systems

Building toy demos with LLM APIs is easy. Building production systems that handle real traffic, fail gracefully, and don’t bankrupt you? That’s where it gets interesting.

The Reality of Production LLM Integration

Most tutorials show you a curl call to an API and celebrate. Real systems need to handle:

- API rate limits and throttling
- Transient failures and retries
- Cost explosion from runaway loops
- Latency variance (100ms to 30s responses)
- Model version changes breaking prompts
- Inputs exceeding token limits

Let’s look at patterns that actually work. ...
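The workhorse pattern for transient failures and throttling is retry with exponential backoff and jitter. A provider-agnostic sketch (the `call` argument and the bare `Exception` catch are placeholders; real code would catch the SDK's specific rate-limit and server-error types):

```python
import random
import time

def retry_with_backoff(call, max_attempts: int = 4,
                       base_delay: float = 1.0, max_delay: float = 30.0):
    """Retry `call` on exception with exponential backoff plus full jitter.
    Generic sketch -- swap in provider-specific exception types in real code."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: sleep a random fraction of the capped backoff
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

The jitter matters: without it, every client that got throttled at the same moment retries at the same moment, re-creating the spike that caused the throttling.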

March 6, 2026 · 5 min · 1000 words · Rob Washington

Integrating LLM APIs: Practical Patterns for Production

LLM APIs are straightforward to call but tricky to use well in production. Here’s what I’ve learned integrating them into real systems.

Basic API Calls

OpenAI

```python
import openai

client = openai.OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain kubernetes in one sentence."}
    ],
    max_tokens=100,
    temperature=0.7
)
print(response.choices[0].message.content)
```

Anthropic (Claude)

```python
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain kubernetes in one sentence."}
    ]
)
print(response.content[0].text)
```

curl (Any Provider)

```shell
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Streaming Responses

For better UX, stream tokens as they arrive: ...
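The excerpt cuts off here; the streaming loop itself is short. A sketch of the consuming side (the `collect_stream` helper and its factoring are mine; the commented usage assumes the OpenAI client shown above, where `stream=True` yields chunks whose `delta.content` carries a text fragment or `None`):

```python
def collect_stream(deltas) -> str:
    """Accumulate text fragments as they arrive, printing incrementally.
    Works on any iterable of optional strings, e.g. the delta.content
    values from an OpenAI stream=True response."""
    parts = []
    for delta in deltas:
        if delta:  # deltas can be None (e.g. the final chunk)
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)

# Hedged usage sketch against the OpenAI client from the example above:
# stream = client.chat.completions.create(model="gpt-4o", messages=msgs, stream=True)
# text = collect_stream(chunk.choices[0].delta.content for chunk in stream)
```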

March 5, 2026 · 5 min · 1037 words · Rob Washington

curl Tips and Tricks for API Work

Every developer ends up using curl. It’s installed everywhere, works with any API, and once you learn the patterns, it’s faster than any GUI tool.

Basic Requests

```shell
# GET (default)
curl https://api.example.com/users

# POST with data
curl -X POST https://api.example.com/users \
  -d '{"name":"Alice"}'

# Other methods
curl -X PUT ...
curl -X DELETE ...
curl -X PATCH ...
```

Headers

```shell
# Set content type
curl -H "Content-Type: application/json" ...

# Auth header
curl -H "Authorization: Bearer TOKEN" ...

# Multiple headers
curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer TOKEN" \
  -H "X-Custom-Header: value" ...
```

JSON Data

```shell
# Inline JSON
curl -X POST https://api.example.com/users \
  -H "Content-Type: application/json" \
  -d '{"name":"Alice","email":"alice@example.com"}'

# From file
curl -X POST https://api.example.com/users \
  -H "Content-Type: application/json" \
  -d @data.json

# Pretty output (pipe to jq)
curl -s https://api.example.com/users | jq
```

Response Info

```shell
# Show response headers
curl -i https://api.example.com/users

# Only headers (no body)
curl -I https://api.example.com/users

# HTTP status code only
curl -s -o /dev/null -w "%{http_code}" https://api.example.com/users

# Timing info
curl -w "Time: %{time_total}s\n" -o /dev/null -s https://api.example.com
```

Authentication

```shell
# Basic auth
curl -u username:password https://api.example.com

# Bearer token
curl -H "Authorization: Bearer eyJ..." https://api.example.com

# API key in header
curl -H "X-API-Key: abc123" https://api.example.com

# API key in query
curl "https://api.example.com?api_key=abc123"
```

Form Data

```shell
# URL-encoded form
curl -X POST https://example.com/login \
  -d "username=alice&password=secret"

# Multipart form (file upload)
curl -X POST https://example.com/upload \
  -F "file=@photo.jpg" \
  -F "description=My photo"
```

Following Redirects

```shell
# Follow redirects (3xx)
curl -L https://example.com/redirect

# Show redirect chain
curl -L -v https://example.com/redirect 2>&1 | grep "< HTTP\|< Location"
```

SSL/TLS

```shell
# Ignore SSL errors (dev only!)
curl -k https://self-signed.example.com

# Use specific cert
curl --cacert /path/to/ca.crt https://api.example.com

# Client cert authentication
curl --cert client.crt --key client.key https://api.example.com
```

Timeouts

```shell
# Connection timeout (seconds)
curl --connect-timeout 5 https://api.example.com

# Max time for entire operation
curl --max-time 30 https://api.example.com

# Both
curl --connect-timeout 5 --max-time 30 https://api.example.com
```

Debugging

```shell
# Verbose output
curl -v https://api.example.com

# Even more verbose (includes SSL handshake)
curl -vv https://api.example.com

# Trace everything to file
curl --trace trace.log https://api.example.com
```

Saving Output

```shell
# Save to file
curl -o response.json https://api.example.com/data

# Save with remote filename
curl -O https://example.com/file.zip

# Silent mode (no progress)
curl -s https://api.example.com
```

Cookies

```shell
# Send cookie
curl -b "session=abc123" https://example.com

# Save cookies to file
curl -c cookies.txt https://example.com/login -d "user=alice"

# Send cookies from file
curl -b cookies.txt https://example.com/dashboard
```

Retry Logic

```shell
# Retry on failure
curl --retry 3 https://api.example.com

# Retry with delay
curl --retry 3 --retry-delay 2 https://api.example.com

# Retry on specific errors
curl --retry 3 --retry-connrefused https://api.example.com
```

Useful Aliases

Add to your .bashrc: ...
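The excerpt truncates before the aliases themselves. A sketch of the kind of .bashrc helpers the patterns above suggest (function names and conventions here are my own, not the post's; functions rather than aliases so the URL argument lands in the right place):

```shell
# Illustrative .bashrc helpers for curl-heavy API work

# JSON GET with pretty-printing: jget URL
jget() {
  curl -s -H "Accept: application/json" "$1" | jq .
}

# JSON POST: jpost URL 'json-body'
jpost() {
  curl -s -X POST -H "Content-Type: application/json" -d "$2" "$1"
}

# HTTP status code only: status URL
status() {
  curl -s -o /dev/null -w "%{http_code}\n" "$1"
}
```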

March 5, 2026 · 4 min · 810 words · Rob Washington

Building Reliable LLM-Powered Features in Production

Adding an LLM to your application is easy. Making it reliable enough for production is another story. API timeouts, rate limits, hallucinations, and surprise $500 invoices await the unprepared. Here’s how to build LLM features that actually work.

The Basics: Robust API Calls

Never call an LLM API without proper error handling:

```python
import time

import anthropic
from tenacity import retry, stop_after_attempt, wait_exponential

client = anthropic.Anthropic()

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    reraise=True
)
def call_llm(prompt: str, max_tokens: int = 1024) -> str:
    try:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text
    except anthropic.RateLimitError:
        time.sleep(60)  # Back off on rate limits
        raise
    except anthropic.APIStatusError as e:
        if e.status_code >= 500:
            raise  # Retry on server errors
        raise  # Don't retry on client errors (4xx)
```

Timeouts Are Non-Negotiable

LLM calls can hang. Always set timeouts: ...

March 4, 2026 · 6 min · 1111 words · Rob Washington

API Versioning: Strategies That Won't Break Your Clients

You shipped v1 of your API. Clients integrated. Now you need to make breaking changes. How do you evolve without breaking everyone? API versioning is the answer—but there’s no single “right” approach. Let’s examine the tradeoffs.

What Counts as a Breaking Change?

Before versioning, understand what actually breaks clients.

Breaking changes:

- Removing a field from responses
- Removing an endpoint
- Changing a field’s type ("price": "19.99" → "price": 19.99)
- Renaming a field
- Changing required request parameters
- Changing authentication methods

Non-breaking changes: ...
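Once you know what breaks, you need a way to route each request to a version. A sketch of version resolution combining the two most common schemes, URL-path prefixes and vendor media types in `Accept` (the conventions here are widespread patterns, not a standard, and the helper is my own illustration):

```python
def resolve_version(path: str, headers: dict) -> str:
    """Resolve the API version for a request, preferring the URL path
    (e.g. /v2/users) and falling back to a vendor media type in Accept
    (e.g. application/vnd.example.v2+json)."""
    segment = path.strip("/").split("/", 1)[0]
    if segment.startswith("v") and segment[1:].isdigit():
        return segment
    accept = headers.get("Accept", "")
    if ".v" in accept:
        version = accept.split(".v", 1)[1].split("+", 1)[0]
        if version.isdigit():
            return f"v{version}"
    return "v1"  # default for unversioned legacy clients
```

Defaulting unversioned requests to v1 is itself a compatibility decision: existing clients that never learned about versioning keep working unchanged.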

March 4, 2026 · 6 min · 1245 words · Rob Washington

Production-Ready LLM API Integrations: Patterns That Actually Work

You’ve got your OpenAI or Anthropic API key. The hello-world example works. Now you need to put it in production, and suddenly everything is different.

LLM APIs have characteristics that break standard integration patterns: high latency, unpredictable response times, token-based billing, and outputs that can vary wildly for the same input. Here’s what actually works.

The Unique Challenges

Traditional API calls return in milliseconds; LLM calls can take 5-30 seconds. Traditional APIs have predictable costs per call; LLM costs depend on input and output length — and you don’t control the output. ...
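Because cost depends on token counts you don't fully control, a pre-flight worst-case estimate is a useful guard. A sketch (the prices and the 4-characters-per-token heuristic are illustrative assumptions; use your provider's current pricing and its real tokenizer in production):

```python
# Hypothetical per-1K-token prices -- check your provider's pricing page
PRICE_PER_1K_INPUT = 0.003   # USD
PRICE_PER_1K_OUTPUT = 0.015  # USD

def estimate_worst_case_cost(prompt: str, max_output_tokens: int) -> float:
    """Upper-bound cost estimate for one call. The input side uses a
    crude ~4 chars/token heuristic; the output side assumes the model
    emits the full max_tokens budget, which bounds the bill."""
    input_tokens = len(prompt) / 4
    return (input_tokens * PRICE_PER_1K_INPUT
            + max_output_tokens * PRICE_PER_1K_OUTPUT) / 1000
```

Comparing this estimate against a per-request or per-user budget before dispatch is what turns "surprise invoice" into "rejected request with a clear error".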

March 3, 2026 · 5 min · 999 words · Rob Washington

Webhook Security: Protecting Your Endpoints from the Wild West

Webhooks are HTTP endpoints that receive data from external services. Anyone who discovers your webhook URL can send requests to it. That’s a problem. Here’s how to secure them properly.

The Threat Model

Your webhook endpoint faces several threats:

- Spoofing: Attacker sends fake payloads pretending to be Stripe/GitHub/etc.
- Replay attacks: Attacker captures a legitimate request and resends it
- Tampering: Attacker intercepts and modifies payloads in transit
- Enumeration: Attacker discovers your webhook URLs through guessing
- Denial of service: Attacker floods your endpoint with requests

Signature Verification

Most webhook providers sign their payloads. Always verify: ...
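The excerpt stops before the verification code. A minimal HMAC-SHA256 check in the GitHub style, with a `sha256=<hexdigest>` header; a sketch, since each provider's exact header name and signed-content format differ:

```python
import hmac
import hashlib

def verify_signature(payload: bytes, signature_header: str, secret: str) -> bool:
    """Verify an HMAC-SHA256 webhook signature of the raw request body.
    Illustrative; consult your provider's docs for its exact scheme."""
    expected = "sha256=" + hmac.new(
        secret.encode(), payload, hashlib.sha256
    ).hexdigest()
    # Constant-time comparison prevents timing attacks on the digest
    return hmac.compare_digest(expected, signature_header)
```

Two details matter in practice: verify against the raw request bytes (re-serialized JSON will not match), and always use `compare_digest` rather than `==`.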

March 1, 2026 · 5 min · 968 words · Rob Washington