Getting Structured Data from LLMs: JSON Mode and Beyond

The biggest challenge with LLMs in production isn’t getting good responses; it’s getting parseable ones. When you need JSON for your pipeline, “Here’s the data you requested:” followed by markdown-wrapped output breaks everything. Here’s how to reliably extract structured data.

The Problem

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Extract the person's name and age from: 'John Smith is 34 years old'"}],
)
print(response.choices[0].message.content)
# "The person's name is John Smith and their age is 34."
# ... not what we needed
```

You wanted {"name": "John Smith", "age": 34}. You got prose. ...
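When a model wraps its answer in a preamble or a markdown fence, a defensive parser can often still recover the object. The sketch below is a hypothetical helper (`extract_json` is an illustrative name, not part of any SDK) built only on Python's standard library. It tries the raw text first, then the first fenced block, then the outermost `{...}` span, and raises if none of them parse to a JSON object. Treat it as a fallback under those assumptions, not a replacement for a provider's native JSON mode.

```python
import json
import re

# Matches a ```json ... ``` (or bare ```) fenced block and captures its body.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def extract_json(text: str) -> dict:
    """Recover a JSON object from raw model output (hypothetical helper).

    Candidates, in order: the whole stripped string, the first fenced
    block, and the span from the first '{' to the last '}'.
    Raises ValueError if no candidate parses to a dict.
    """
    candidates = [text.strip()]

    fence = _FENCE_RE.search(text)
    if fence:
        candidates.append(fence.group(1))

    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])

    for candidate in candidates:
        try:
            value = json.loads(candidate)
        except json.JSONDecodeError:
            continue  # not valid JSON; try the next candidate
        if isinstance(value, dict):
            return value

    raise ValueError("no JSON object found in model output")


raw = 'Here\'s the data you requested:\n```json\n{"name": "John Smith", "age": 34}\n```'
print(extract_json(raw))
```

Plain prose with no braces, like the response in the example above, still raises `ValueError`, so the caller can decide whether to retry or fail loudly instead of passing bad data downstream.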

February 26, 2026 · 6 min · 1074 words · Rob Washington

Container Orchestration Patterns for AI Workloads

Running AI workloads in containers presents unique challenges that traditional web application patterns don’t address. GPU scheduling, model caching, and bursty inference traffic all require thoughtful architecture. Here’s what actually works in production.

The GPU Scheduling Problem

Standard Kubernetes scheduling assumes CPU and memory are your primary constraints. When you add GPUs to the mix, everything changes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-server
spec:
  containers:
  - name: model
    image: my-registry/llm-server:v1.2
    resources:
      limits:
        nvidia.com/gpu: 1
        memory: "32Gi"
      requests:
        nvidia.com/gpu: 1
        memory: "24Gi"
```

The naive approach (one GPU per pod) works until you realize GPUs cost $2-4/hour and sit idle between requests. MIG (Multi-Instance GPU) and time-slicing help, but they introduce complexity: ...
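A quick back-of-the-envelope calculation shows why idle time matters: if a GPU billed by the hour is busy only a fraction of that hour, each hour of actual work costs the hourly price divided by that fraction. The sketch below uses the $2-4/hour range from the text; the 25% utilization figure is a hypothetical value chosen for illustration, not a measurement, and `cost_per_busy_hour` is an illustrative name.

```python
def cost_per_busy_hour(hourly_price: float, utilization: float) -> float:
    """Effective cost of one hour of useful GPU work.

    A GPU billed at `hourly_price` per hour that is busy only
    `utilization` (0 < utilization <= 1) of the time costs
    hourly_price / utilization per busy hour.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_price / utilization


# Price range from the text; 25% utilization is a hypothetical example.
for price in (2.0, 4.0):
    for util in (1.0, 0.25):
        effective = cost_per_busy_hour(price, util)
        print(f"${price:.2f}/h at {util:.0%} busy -> ${effective:.2f} per busy hour")
```

Under that assumption, a $4/hour GPU that is busy a quarter of the time effectively costs $16 per hour of real inference, which is the gap that sharing schemes like MIG and time-slicing try to narrow by packing more work onto each card.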

February 26, 2026 · 4 min · 850 words · Rob Washington

Working with LLM APIs: A Practical Guide

How to integrate large language models into your applications — from basic API calls to production-ready patterns.

February 10, 2026 · 5 min · 949 words · Rob Washington

AI Coding Assistants: From Skeptic to True Believer

How AI coding assistants transformed my workflow — and why the skeptics are missing out.

February 10, 2026 · 3 min · 515 words · Rob Washington