Posts

Semantic Caching for LLM Applications

Every LLM API call costs money and takes time. When users ask variations of the same question, you’re paying for the same computation repeatedly. Semantic caching solves this by recognizing that “What’s the weather in NYC?” and “How’s the weather in New York City?” are functionally identical. The Problem with Traditional Caching Standard key-value caching uses exact string matching: 1 2 3 cache_key = hash(prompt) if cache_key in cache: return cache[cache_key] This fails for LLM applications because: ...

Structured Logging for Distributed Systems

When your application spans multiple services, containers, and regions, print("something went wrong") doesn’t cut it anymore. Structured logging transforms your logs from walls of text into queryable data. Why Structured Logging? Traditional logs are strings meant for humans: [ 2 0 2 6 - 0 2 - 1 3 1 4 : 0 0 : 0 0 ] E R R O R : F a i l e d t o p r o c e s s o r d e r 1 2 3 4 5 f o r u s e r j o h n @ e x a m p l e . c o m Structured logs are data meant for machines (and humans): ...

Context Window Management for LLM Applications

One of the most underestimated challenges in building LLM-powered applications is context window management. You’ve got 128k tokens, but that doesn’t mean you should use them all on every request. The Problem Large context windows create a false sense of abundance. Yes, you can stuff 100k tokens into a request, but you’ll pay for it in: Latency: More tokens = slower responses Cost: You’re billed per token (input and output) Quality degradation: The “lost in the middle” phenomenon is real Practical Patterns 1. Rolling Window with Summary Keep a rolling window of recent conversation, but periodically summarize older context: ...

API Gateway Patterns: The Front Door to Your Microservices

Every request to your microservices should pass through a single front door. That door is your API gateway—and getting it right determines whether your architecture scales gracefully or collapses under complexity. Why API Gateways? Without a gateway, clients must: Know the location of every service Handle authentication with each service Implement retry logic, timeouts, and circuit breaking Deal with different protocols and response formats An API gateway centralizes these concerns: ...

Container Image Optimization: From 1.2GB to 45MB

That 1.2GB Python image you’re pushing to production? It contains gcc, make, and half of Debian’s package repository. Your application needs none of it at runtime. Container image optimization isn’t just about saving disk space—it’s about security (smaller attack surface), speed (faster pulls and deploys), and cost (less bandwidth and storage). Let’s fix it. The Problem: Development vs Runtime A typical Dockerfile grows organically: 1 2 3 4 5 6 7 8 9 # The bloated approach FROM python:3.11 WORKDIR /app COPY requirements.txt . RUN pip install -r requirements.txt COPY . . CMD ["python", "app.py"] This image includes: ...

Retry Patterns: Exponential Backoff and Beyond

Networks fail. Services go down. Databases get overwhelmed. The question isn’t whether your requests will fail—it’s how gracefully you handle it when they do. Naive retry logic can turn a minor hiccup into a catastrophic cascade. Smart retry logic can make your system resilient to transient failures. The difference is in the details. The Naive Approach (Don’t Do This) 1 2 3 4 5 6 7 8 9 # Bad: Immediate retry loop def fetch_data(url): for attempt in range(5): try: response = requests.get(url, timeout=5) return response.json() except requests.RequestException: continue raise Exception("Failed after 5 attempts") This code has several problems: ...

Graceful Shutdown: The Art of Dying Well in Production

Your container is about to die. It has 30 seconds to live. What happens next determines whether your users see a clean transition or a wall of 502 errors. Graceful shutdown is one of those things that seems obvious until you realize most applications do it wrong. The Problem When Kubernetes (or Docker, or systemd) decides to stop your application, it sends a SIGTERM signal. Your application has a grace period—usually 30 seconds—to finish what it’s doing and exit cleanly. After that, it gets SIGKILL. No negotiation. ...

Feature Flags: The Art of Shipping Code Without Shipping Features

There’s a subtle but powerful distinction in modern software delivery: deployment is not release. Deployment means your code is running in production. Release means your users can see it. Feature flags are the bridge between these two concepts—and mastering them changes how you think about shipping software. The Problem with Traditional Deployment In the old model, deploying code meant releasing features: 1 2 3 # Old way: deploy = release git push origin main # Boom, everyone sees the new feature immediately This creates pressure. You can’t deploy partially-complete work. You can’t test in production with real traffic. And if something breaks, your only option is another deploy to roll back. ...

Policy as Code: Enforcing Standards with OPA and Gatekeeper

Manual policy enforcement doesn’t scale. Security reviews become bottlenecks. Compliance audits are painful. Policy as code solves this—define policies once, enforce them everywhere, automatically. Open Policy Agent Basics OPA uses Rego, a declarative language for expressing policies. Simple Policy 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 # policy/authz.rego package authz default allow = false # Allow if user is admin allow { input.user.role == "admin" } # Allow if user owns the resource allow { input.user.id == input.resource.owner_id } # Allow read access to public resources allow { input.action == "read" input.resource.public == true } Test the Policy 1 2 3 4 5 6 7 8 9 10 # input.json { "user": {"id": "user-123", "role": "member"}, "resource": {"owner_id": "user-123", "public": false}, "action": "read" } # Run OPA opa eval -i input.json -d policy/ "data.authz.allow" # Result: true (user owns the resource) Policy Testing 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 # policy/authz_test.rego package authz test_admin_allowed { allow with input as { "user": {"role": "admin"}, "action": "delete", "resource": {"owner_id": "other"} } } test_owner_allowed { allow with input as { "user": {"id": "user-1", "role": "member"}, "action": "update", "resource": {"owner_id": "user-1"} } } test_non_owner_denied { not allow with input as { "user": {"id": "user-1", "role": "member"}, "action": "update", "resource": {"owner_id": "user-2", "public": false} } } 1 2 # Run tests opa test policy/ -v Kubernetes Gatekeeper Enforce policies on Kubernetes resources at admission time. ...

Documentation as Code: Keeping Docs in Sync with Your Systems

Documentation that lives outside your codebase gets stale. Documentation that isn’t tested breaks. Let’s treat docs like code—versioned, automated, and verified. Docs Live with Code Repository Structure p ├ ├ │ │ │ │ │ │ │ ├ └ r ─ ─ ─ ─ o ─ ─ ─ ─ j e s d ├ ├ ├ │ └ m R c r o ─ ─ ─ ─ k E t c c ─ ─ ─ ─ d A / / s o D / g a a └ r ├ └ c M e r p ─ u ─ ─ s E t c i ─ n ─ ─ . . t h / b y m i i o o d i m d n t p o e n l g e e k p c - c n s l i s t a / o d t u p y e a r i m n r e . e t t . y n - e m a t r d d m . e . l m s m d p d o n s e . m d MkDocs Configuration 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 # mkdocs.yml site_name: My Project site_url: https://docs.example.com repo_url: https://github.com/org/project theme: name: material features: - navigation.tabs - navigation.sections - search.highlight - content.code.copy plugins: - search - git-revision-date-localized - macros markdown_extensions: - admonition - codehilite - toc: permalink: true - pymdownx.superfences: custom_fences: - name: mermaid class: mermaid format: !!python/name:pymdownx.superfences.fence_code_format nav: - Home: index.md - Getting Started: getting-started.md - Architecture: architecture.md - API Reference: api/ - Runbooks: - Deployment: runbooks/deployment.md - Incidents: runbooks/incident-response.md Auto-Generated API Docs From OpenAPI Spec 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 # docs/api/openapi.yaml openapi: 3.0.3 info: title: My API version: 1.0.0 description: | API for managing resources. ## Authentication All endpoints require Bearer token authentication. paths: /users: get: summary: List users tags: [Users] parameters: - name: limit in: query schema: type: integer default: 20 responses: '200': description: List of users content: application/json: schema: type: array items: $ref: '#/components/schemas/User' components: schemas: User: type: object properties: id: type: string format: uuid email: type: string format: email created_at: type: string format: date-time Generate from Code 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 # generate_docs.py from fastapi import FastAPI from fastapi.openapi.utils import get_openapi import json import yaml app = FastAPI() # ... your routes ... def export_openapi(): """Export OpenAPI spec to file.""" openapi_schema = get_openapi( title=app.title, version=app.version, description=app.description, routes=app.routes, ) # Write JSON with open('docs/api/openapi.json', 'w') as f: json.dump(openapi_schema, f, indent=2) # Write YAML with open('docs/api/openapi.yaml', 'w') as f: yaml.dump(openapi_schema, f, default_flow_style=False) if __name__ == "__main__": export_openapi() Terraform Docs Generation 1 2 3 4 5 # Install terraform-docs brew install terraform-docs # Generate markdown from module terraform-docs markdown table ./modules/vpc > ./modules/vpc/README.md 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 # .terraform-docs.yml formatter: markdown table sections: show: - requirements - providers - inputs - outputs - resources output: file: README.md mode: inject template: |-  {{ .Content }}  Diagrams as Code Mermaid in Markdown 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 # Architecture ```mermaid graph TB subgraph "Public" LB[Load Balancer] end subgraph "Application Tier" API1[API Server 1] API2[API Server 2] end subgraph "Data Tier" DB[(PostgreSQL)] Cache[(Redis)] end LB --> API1 LB --> API2 API1 --> DB API2 --> DB API1 --> Cache API2 --> Cache # ` # f f f f w # ` r r r r i # ` d o o o o t p i m m m m h P y a d l w w d a a y t g d d d d D n b i i n p p t h r i i i i i s t t s i i h o a a a a a a = h h o n m g g g g g = a d c n s r r r r r A C p C b a / a a a a a R L l i l c D a m m m m m o B u u = h l d c i r s s s s ( u ( s = s e b b a a c . . . " t " t t R c g h i a a a P e L e [ e D = h r i m w w w r 5 o r E r S e a t p s s s o 3 a ( C ( ( E m e o . . . d ( d " S " " l a s c r c d n u " A ( D P a p t t o a e c D B p " a o s i u m t t t N a p A t s t r D p a w i S l l P a t i e i u b o o " a i I " g C . a t a r n ) n c ) r a p g e s k c a 1 : e c y r e A e t " S h a i i r r i ) Q e m m i m c " o , L ( , p m p h ) n " " o p o i " E ) R C r o r t ) C e l t r t e : S d u t c ( i s E A t " s t C R L u A " e S D B r P ) r S , e I , " R , 2 E o " l u s ) a t h , s e o t 5 w E i 3 = C C F S a a ( c l " h s A e e P , I f 3 i " l ) e ] n a m e = " d o c s / i m a g e s / a r c h i t e c t u r e " ) : CI/CD for Documentation GitHub Actions 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 # .github/workflows/docs.yml name: Documentation on: push: branches: [main] paths: - 'docs/**' - 'mkdocs.yml' - 'src/**/*.py' # Regenerate if code changes pull_request: paths: - 'docs/**' jobs: build: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 with: fetch-depth: 0 # For git-revision-date plugin - uses: actions/setup-python@v5 with: python-version: '3.11' - name: Install dependencies run: | pip install mkdocs-material mkdocs-macros-plugin pip install -e . # Install project for API doc generation - name: Generate API docs run: python generate_docs.py - name: Generate Terraform docs run: | terraform-docs markdown table ./terraform > ./docs/terraform.md - name: Build documentation run: mkdocs build --strict - name: Upload artifact uses: actions/upload-pages-artifact@v3 with: path: site/ deploy: if: github.ref == 'refs/heads/main' needs: build runs-on: ubuntu-latest permissions: pages: write id-token: write environment: name: github-pages steps: - uses: actions/deploy-pages@v4 Documentation Testing Link Checking 1 2 3 4 5 6 # In CI - name: Check links uses: lycheeverse/lychee-action@v1 with: args: --verbose --no-progress './docs/**/*.md' fail: true Code Example Testing 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 # test_docs.py import subprocess import re from pathlib import Path def extract_code_blocks(markdown_file: Path) -> list: """Extract fenced code blocks from markdown.""" content = markdown_file.read_text() pattern = r'```(\w+)\n(.*?)```' return re.findall(pattern, content, re.DOTALL) def test_python_examples(): """Test that Python code examples are valid.""" for doc in Path('docs').rglob('*.md'): for lang, code in extract_code_blocks(doc): if lang == 'python': # Check syntax try: compile(code, doc.name, 'exec') except SyntaxError as e: raise AssertionError(f"Syntax error in {doc}: {e}") def test_bash_examples(): """Test that bash commands are valid syntax.""" for doc in Path('docs').rglob('*.md'): for lang, code in extract_code_blocks(doc): if lang in ('bash', 'shell', 'sh'): result = subprocess.run( ['bash', '-n', '-c', code], capture_output=True ) if result.returncode != 0: raise AssertionError( f"Bash syntax error in {doc}: {result.stderr.decode()}" ) OpenAPI Validation 1 2 3 4 # In CI - name: Validate OpenAPI spec run: | npx @apidevtools/swagger-cli validate docs/api/openapi.yaml README Generation From Template 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 # generate_readme.py from jinja2 import Template import subprocess def get_version(): return subprocess.check_output( ['git', 'describe', '--tags', '--always'] ).decode().strip() def get_contributors(): output = subprocess.check_output( ['git', 'shortlog', '-sn', 'HEAD'] ).decode() return [line.split('\t')[1] for line in output.strip().split('\n')] template = Template(''' # {{ name }} {{ description }} ## Installation ```bash pip install {{ package_name }} Quick Start 1 2 3 4 from {{ package_name }} import Client client = Client(api_key="your-key") result = client.do_something() Documentation Full documentation at [{{ docs_url }}]({{ docs_url }}) ...