Caching Strategies: Making Slow Things Fast

The fastest database query is the one you don’t make. Caching is how you turn expensive operations into cheap lookups. Here’s how to do it without shooting yourself in the foot. Cache Placement Where to Cache [Diagram: Browser Cache → CDN → Application Cache → Database Cache → Database; layers nearest the user are the fastest and smallest, layers nearest the database the slowest and largest.] Cache as close to the user as possible, but only what makes sense at each layer. ...
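The layered lookup the diagram describes can be sketched as a read-through chain in Python. This is a minimal illustration, not production code: the dicts stand in for a per-process cache, a shared network cache (e.g. Redis), and the database, and all names are hypothetical.

```python
# Hypothetical read-through chain: check the fastest layer first,
# fall back toward the source of truth, and backfill on the way out.
local = {}                      # fastest, smallest: per-process memory
shared = {}                     # stands in for a shared cache like Redis
database = {"user:1": "Ada"}    # slowest, largest: source of truth

def get(key):
    if key in local:
        return local[key]
    if key in shared:
        local[key] = shared[key]   # promote the entry to the faster layer
        return local[key]
    value = database[key]          # full miss: hit the database
    shared[key] = value            # backfill both cache layers
    local[key] = value
    return value

get("user:1")   # first call walks every layer
get("user:1")   # second call is a local in-memory hit
```

Each layer only holds what earned its place there, which is the point of the diagram: cache close to the user, but selectively.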

March 12, 2026 · 6 min · 1173 words · Rob Washington

Docker Multi-Stage Builds: Smaller Images, Faster Deploys

Your Docker images are probably too big. Most applications ship with compilers, package managers, and debug tools that never run in production. Multi-stage builds fix this by separating your build environment from your runtime environment. The Problem with Single-Stage Builds Here’s a typical Node.js Dockerfile:

```dockerfile
FROM node:20
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["node", "dist/index.js"]
```

This works, but the resulting image includes: ...
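For contrast, here is a sketch of what the multi-stage version of that Dockerfile might look like. It assumes the build emits `dist/` and that production dependencies can be installed separately; the stage layout is illustrative, not necessarily the article's exact version.

```dockerfile
# Stage 1: build with the full toolchain
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: runtime keeps only production deps and compiled output
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/index.js"]
```

The final image is built from the second `FROM` onward, so the compiler, dev dependencies, and source files from stage 1 never ship.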

March 9, 2026 · 5 min · 961 words · Rob Washington

Caching Strategies: When, Where, and How to Cache

Caching is one of the most powerful performance optimizations available. It’s also one of the easiest to get wrong. The classic joke—“there are only two hard things in computer science: cache invalidation and naming things”—exists for a reason. Let’s cut through the complexity. When to Cache Not everything should be cached. Before adding a cache, ask: Is this data read more than written? Caching write-heavy data creates invalidation nightmares. Is computing this expensive? Database queries, API calls, complex calculations—good candidates. Can I tolerate stale data? If not, caching gets complicated fast. Is this a hot path? Cache what’s accessed frequently, not everything.

```python
# Good cache candidate: expensive query, rarely changes
@cache(ttl=3600)
def get_product_catalog():
    return db.query("SELECT * FROM products WHERE active = true")

# Bad cache candidate: changes every request
@cache(ttl=60)  # Don't do this
def get_user_cart(user_id):
    return db.query("SELECT * FROM carts WHERE user_id = ?", user_id)
```

Where to Cache Caching happens at multiple layers. Each has tradeoffs. ...
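The `@cache(ttl=...)` decorator in the snippet is not the stdlib `functools.cache`, which has no expiry. One minimal way such a decorator could be implemented, as a sketch with illustrative names, keying on positional arguments only:

```python
import time
from functools import wraps

def cache(ttl):
    """Hypothetical TTL memoization decorator matching @cache(ttl=...) above."""
    def decorator(fn):
        store = {}  # args -> (value, expiry timestamp)
        @wraps(fn)
        def wrapper(*args):
            now = time.time()
            hit = store.get(args)
            if hit is not None and hit[1] > now:
                return hit[0]              # entry still fresh: skip the real call
            value = fn(*args)
            store[args] = (value, now + ttl)
            return value
        return wrapper
    return decorator

calls = 0

@cache(ttl=3600)
def get_product_catalog():
    global calls
    calls += 1
    return ["widget", "gadget"]

get_product_catalog()
get_product_catalog()  # second call is served from the cache
```

A real implementation would also bound the store's size and handle keyword arguments; libraries like `cachetools` provide this off the shelf.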

March 4, 2026 · 6 min · 1147 words · Rob Washington

Docker Multi-Stage Builds: Smaller Images, Faster Deploys

Your Docker images are probably too big. Build tools, dev dependencies, source files—all shipping to production when only the compiled binary matters. Multi-stage builds fix this by separating build environment from runtime environment. The Problem A typical single-stage Dockerfile:

```dockerfile
FROM python:3.11
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
```

This image includes: ...
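A common multi-stage shape for that Python Dockerfile, sketched under the assumption that the dependencies are pure installs that relocate cleanly (the `--prefix` pattern is one idiom among several, not necessarily the article's):

```dockerfile
# Stage 1: install dependencies where build tools are available
FROM python:3.11 AS build
WORKDIR /app
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

# Stage 2: slim runtime without compilers or pip caches
FROM python:3.11-slim
WORKDIR /app
COPY --from=build /install /usr/local
COPY . .
CMD ["python", "app.py"]
```

The slim base drops most of the build toolchain, and only the installed packages cross the stage boundary.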

March 1, 2026 · 4 min · 852 words · Rob Washington

Docker Multi-Stage Builds: Smaller Images, Cleaner Deploys

Your Docker images are probably too big. Mine were. Then I learned about multi-stage builds. The problem is simple: build tools bloat production images. You need Node.js and npm to build your React app, but you only need nginx to serve it. You need Go and its toolchain to compile, but the binary runs standalone. Every megabyte of build tooling in your production image is wasted space, slower deploys, and expanded attack surface. ...
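The Go case mentioned above is the extreme version of this: a statically linked binary needs no runtime at all. A sketch, assuming `CGO_ENABLED=0` so the binary is fully static (paths and names illustrative):

```dockerfile
# Stage 1: compile with the full Go toolchain
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /app .

# Stage 2: ship only the binary -- no shell, no package manager
FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

The result is an image a few megabytes in size with essentially no attack surface beyond the binary itself.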

February 27, 2026 · 4 min · 843 words · Rob Washington

Semantic Caching for LLM Applications

Every LLM API call costs money and takes time. When users ask variations of the same question, you’re paying for the same computation repeatedly. Semantic caching solves this by recognizing that “What’s the weather in NYC?” and “How’s the weather in New York City?” are functionally identical. The Problem with Traditional Caching Standard key-value caching uses exact string matching:

```python
cache_key = hash(prompt)
if cache_key in cache:
    return cache[cache_key]
```

This fails for LLM applications because: ...
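The semantic alternative keys on embedding similarity instead of exact strings. A minimal sketch: `embed` here is a toy bag-of-words stand-in for a real embedding model, and the threshold and class are illustrative, not the article's implementation.

```python
import math

def embed(text):
    # Toy stand-in for an embedding model; a production semantic
    # cache would call an embedding API here instead.
    words = text.lower().rstrip("?").split()
    return {w: words.count(w) for w in words}

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Cache keyed on meaning: a lookup hits when the prompt's embedding
    is close enough to a stored prompt's embedding."""
    def __init__(self, threshold=0.6):
        self.entries = []  # (embedding, response) pairs
        self.threshold = threshold

    def get(self, prompt):
        query = embed(prompt)
        for vec, response in self.entries:
            if cosine(query, vec) >= self.threshold:
                return response  # semantically similar: reuse the answer
        return None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))

cache = SemanticCache()
cache.put("What's the weather in NYC?", "Sunny, 72F")
cache.get("weather in NYC")       # similar phrasing: cache hit
cache.get("best pizza in town")   # unrelated prompt: miss, returns None
```

Real systems store embeddings in a vector index rather than scanning a list, and tune the threshold to trade false hits against redundant LLM calls.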

February 14, 2026 · 4 min · 830 words · Rob Washington