Database Indexing Strategies: The Performance Lever You're Probably Misusing

Every junior developer learns that indexes make queries fast. What they don’t learn is that indexes also make writes slow, consume disk space, and can actively hurt performance when misused. Let’s fix that.

What Indexes Actually Do

An index is a separate data structure that maintains a sorted copy of specific columns, with pointers back to the full rows. Think of it like the index in a book — instead of reading every page to find “PostgreSQL,” you flip to the index, find the entry, and jump directly to page 247. ...
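The book-index analogy can be sketched in a few lines of Python (a toy illustration under stated assumptions, not the article's code: a sorted list of `(value, position)` pairs stands in for the index, and `bisect` stands in for the B-tree search):

```python
import bisect

# "Table" of rows, and an "index" on the name column: a sorted copy of the
# column values with pointers (positions) back to the full rows.
rows = [{"id": i, "name": "user%d" % i} for i in range(10_000)]
index = sorted((row["name"], pos) for pos, row in enumerate(rows))
keys = [k for k, _ in index]

def find_by_name(name):
    # O(log n) binary search on the sorted keys, then one jump to the row,
    # instead of scanning all 10,000 rows.
    i = bisect.bisect_left(keys, name)
    if i < len(keys) and keys[i] == name:
        return rows[index[i][1]]
    return None

print(find_by_name("user247"))
```

Note the write-side cost the article warns about: every insert into `rows` would also have to update `index` to keep it sorted.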

February 24, 2026 · 6 min · 1207 words · Rob Washington

Async Python Patterns: Concurrency Without the Confusion

Async Python lets you handle thousands of concurrent I/O operations with a single thread. No threads, no processes, no GIL headaches. But it requires thinking differently about how code executes. These patterns help you write async code that’s both correct and efficient.

The Basics

```python
import asyncio
import aiohttp

async def fetch_data(url: str) -> dict:
    # This is a coroutine - it can be paused and resumed
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.json()

# Running coroutines
async def main():
    data = await fetch_data("https://api.example.com/data")
    print(data)

asyncio.run(main())
```

await pauses the coroutine until the result is ready, letting other coroutines run. ...
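The payoff of that pausing behavior, concurrency from a single thread, can be sketched without any network (a minimal illustration, not the article's code; `asyncio.sleep` stands in for I/O):

```python
import asyncio

async def fake_fetch(name):
    # While this coroutine is paused at the sleep, the others run.
    await asyncio.sleep(0.1)
    return name + ": done"

async def main():
    # gather schedules all three coroutines concurrently and awaits them
    # together: total wall time is ~0.1s, not 0.3s.
    return await asyncio.gather(
        fake_fetch("a"), fake_fetch("b"), fake_fetch("c")
    )

results = asyncio.run(main())
print(results)
```

Replace `fake_fetch` with the `aiohttp`-based `fetch_data` above and the same `gather` call fans out real HTTP requests concurrently.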

February 23, 2026 · 6 min · 1092 words · Rob Washington

PostgreSQL Performance Tuning: From Default to Optimized

PostgreSQL’s default configuration is designed to run on minimal hardware without crashing. It’s not designed to perform well. Out of the box, Postgres uses a fraction of available memory and I/O capacity. Proper tuning can improve performance by 10x or more.

Memory Configuration

shared_buffers

The most important setting. This is Postgres’s own cache.

```
-- Default: 128MB (way too low)
-- Recommended: 25% of system RAM
shared_buffers = 4GB  -- On a 16GB server
```

Don’t set above 40% — the OS needs memory for its own file cache. ...
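The 25% guideline and the 40% ceiling can be captured in a small helper (a hypothetical sketch, not from the article; the function name is illustrative):

```python
def shared_buffers_gb(ram_gb, fraction=0.25):
    # 25% of system RAM is the recommended starting point; the OS needs
    # the rest for its own file cache, so refuse anything above 40%.
    if not 0 < fraction <= 0.40:
        raise ValueError("keep shared_buffers at or below 40% of RAM")
    return ram_gb * fraction

# On a 16GB server this reproduces the article's recommended 4GB.
print("shared_buffers = %gGB" % shared_buffers_gb(16))
```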

February 23, 2026 · 5 min · 1042 words · Rob Washington

Linux Performance Debugging: Finding What's Slow and Why

Something’s slow. Users are complaining. Your monitoring shows high latency but not why. You SSH into the server and need to figure out what’s wrong — fast. This is a systematic approach to Linux performance debugging.

Start with the Big Picture

Before diving deep, get an overview:

```bash
# System load and uptime
uptime
# 17:00:00 up 45 days, load average: 8.52, 4.23, 2.15
# Load average: 1/5/15 minute averages
# Compare to CPU count: load 8 on 4 CPUs = overloaded
```

```bash
# Quick health check
top -bn1 | head -20
```

CPU: Who’s Using It?

```bash
# Real-time CPU usage
top
# Sort by CPU: press 'P'
# Sort by memory: press 'M'
# Show threads: press 'H'
```

```bash
# CPU usage by process (snapshot)
ps aux --sort=-%cpu | head -10

# CPU time accumulated
ps aux --sort=-time | head -10
```

```bash
# Per-CPU breakdown
mpstat -P ALL 1 5
# Shows each CPU core's utilization
# Look for: one core at 100% (single-threaded bottleneck)
```

```bash
# What's the CPU actually doing?
vmstat 1 5
# r b swpd   free   buff   cache   si so bi bo in  cs  us sy id wa st
# 3 0 0      245612 128940 2985432 0  0  8  24 312 892 45 12 40 3  0
# r: processes waiting for CPU (high = CPU-bound)
# b: processes blocked on I/O (high = I/O-bound)
# us: user CPU time
# sy: system CPU time
# wa: waiting for I/O
# id: idle
```

Memory: Running Out?

```bash
# Memory overview
free -h
#      total  used  free   shared  buff/cache  available
# Mem: 31Gi   12Gi  2.1Gi  1.2Gi   17Gi        17Gi
# "available" is what matters, not "free"
# Linux uses free memory for cache — that's good
```

```bash
# Top memory consumers
ps aux --sort=-%mem | head -10

# Memory details per process
pmap -x <pid>

# Or from /proc
cat /proc/<pid>/status | grep -i mem
```

```bash
# Check for OOM killer activity
dmesg | grep -i "out of memory"
journalctl -k | grep -i oom
```

```bash
# Swap usage (if swapping, you're in trouble)
swapon -s
vmstat 1 5  # si/so columns show swap in/out
```

Disk I/O: The Hidden Bottleneck

```bash
# I/O wait in top
top
# Look at %wa in the CPU line

# Detailed I/O stats
iostat -xz 1 5
# Device  r/s     w/s     rkB/s  wkB/s  await  %util
# sda     150.00  200.00  4800   8000   12.5   85.0
# await: average I/O wait time (ms) - high = slow disk
# %util: disk utilization - 100% = saturated
```

```bash
# Which processes are doing I/O?
iotop -o  # Shows only processes actively doing I/O

# Or without iotop
pidstat -d 1 5
```

```bash
# Check for disk errors
dmesg | grep -i error
smartctl -a /dev/sda  # SMART data for disk health
```

Network: Connections and Throughput

```bash
# Network interface stats
ip -s link show
# Watch for errors, drops, overruns

# Real-time bandwidth
iftop
# Or
nload
```

```bash
# Connection states
ss -s
# Total: 1523 (kernel 1847)
# TCP: 892 (estab 654, closed 89, orphaned 12, timewait 127)
# Many TIME_WAIT: connection churn
# Many ESTABLISHED: legitimate load or connection leak
```

```bash
# Connections per state
ss -tan | awk '{print $1}' | sort | uniq -c | sort -rn

# Connections by remote IP
ss -tn | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head
```

```bash
# Check for packet loss
ping -c 100 <gateway>
# Any loss = network problem

# TCP retransmits
netstat -s | grep -i retrans
```

Process-Level Debugging

```bash
# What's a specific process doing?
strace -p <pid> -c  # Summary of system calls
strace -p <pid> -f -e trace=network  # Network-related calls only
```

```bash
# Open files and connections
lsof -p <pid>

# Just network connections
lsof -i -p <pid>

# Files in a directory
lsof +D /var/log
```

```bash
# Process threads
ps -T -p <pid>

# Thread CPU usage
top -H -p <pid>
```

System-Wide Tracing

```bash
# What's happening system-wide?
perf top
# Real-time view of where CPU cycles go

# Record and analyze
perf record -g -a sleep 10
perf report
```

```bash
# BPF-based tools (if available)
# CPU usage by function
profile-bpfcc 10

# Disk latency histogram
biolatency-bpfcc

# TCP connection latency
tcpconnlat-bpfcc
```

The Checklist Approach

When debugging, go through systematically: ...
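The "compare load to CPU count" rule of thumb from the first step can be turned into a tiny script (a rough heuristic as an illustration, not a hard rule; the thresholds and function name are assumptions):

```python
import os

def load_status(load_1min, cpus):
    # Normalize the 1-minute load average by core count: well under 1.0
    # per core is healthy, around 1.0 is busy, above it is overloaded.
    per_cpu = load_1min / cpus
    if per_cpu < 0.7:
        return "healthy"
    if per_cpu <= 1.0:
        return "busy"
    return "overloaded"

# The uptime example above: load 8.52 on 4 CPUs.
print(load_status(8.52, 4))

# Or check the machine this runs on (Unix only).
print(load_status(os.getloadavg()[0], os.cpu_count() or 1))
```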

February 23, 2026 · 6 min · 1095 words · Rob Washington

Caching Strategies: The Two Hardest Problems in Computer Science

Phil Karlton’s famous quote about hard problems in computer science exists because caching is genuinely difficult. Not the mechanics — putting data in Redis is easy. The hard part is knowing when that data is wrong. Get caching right and your application feels instant. Get it wrong and users see stale data, inconsistent state, or worse — data that was never supposed to be visible to them.

The Cache-Aside Pattern (Lazy Loading)

The most common pattern: check cache first, fall back to database, populate cache on miss. ...
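The three steps of cache-aside can be sketched with plain dicts (an illustrative sketch, not the article's code: one dict stands in for Redis, another for the database, and the names are assumptions):

```python
db = {1: {"id": 1, "name": "alice"}}  # stands in for the database
cache = {}                            # stands in for Redis

def get_user(user_id):
    if user_id in cache:          # 1. check cache first
        return cache[user_id]
    row = db.get(user_id)         # 2. fall back to the database on a miss
    if row is not None:
        cache[user_id] = row      # 3. populate the cache for next time
    return row

get_user(1)        # first call: cache miss, reads the db, fills the cache
print(1 in cache)  # subsequent calls are served from the cache
```

The hard part the article points at lives outside this sketch: nothing here removes the cached entry when `db[1]` changes, which is exactly the invalidation problem.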

February 23, 2026 · 6 min · 1261 words · Rob Washington

Database Connection Pooling: The Performance Win You're Probably Missing

Database connections are expensive. Each new connection requires a TCP handshake, authentication, session initialization, and memory allocation on both client and server. Do this for every web request and you’ve got a performance problem hiding in plain sight.

The Problem: Connection Overhead

A typical PostgreSQL connection takes 50-100ms to establish. For a web request that runs a 5ms query, you’re spending 10-20x more time on connection setup than actual work.

```python
import psycopg2

# The naive approach - connection per request
def get_user(user_id):
    conn = psycopg2.connect(DATABASE_URL)  # 50-100ms
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))  # 5ms
    result = cursor.fetchone()
    conn.close()
    return result
```

Under load, this creates connection storms. Each concurrent request opens its own connection. The database has connection limits. You hit those limits, new connections queue, latency spikes, everything falls apart. ...
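The core of a pool is small: pay the setup cost once, then hand connections out and take them back. A minimal sketch, assuming a `FakeConnection` stand-in for the real `psycopg2` connection (all names here are illustrative, not the article's implementation):

```python
import queue

class FakeConnection:
    # Stands in for a real database connection (the expensive part).
    def __init__(self, n):
        self.n = n

class ConnectionPool:
    def __init__(self, size):
        # Pay the 50-100ms setup cost once per connection, up front.
        self._pool = queue.Queue()
        for i in range(size):
            self._pool.put(FakeConnection(i))

    def acquire(self):
        # Blocks when the pool is exhausted instead of opening new
        # connections, which is what prevents connection storms.
        return self._pool.get()

    def release(self, conn):
        # Return the connection for reuse rather than closing it.
        self._pool.put(conn)

pool = ConnectionPool(size=2)
conn = pool.acquire()
# ... run the 5ms query ...
pool.release(conn)
```

In practice you would use an existing pool (psycopg2's `pool` module, SQLAlchemy's engine pool, or PgBouncer) rather than rolling your own; the sketch just shows why reuse removes the per-request setup cost.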

February 23, 2026 · 5 min · 898 words · Rob Washington

Kubernetes Resource Limits: Right-Sizing Containers for Stability

Your pod got OOMKilled. Or throttled to 5% CPU. Or evicted because the node ran out of resources. The fix isn’t “add more resources” — it’s understanding how Kubernetes scheduling actually works.

Requests vs Limits

Requests: What you’re guaranteed. Kubernetes uses this for scheduling.
Limits: The ceiling. Exceed this and bad things happen.

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"    # 0.25 cores
  limits:
    memory: "512Mi"
    cpu: "500m"    # 0.5 cores
```

What Happens When You Exceed Them

| Resource | Exceed Request                | Exceed Limit          |
|----------|-------------------------------|-----------------------|
| CPU      | Throttled when node is busy   | Hard throttled always |
| Memory   | Fine if available             | OOMKilled immediately |

CPU is compressible — you slow down but survive. Memory is not — you die. ...
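Reading manifests is easier once the quantity strings are numbers. Hypothetical helpers (not a Kubernetes API; function names are assumptions) that decode the two formats used above:

```python
def cpu_cores(q):
    # "250m" means 250 millicores, i.e. 0.25 of a core; "2" means 2 cores.
    if q.endswith("m"):
        return int(q[:-1]) / 1000
    return float(q)

def mem_bytes(q):
    # Binary suffixes: Ki, Mi, Gi are powers of 1024, not 1000.
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if q.endswith(suffix):
            return int(q[:-2]) * factor
    return int(q)

print(cpu_cores("250m"), mem_bytes("512Mi"))
```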

February 19, 2026 · 6 min · 1129 words · Rob Washington

Database Connection Pooling: Patterns for High-Throughput Applications

Every database connection costs something: TCP handshake, TLS negotiation, authentication, session state allocation. For PostgreSQL, that’s 1.3-2MB of memory per connection. For MySQL, 256KB-1MB. At scale, creating connections on demand kills both your database and your latency. Connection pooling solves this by reusing connections across requests. But misconfigured pools are worse than no pools — you get connection starvation, deadlocks, and debugging nightmares. Here’s how to do it right.

The Problem: Connection Overhead

Without pooling, a typical web request: ...

February 19, 2026 · 10 min · 2117 words · Rob Washington

Serverless Cold Start Mitigation: Practical Patterns That Actually Work

Cold starts are serverless’s original sin. Your function spins up, downloads dependencies, initializes connections, and finally runs your code — all while your user waits. The P99 latency spikes. The SLA teeters. Here’s what actually works, ranked by effectiveness and cost.

Understanding the Cold Start

A cold start happens when there’s no warm instance available to handle a request. The platform must:

1. Provision a container — 50-500ms depending on runtime size
2. Initialize the runtime — 10-100ms (Python) to 500ms+ (JVM without optimization)
3. Run your initialization code — depends on what you do at module level
4. Execute the handler — your actual function

```python
# Everything at module level runs during cold start
import boto3   # ~100ms
import pandas  # ~500ms
import torch   # ~2000ms

# Connection initialization during cold start
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')

def handler(event, context):
    # Only this runs on warm invocations
    return table.get_item(Key={'id': event['user_id']})
```

Measured cold start times for AWS Lambda (1024MB, us-east-1): ...
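One common mitigation is lazy initialization: move the expensive setup out of module level and into the first invocation, cached in a global. A minimal sketch (illustrative names throughout; `fake_heavy_client` stands in for `boto3.resource(...)` and friends):

```python
_client = None
init_count = 0  # instrumentation for the sketch, not the pattern

def fake_heavy_client():
    # Stands in for expensive setup: SDK client creation, DB connections.
    global init_count
    init_count += 1
    return object()

def get_client():
    # First call pays the initialization cost; every later call on the
    # same warm instance reuses the cached client.
    global _client
    if _client is None:
        _client = fake_heavy_client()
    return _client

def handler(event, context):
    client = get_client()
    return {"client_id": id(client)}

handler({}, None)
handler({}, None)
print(init_count)  # initialized once, reused afterwards
```

The trade-off: the cold start itself gets cheaper, but the first request on each instance absorbs the initialization latency instead.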

February 19, 2026 · 9 min · 1876 words · Rob Washington

Caching Strategies: When, Where, and How to Cache

The fastest request is one you don’t make. Caching trades storage for speed, serving precomputed results instead of recalculating them. But caching done wrong is worse than no caching — stale data, inconsistencies, and debugging nightmares.

When to Cache

Cache when:
- Data is read more often than written
- Computing the result is expensive
- Slight staleness is acceptable
- The same data is requested repeatedly

Don’t cache when:
- Data changes constantly
- Every request needs fresh data
- Storage cost exceeds compute savings
- Cache invalidation is harder than recomputation

Cache Placement

Client-Side Cache

Browser cache, mobile app cache, CDN edge cache: ...

February 16, 2026 · 7 min · 1313 words · Rob Washington