Posts

Environment Variables Done Right

Environment variables are the standard way to configure applications across environments. They’re simple, universal, and supported everywhere. But like any tool, they can be misused. Here’s how to do them right. The Basics Environment variables are key-value pairs available to your process: 1 2 3 4 5 6 7 8 9 10 11 12 13 # Setting them export DATABASE_URL="postgres://localhost/myapp" export LOG_LEVEL="info" # Using them (Bash) echo $DATABASE_URL # Using them (Python) import os db_url = os.environ.get("DATABASE_URL") # Using them (Node.js) const dbUrl = process.env.DATABASE_URL; Naming Conventions Be consistent. A common pattern: ...

API Pagination Patterns: Offset, Cursor, and Keyset

Every API that returns lists needs pagination. Without it, a request for “all users” could return millions of rows, crushing your database and timing out the client. But pagination has tradeoffs—and choosing wrong can hurt performance or cause data inconsistencies. Offset Pagination The classic approach. Simple to implement, simple to understand: G G G E E E T T T / / / u u u s s s e e e r r r s s s ? ? ? l l l i i i m m m i i i t t t = = = 2 2 2 0 0 0 & & & o o o f f f f f f s s s e e e t t t = = = 0 2 4 0 0 # # # F S T i e h r c i s o r t n d d p p a p a g a g e g e e 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 @app.get("/users") def list_users(limit: int = 20, offset: int = 0): users = db.query( "SELECT * FROM users ORDER BY id LIMIT %s OFFSET %s", (limit, offset) ) total = db.query("SELECT COUNT(*) FROM users")[0][0] return { "data": users, "pagination": { "limit": limit, "offset": offset, "total": total } } Pros: ...

Health Check Endpoints: More Than Just 200 OK

Every modern service needs health check endpoints. Load balancers probe them. Kubernetes uses them. Monitoring systems scrape them. But a naive implementation—returning 200 OK if the process is running—tells you almost nothing useful. Here’s how to build health checks that actually help. Two Types of Health Liveness: Is the process alive and not deadlocked? Readiness: Can this instance handle requests right now? These are different questions with different answers: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 # Liveness: Am I alive? @app.get("/health/live") def liveness(): # If this returns, the process is alive return {"status": "alive"} # Readiness: Can I serve traffic? @app.get("/health/ready") def readiness(): checks = { "database": check_database(), "cache": check_cache(), "disk_space": check_disk_space(), } all_healthy = all(c["healthy"] for c in checks.values()) return JSONResponse( status_code=200 if all_healthy else 503, content={"status": "ready" if all_healthy else "not_ready", "checks": checks} ) Why separate them? ...

Logging Levels: A Practical Guide to What Goes Where

Logging seems simple until you’re debugging production at 2 AM, scrolling through millions of lines trying to find the one that matters. Good logging practices make that experience less painful. Here’s how to think about log levels. The Levels Most logging frameworks use these standard levels: D E B U G < I N F O < W A R N < E R R O R < F A T A L In production, you typically run at INFO or WARN. Lower levels include all higher levels (INFO includes WARN, ERROR, and FATAL). ...

Database Connection Pooling: Stop Opening Connections for Every Query

Opening a database connection is expensive. TCP handshake, SSL negotiation, authentication, session setup—it all adds up. Do that for every query and your application crawls. Connection pooling fixes this by reusing connections. Here’s how to do it right. The Problem Without pooling, every request opens a new connection: 1 2 3 4 5 6 7 8 # BAD: New connection per request def get_user(user_id): conn = psycopg2.connect(DATABASE_URL) # ~50-100ms cursor = conn.cursor() cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,)) user = cursor.fetchone() conn.close() return user At 100 requests per second, that’s 100 connections opening and closing per second. Your database server has a connection limit (typically 100-500). You’ll exhaust it fast. ...

API Error Handling That Helps Instead of Frustrates

Bad error handling wastes everyone’s time. A cryptic “Error 500” sends developers on a debugging odyssey. A well-designed error response tells them exactly what went wrong and how to fix it. Here’s how to build the latter. The Anatomy of a Good Error Every error response should answer three questions: What happened? (error code/type) Why? (human-readable message) How do I fix it? (actionable guidance) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 { "error": { "code": "VALIDATION_ERROR", "message": "Request validation failed", "details": [ { "field": "email", "message": "Invalid email format", "received": "not-an-email" }, { "field": "age", "message": "Must be a positive integer", "received": "-5" } ], "documentation_url": "https://api.example.com/docs/errors#VALIDATION_ERROR" }, "request_id": "req_abc123" } Always include: ...

Configuration Management Patterns for Reliable Deployments

Configuration is where deployments go to die. A typo in an environment variable, a missing secret, a config file that works in staging but breaks in production. Here’s how to make configuration boring and reliable. The Hierarchy of Configuration Not all config is created equal. Layer it: 1 2 3 4 5 . . . . . D C E C F e o n o e f n v m a a f i m t u i r a u l g o n r t n d e s f m - i e l f ( l n i l i e t n a n s e g v s c ( a f o p r l ( d e i a r e r a g u ) b s n e l t n e i v s m i e r ) o n m e n t ) S O D a R n y f E u e n e n n - a s v t o m t i i f i r m f c f o e a n c l m v h l e v e a b n e r n a t r r g c - r i e k s i d s s p d e e e s c s i f i c Later layers override earlier ones. This lets you: ...

Rate Limiting Strategies That Protect Without Frustrating

Rate limiting is the bouncer at your API’s door. Too strict, and legitimate users get frustrated. Too loose, and one bad actor can take down your service. Here’s how to find the balance. Why Rate Limit? Without limits, a single client can: Exhaust your database connections Burn through your third-party API quotas Inflate your cloud bill Deny service to everyone else Rate limiting isn’t about being restrictive—it’s about being fair. ...

Background Job Patterns That Actually Scale

Every production system eventually needs background jobs. Email notifications, report generation, data syncing, webhook processing—the work that can’t (or shouldn’t) happen during a user request. Here’s what I’ve learned about making them reliable. The Naive Approach (And Why It Breaks) Most developers start with something like this: 1 2 3 4 5 @app.route('/signup') def signup(): user = create_user(request.form) send_welcome_email(user) # Blocks the response return redirect('/dashboard') This works until it doesn’t. The email service has a 5-second timeout. Now your signup page feels broken. Or the email service is down, and signups fail entirely. ...

Retry Patterns: When and How to Try Again

Not all failures are permanent. Retry patterns help distinguish transient hiccups from real problems. Exponential Backoff 1 2 3 4 5 6 7 8 9 10 11 12 13 14 import time import random def retry_with_backoff(func, max_retries=5, base_delay=1): for attempt in range(max_retries): try: return func() except Exception as e: if attempt == max_retries - 1: raise delay = base_delay * (2 ** attempt) jitter = random.uniform(0, delay * 0.1) time.sleep(delay + jitter) Each retry waits longer: 1s, 2s, 4s, 8s, 16s. Jitter prevents thundering herd. With tenacity 1 2 3 4 5 6 7 8 from tenacity import retry, stop_after_attempt, wait_exponential @retry( stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=1, max=60) ) def call_api(): return requests.get("https://api.example.com") Retry Only Transient Errors 1 2 3 4 5 6 7 8 from tenacity import retry, retry_if_exception_type @retry( retry=retry_if_exception_type((ConnectionError, TimeoutError)), stop=stop_after_attempt(3) ) def fetch_data(): return external_service.get() Don’t retry 400 Bad Request — that won’t fix itself. ...