Observability: Beyond Monitoring with Metrics, Logs, and Traces

Monitoring tells you when something is wrong. Observability helps you understand why. In distributed systems, you can’t predict every failure mode, so you need systems that let you ask arbitrary questions about their behavior.

The Three Pillars

Metrics: What’s Happening Now

Numeric time-series data. Fast to query, cheap to store.

    # Assumption: the excerpt does not show its web framework;
    # Flask matches the route syntax used below.
    from flask import Flask, request
    from prometheus_client import Counter, Histogram, Gauge, start_http_server

    app = Flask(__name__)

    # Counter - only goes up
    requests_total = Counter(
        'http_requests_total',
        'Total HTTP requests',
        ['method', 'endpoint', 'status']
    )

    # Histogram - distribution of values
    request_duration = Histogram(
        'http_request_duration_seconds',
        'Request duration in seconds',
        ['method', 'endpoint'],
        buckets=[.01, .05, .1, .25, .5, 1, 2.5, 5, 10]
    )

    # Gauge - can go up or down
    active_connections = Gauge(
        'active_connections',
        'Number of active connections'
    )

    # Usage
    @app.route("/api/<endpoint>")
    def handle_request(endpoint):
        active_connections.inc()
        try:
            with request_duration.labels(
                method=request.method,
                endpoint=endpoint
            ).time():
                # process_request() is the application's own handler (not shown)
                result = process_request()

            requests_total.labels(
                method=request.method,
                endpoint=endpoint,
                status=200
            ).inc()
            return result
        finally:
            # Decrement even if the handler raises, so the gauge does not drift
            active_connections.dec()

    # Expose metrics endpoint
    start_http_server(9090)

Logs: What Happened

Discrete events with context. Rich detail, expensive at scale. ...

February 11, 2026 · 7 min · 1291 words · Rob Washington

Structured Logging: Stop Grepping, Start Querying

Unstructured logs are a liability. When your application writes "User 12345 logged in from 192.168.1.1", you’re creating text that’s easy to read but impossible to query at scale. Structured logging changes the game: logs become data you can filter, aggregate, and analyze.

The Problem with Text Logs

    # Traditional logging
    import logging

    logging.basicConfig(level=logging.INFO)  # the root logger drops INFO by default

    user_id, ip_address = 12345, "192.168.1.1"  # sample values matching the output below
    logging.info(f"User {user_id} logged in from {ip_address}")
    # Output: INFO:root:User 12345 logged in from 192.168.1.1

Want to find all logins from a specific IP? You need regex. Want to count logins per user? Good luck. Want to correlate with other events? Hope your timestamp parsing is solid. ...
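To make "logs become data" concrete, here is a minimal sketch using only the Python standard library. It is not taken from the post itself: the `JsonFormatter` class, the `fields` key passed through `extra`, the `auth_example` logger name, and the `logins_from_ip` helper are all hypothetical names chosen for illustration. It writes one JSON object per log line, then answers the "all logins from a specific IP" question with a field comparison instead of a regex.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"fields": {...}}
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("auth_example")  # illustrative name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def logins_from_ip(log_text, ip_address):
    """Filter parsed events by field value rather than by pattern matching."""
    events = [json.loads(line) for line in log_text.splitlines() if line]
    return [
        e for e in events
        if e.get("event") == "user_login" and e.get("ip_address") == ip_address
    ]


buffer = io.StringIO()
log = build_logger(buffer)
log.info("user_login", extra={"fields": {"user_id": 12345, "ip_address": "192.168.1.1"}})
log.info("user_login", extra={"fields": {"user_id": 67890, "ip_address": "10.0.0.7"}})

matches = logins_from_ip(buffer.getvalue(), "192.168.1.1")
print(matches)
```

In a real deployment the filtering would more likely happen in a log aggregation backend than in application code; the in-memory buffer here only keeps the sketch self-contained.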

February 11, 2026 · 6 min · 1083 words · Rob Washington

Logging That Actually Helps: From Printf to Production Debugging

A practical guide to logging — structured formats, log levels, correlation IDs, and patterns that make debugging production issues bearable.
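One of the patterns named above, correlation IDs, could look roughly like the following sketch. It is an assumption about how such a pattern might be built with the standard library, not the post's own code: the `correlation_id` context variable, the `CorrelationFilter` class, the `orders_example` logger name, and the `req-42` ID are all hypothetical. A `logging.Filter` stamps every record with the ID of the request being handled, so all lines from one request can be pulled together later.

```python
import contextvars
import io
import logging
import uuid

# Hypothetical name: holds the ID of the request currently being handled
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Attach the current correlation ID to every log record."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True  # never drop the record, only annotate it


def make_logger(stream):
    logger = logging.getLogger("orders_example")  # illustrative name
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s cid=%(correlation_id)s %(message)s")
    )
    handler.addFilter(CorrelationFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, request_id=None):
    """Log a few events at different levels under one correlation ID."""
    token = correlation_id.set(request_id or uuid.uuid4().hex)
    try:
        logger.info("request received")
        logger.debug("loading order")
        logger.error("payment provider timed out")
    finally:
        # Restore the previous value so IDs do not leak between requests
        correlation_id.reset(token)


out = io.StringIO()
log = make_logger(out)
handle_request(log, request_id="req-42")
lines = out.getvalue().splitlines()
print(lines)
```

Using a context variable rather than a global keeps the ID scoped to the current thread or async task, which matters once requests are handled concurrently.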

February 10, 2026 · 5 min · 1008 words · Rob Washington

Monitoring vs Observability: What's the Difference and Why It Matters

Monitoring tells you when something is wrong. Observability helps you understand why. Here’s how to build systems you can actually debug.

February 10, 2026 · 5 min · 1058 words · Rob Washington