The Three Pillars of Observability: Logs, Metrics, and Traces

When your service goes down at 3 AM, you need answers fast. Observability—the ability to understand what’s happening inside your systems from their external outputs—is what separates a 5-minute fix from a 3-hour nightmare. The three pillars of observability are logs, metrics, and traces. Each tells a different part of the story. Logs: The Narrative Logs are discrete events. They tell you what happened in human-readable terms. 1 2 3 4 5 6 7 8 9 { "timestamp": "2026-03-03T12:34:56Z", "level": "error", "service": "payment-api", "message": "Payment processing failed", "user_id": "12345", "error_code": "CARD_DECLINED", "request_id": "abc-123" } Best Practices for Logging Structure your logs. JSON is your friend. Unstructured logs like Payment failed for user 12345 are hard to search and aggregate. ...

March 3, 2026 Â· 6 min Â· 1095 words Â· Rob Washington

Observability: Beyond Monitoring with Metrics, Logs, and Traces

Monitoring tells you when something is wrong. Observability helps you understand why. In distributed systems, you can’t predict every failure mode—you need systems that let you ask arbitrary questions about their behavior. The Three Pillars Metrics: What’s Happening Now Numeric time-series data. Fast to query, cheap to store. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 from prometheus_client import Counter, Histogram, Gauge, start_http_server # Counter - only goes up requests_total = Counter( 'http_requests_total', 'Total HTTP requests', ['method', 'endpoint', 'status'] ) # Histogram - distribution of values request_duration = Histogram( 'http_request_duration_seconds', 'Request duration in seconds', ['method', 'endpoint'], buckets=[.01, .05, .1, .25, .5, 1, 2.5, 5, 10] ) # Gauge - can go up or down active_connections = Gauge( 'active_connections', 'Number of active connections' ) # Usage @app.route("/api/<endpoint>") def handle_request(endpoint): active_connections.inc() with request_duration.labels( method=request.method, endpoint=endpoint ).time(): result = process_request() requests_total.labels( method=request.method, endpoint=endpoint, status=200 ).inc() active_connections.dec() return result # Expose metrics endpoint start_http_server(9090) Logs: What Happened Discrete events with context. Rich detail, expensive at scale. ...

February 11, 2026 Â· 7 min Â· 1291 words Â· Rob Washington