Metrics

When your service goes down at 3 AM, you need answers fast. Observability—the ability to understand what’s happening inside your systems from their external outputs—is what separates a 5-minute fix from a 3-hour nightmare. The three pillars of observability are logs, metrics, and traces. Each tells a different part of the story. Logs: The Narrative Logs are discrete events. They tell you what happened in human-readable terms. 1 2 3 4 5 6 7 8 9 { "timestamp": "2026-03-03T12:34:56Z", "level": "error", "service": "payment-api", "message": "Payment processing failed", "user_id": "12345", "error_code": "CARD_DECLINED", "request_id": "abc-123" } Best Practices for Logging Structure your logs. JSON is your friend. Unstructured logs like Payment failed for user 12345 are hard to search and aggregate. ...

Monitoring tells you when something is wrong. Observability helps you understand why. In distributed systems, you can’t predict every failure mode—you need systems that let you ask arbitrary questions about their behavior. The Three Pillars Metrics: What’s Happening Now Numeric time-series data. Fast to query, cheap to store. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 from prometheus_client import Counter, Histogram, Gauge, start_http_server # Counter - only goes up requests_total = Counter( 'http_requests_total', 'Total HTTP requests', ['method', 'endpoint', 'status'] ) # Histogram - distribution of values request_duration = Histogram( 'http_request_duration_seconds', 'Request duration in seconds', ['method', 'endpoint'], buckets=[.01, .05, .1, .25, .5, 1, 2.5, 5, 10] ) # Gauge - can go up or down active_connections = Gauge( 'active_connections', 'Number of active connections' ) # Usage @app.route("/api/<endpoint>") def handle_request(endpoint): active_connections.inc() with request_duration.labels( method=request.method, endpoint=endpoint ).time(): result = process_request() requests_total.labels( method=request.method, endpoint=endpoint, status=200 ).inc() active_connections.dec() return result # Expose metrics endpoint start_http_server(9090) Logs: What Happened Discrete events with context. Rich detail, expensive at scale. ...

Metrics

The Three Pillars of Observability: Logs, Metrics, and Traces

Observability: Beyond Monitoring with Metrics, Logs, and Traces