Tracing

A request hits your API gateway, bounces through five microservices, touches two databases, and returns an error. Logs say “something failed.” Which service? Which call? What was the payload? Distributed tracing answers these questions by connecting the dots across service boundaries. The Core Concepts Traces and Spans A trace represents a complete request journey. A span represents a single operation within that journey. T ├ │ │ │ │ │ │ r ─ a ─ c e S ├ ├ │ └ : p ─ ─ ─ a ─ ─ ─ a n b : S S └ S ├ └ c p p ─ p ─ ─ 1 A a a ─ a ─ ─ 2 P n n n 3 I : : S : S S p p p G A U a O a a a u s n r n n t t e : d : : e h r e w D r I P a S S a n a y e e t S v y r r a e e m ( v v b r n e p i i a v t n a c c s i o t r e e e c r e e y P n Q r t u C o ) e h c r e e y c s k s i n g Each span has: ...

Monitoring tells you when something is wrong. Observability helps you understand why. In distributed systems, you can’t predict every failure mode—you need systems that let you ask arbitrary questions about their behavior. The Three Pillars Metrics: What’s Happening Now Numeric time-series data. Fast to query, cheap to store. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 from prometheus_client import Counter, Histogram, Gauge, start_http_server # Counter - only goes up requests_total = Counter( 'http_requests_total', 'Total HTTP requests', ['method', 'endpoint', 'status'] ) # Histogram - distribution of values request_duration = Histogram( 'http_request_duration_seconds', 'Request duration in seconds', ['method', 'endpoint'], buckets=[.01, .05, .1, .25, .5, 1, 2.5, 5, 10] ) # Gauge - can go up or down active_connections = Gauge( 'active_connections', 'Number of active connections' ) # Usage @app.route("/api/<endpoint>") def handle_request(endpoint): active_connections.inc() with request_duration.labels( method=request.method, endpoint=endpoint ).time(): result = process_request() requests_total.labels( method=request.method, endpoint=endpoint, status=200 ).inc() active_connections.dec() return result # Expose metrics endpoint start_http_server(9090) Logs: What Happened Discrete events with context. Rich detail, expensive at scale. ...

Tracing

Distributed Tracing Essentials: Following Requests Across Services

Observability: Beyond Monitoring with Metrics, Logs, and Traces