Your logs are probably useless. Not because you’re not logging, but because you’re logging wrong. Here’s how to make logs that actually help when things break.

The Problem with Unstructured Logs

[2024-03-12 10:23:45] INFO: User logged in
[2024-03-12 10:23:46] ERROR: Something went wrong
[2024-03-12 10:23:47] Processing order for user 12345

Try answering these questions:

  • Which user logged in?
  • What went wrong? Where?
  • How do I find all errors for user 12345?

You can’t. At least not without regex gymnastics and hope.

Structured Logging

Same events, structured:

{"timestamp": "2024-03-12T10:23:45Z", "level": "info", "event": "user_login", "user_id": "u_789", "ip": "192.168.1.100"}
{"timestamp": "2024-03-12T10:23:46Z", "level": "error", "event": "payment_failed", "user_id": "u_789", "error": "card_declined", "amount": 99.99}
{"timestamp": "2024-03-12T10:23:47Z", "level": "info", "event": "order_processing", "user_id": "u_12345", "order_id": "ord_456"}

Now you can:

  • Query: user_id = "u_789" AND level = "error"
  • Aggregate: Count errors by event type
  • Alert: When payment_failed events spike
  • Correlate: Find all events for a user/request
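To make "query" and "aggregate" concrete, here is a minimal sketch in plain Python that filters and counts JSON-lines entries the way a log backend would (no log store required; the entries are the three from above):

```python
import json
from collections import Counter

# The three JSON-lines entries from the example above
raw_logs = """\
{"timestamp": "2024-03-12T10:23:45Z", "level": "info", "event": "user_login", "user_id": "u_789", "ip": "192.168.1.100"}
{"timestamp": "2024-03-12T10:23:46Z", "level": "error", "event": "payment_failed", "user_id": "u_789", "error": "card_declined", "amount": 99.99}
{"timestamp": "2024-03-12T10:23:47Z", "level": "info", "event": "order_processing", "user_id": "u_12345", "order_id": "ord_456"}
"""

entries = [json.loads(line) for line in raw_logs.splitlines()]

# Query: user_id = "u_789" AND level = "error"
errors = [e for e in entries if e.get("user_id") == "u_789" and e["level"] == "error"]

# Aggregate: count entries by event type
by_event = Counter(e["event"] for e in entries)

print(errors[0]["event"])      # payment_failed
print(by_event["user_login"])  # 1
```

A real backend (Elasticsearch, Loki, CloudWatch) does exactly this at scale, with indexes instead of list comprehensions.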

Implementation

Python (structlog)

import structlog

# Configure once at startup
structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        structlog.processors.JSONRenderer()
    ],
    wrapper_class=structlog.BoundLogger,
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
)

log = structlog.get_logger()

# Usage
log.info("user_login", user_id="u_789", ip="192.168.1.100")
log.error("payment_failed", user_id="u_789", error="card_declined", amount=99.99)

# Bind context for the request
log = log.bind(request_id="req_123", user_id="u_789")
log.info("order_created", order_id="ord_456")  # Includes request_id and user_id

Node.js (pino)

const pino = require('pino');

const log = pino({
  level: 'info',
  formatters: {
    level: (label) => ({ level: label }),
  },
  timestamp: pino.stdTimeFunctions.isoTime,
});

// Usage
log.info({ user_id: 'u_789', ip: '192.168.1.100' }, 'user_login');
log.error({ user_id: 'u_789', error: 'card_declined', amount: 99.99 }, 'payment_failed');

// Child logger with bound context
const reqLog = log.child({ request_id: 'req_123', user_id: 'u_789' });
reqLog.info({ order_id: 'ord_456' }, 'order_created');

Go (zerolog)

package main

import (
    "os"
    "github.com/rs/zerolog"
    "github.com/rs/zerolog/log"
)

func main() {
    zerolog.TimeFieldFormat = zerolog.TimeFormatUnix
    log.Logger = zerolog.New(os.Stdout).With().Timestamp().Logger()

    // Usage
    log.Info().
        Str("user_id", "u_789").
        Str("ip", "192.168.1.100").
        Msg("user_login")

    log.Error().
        Str("user_id", "u_789").
        Str("error", "card_declined").
        Float64("amount", 99.99).
        Msg("payment_failed")

    // With context
    reqLog := log.With().
        Str("request_id", "req_123").
        Str("user_id", "u_789").
        Logger()
    
    reqLog.Info().Str("order_id", "ord_456").Msg("order_created")
}

Essential Fields

Every log entry should have:

{
  "timestamp": "2024-03-12T10:23:45.123Z",
  "level": "info",
  "service": "payment-api",
  "version": "1.2.3",
  "environment": "production",
  "trace_id": "abc123",
  "span_id": "def456",
  "message": "payment_processed"
}
  • timestamp: ISO 8601, UTC
  • level: debug, info, warn, error, fatal
  • service: Which service generated this
  • version: Deployed version (helps with “did the deploy break it?”)
  • environment: prod, staging, dev
  • trace_id/span_id: For distributed tracing correlation
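The static fields (service, version, environment) shouldn't be repeated at every call site. With structlog they can be attached once by a custom processor; shown here standalone so the mechanics are visible (the metadata values are placeholders):

```python
# Hypothetical static metadata; in practice, read from env vars or build info
SERVICE_INFO = {"service": "payment-api", "version": "1.2.3", "environment": "production"}

def add_service_info(logger, method_name, event_dict):
    # A structlog-style processor: receives the event dict, mutates it,
    # and returns it for the next processor in the chain
    event_dict.update(SERVICE_INFO)
    return event_dict

# Plugged into structlog.configure(processors=[..., add_service_info, ...]),
# every entry gains these fields. Standalone demonstration:
entry = add_service_info(None, "info", {"event": "payment_processed", "level": "info"})
print(entry["service"])  # payment-api
```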

Request Context

Inject request context into every log within a request:

# Python middleware example
import uuid
from contextvars import ContextVar

request_context: ContextVar[dict] = ContextVar('request_context', default={})

def logging_middleware(request, call_next):
    ctx = {
        'request_id': str(uuid.uuid4()),
        'user_id': request.user.id if request.user else None,
        'path': request.path,
        'method': request.method,
    }
    request_context.set(ctx)
    
    response = call_next(request)
    return response

# In your logger config, include context
def add_request_context(logger, method_name, event_dict):
    ctx = request_context.get()
    event_dict.update(ctx)
    return event_dict

Now every log in that request automatically includes request_id, user_id, etc.
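To wire this in, add add_request_context to the processors list in structlog.configure. The ContextVar mechanics are worth seeing in isolation: each request runs in its own context, so concurrent requests can't leak fields into each other. A self-contained sketch, independent of any web framework:

```python
import uuid
from contextvars import ContextVar, copy_context

request_context: ContextVar[dict] = ContextVar("request_context", default={})

def add_request_context(logger, method_name, event_dict):
    # Merge whatever the current request bound into every log entry
    event_dict.update(request_context.get())
    return event_dict

def handle_request(user_id):
    # Each request sets its own context dict
    request_context.set({"request_id": str(uuid.uuid4()), "user_id": user_id})
    return add_request_context(None, "info", {"event": "order_created"})

# Run two "requests" in separate contexts, as an async server would
e1 = copy_context().run(handle_request, "u_789")
e2 = copy_context().run(handle_request, "u_555")
print(e1["user_id"], e2["user_id"])  # u_789 u_555
```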

Log Levels

Use them consistently:

  • debug: Detailed info for developers. Off in production.
  • info: Normal operations: requests, jobs, state changes
  • warn: Something unexpected but handled: retries, fallbacks
  • error: Something failed and needs attention
  • fatal: Application cannot continue
# Good level choices
log.debug("cache_lookup", key="user:123", hit=True)
log.info("order_created", order_id="ord_456", total=99.99)
log.warn("rate_limit_approaching", current=95, limit=100)
log.error("payment_failed", error="timeout", retry_count=3)
log.fatal("database_connection_failed", host="db.example.com")

What NOT to Log

Sensitive Data

# NEVER
log.info("user_login", password=user.password, ssn=user.ssn)

# DO
log.info("user_login", user_id=user.id)

Scrub: passwords, tokens, credit cards, SSNs, API keys, PII.
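Scrubbing is safer enforced in the logging pipeline than trusted to every call site. A minimal structlog-style redaction processor; the key list is illustrative, not exhaustive:

```python
# Keys to redact; extend this set for your domain (illustrative, not exhaustive)
SENSITIVE_KEYS = {"password", "ssn", "token", "api_key", "credit_card", "authorization"}

def scrub_sensitive(logger, method_name, event_dict):
    # Replace values rather than dropping keys, so the redaction
    # itself is visible in the output
    for key in list(event_dict):
        if key.lower() in SENSITIVE_KEYS:
            event_dict[key] = "[REDACTED]"
    return event_dict

entry = scrub_sensitive(None, "info", {"event": "user_login", "user_id": "u_789", "password": "hunter2"})
print(entry["password"])  # [REDACTED]
print(entry["user_id"])   # u_789
```

Placed before the JSONRenderer in the processor chain, it catches mistakes at every call site at once.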

High-Volume Noise

# Don't log every cache hit in production
if settings.DEBUG:
    log.debug("cache_hit", key=key)

Use debug level for high-frequency events.

Centralized Aggregation

ELK Stack (Elasticsearch, Logstash, Kibana)

11
# Filebeat config
filebeat.inputs:
  - type: container
    paths:
      - '/var/lib/docker/containers/*/*.log'
    json.keys_under_root: true
    json.add_error_key: true

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "logs-%{+yyyy.MM.dd}"

Loki (Grafana)

# Promtail config
scrape_configs:
  - job_name: containers
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*log
    pipeline_stages:
      - json:
          expressions:
            level: level
            service: service
      - labels:
          level:
          service:

CloudWatch Logs Insights

fields @timestamp, @message
| filter level = "error"
| filter service = "payment-api"
| stats count(*) as error_count by error
| sort error_count desc
| limit 10

Alerting on Logs

# Loki ruler rule; fires to Prometheus Alertmanager
groups:
  - name: logs
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate({job="app"} |= "error" [5m])) by (service)
          / sum(rate({job="app"} [5m])) by (service)
          > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in {{ $labels.service }}"

The Debugging Workflow

When something breaks:

  1. Find the error: level = "error" AND timestamp > "10 minutes ago"
  2. Get context: Find the request_id or trace_id
  3. See the journey: request_id = "req_123" shows all logs for that request
  4. Find patterns: error = "timeout" | stats count() by service
  5. Correlate: Link to traces and metrics with the same IDs
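Steps 1 through 3 amount to a filter over JSON lines. A sketch in plain Python with hypothetical entries (a real log backend runs the same logic at scale):

```python
import json

# Hypothetical entries: one request's journey plus an unrelated request
raw_logs = """\
{"timestamp": "2024-03-12T10:23:45Z", "level": "info", "request_id": "req_123", "event": "user_login"}
{"timestamp": "2024-03-12T10:23:46Z", "level": "info", "request_id": "req_123", "event": "order_created"}
{"timestamp": "2024-03-12T10:23:47Z", "level": "error", "request_id": "req_123", "event": "payment_failed", "error": "timeout"}
{"timestamp": "2024-03-12T10:23:47Z", "level": "info", "request_id": "req_999", "event": "user_login"}
"""
entries = [json.loads(line) for line in raw_logs.splitlines()]

# 1. Find the error
error = next(e for e in entries if e["level"] == "error")

# 2-3. Pull its request_id, then see the whole journey
journey = [e["event"] for e in entries if e["request_id"] == error["request_id"]]
print(journey)  # ['user_login', 'order_created', 'payment_failed']
```

Without the shared request_id, the error on line three would be an isolated fact; with it, you see the login and order that led up to it.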

Quick Wins

  1. Today: Switch one service to JSON logging
  2. This week: Add request_id to all logs
  3. This month: Set up centralized log aggregation
  4. This quarter: Build dashboards and alerts on log patterns

The goal: when something breaks at 3am, you can find out what happened in minutes, not hours.


Good logs tell a story. Great logs let you search that story.