You’ve seen these logs:

2026-03-10 07:00:00 INFO Processing request
2026-03-10 07:00:01 ERROR Something went wrong
2026-03-10 07:00:01 INFO Retrying...

Good luck debugging that at 3 AM. Which request? What went wrong? Retrying what?

Structured logging transforms logs from human-readable strings into machine-parseable data. Same information, infinitely more useful.

The Shift: Strings to Data

Unstructured:

logger.info(f"User {user_id} purchased {item_id} for ${amount}")

Structured:

logger.info("purchase_completed", extra={
    "user_id": user_id,
    "item_id": item_id,
    "amount": amount,
    "currency": "USD",
})

Output (JSON):

{
  "timestamp": "2026-03-10T07:00:00Z",
  "level": "info",
  "message": "purchase_completed",
  "user_id": "usr_123",
  "item_id": "item_456",
  "amount": 29.99,
  "currency": "USD"
}

Now you can:

  • Search: user_id:usr_123 AND level:error
  • Aggregate: “How many purchases over $100 today?”
  • Alert: “Notify when error rate exceeds 1%”
  • Correlate: “Show all logs for this user_id”
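Even before any log platform is involved, plain JSON lines are queryable with a few lines of code. A sketch over some hypothetical events shaped like the output above:

```python
import json

# Hypothetical sample: JSON log lines, one object per line
raw_logs = """
{"timestamp": "2026-03-10T07:00:00Z", "level": "info", "message": "purchase_completed", "user_id": "usr_123", "amount": 29.99}
{"timestamp": "2026-03-10T07:00:01Z", "level": "error", "message": "payment_failed", "user_id": "usr_123", "amount": 120.00}
{"timestamp": "2026-03-10T07:00:02Z", "level": "info", "message": "purchase_completed", "user_id": "usr_456", "amount": 150.00}
""".strip()

logs = [json.loads(line) for line in raw_logs.splitlines()]

# Search: user_id:usr_123 AND level:error
errors = [log for log in logs if log["user_id"] == "usr_123" and log["level"] == "error"]

# Aggregate: how many purchases over $100?
big_purchases = sum(
    1 for log in logs
    if log["message"] == "purchase_completed" and log["amount"] > 100
)

print(len(errors), big_purchases)  # 1 1
```

Try doing either query against the garbled string logs at the top of this article.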

Python Setup

Using structlog for clean structured logging:

import structlog
import logging

# Configure structlog
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.format_exc_info,  # renders exc_info=True into a traceback string
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
)

logger = structlog.get_logger()

# Usage
logger.info("request_started", path="/api/users", method="GET")
logger.error("database_error", error="connection refused", retry_count=3)
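If adding a dependency isn't an option, a small formatter over the standard library's logging module gets you most of the way. A minimal sketch (the "fields" key is an ad-hoc convention of this example, not part of the stdlib):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        # Merge fields attached via logging's `extra` mechanism
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("purchase_completed", extra={"fields": {"user_id": "usr_123", "amount": 29.99}})
```

structlog does the same job with less ceremony and bound context; this is the fallback, not the recommendation.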

Request Context: The Game Changer

Every log should be traceable to its source. Add context once, include it everywhere:

import structlog
from contextvars import ContextVar
import uuid

request_id_var: ContextVar[str] = ContextVar('request_id', default='')

def get_logger():
    return structlog.get_logger().bind(
        request_id=request_id_var.get(),
    )

# Middleware
async def request_context_middleware(request, call_next):
    request_id = request.headers.get('X-Request-ID', str(uuid.uuid4()))
    request_id_var.set(request_id)
    
    logger = get_logger()
    logger.info("request_started", 
        path=request.url.path,
        method=request.method,
        user_agent=request.headers.get('user-agent'),
    )
    
    try:
        response = await call_next(request)
        logger.info("request_completed", status=response.status_code)
        return response
    except Exception as e:
        logger.error("request_failed", error=str(e), error_type=type(e).__name__)
        raise

Now every log in the request automatically includes request_id. When something breaks, search by that ID to see the entire request flow.
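The pattern isn't tied to any web framework. Here is a dependency-free sketch of the same idea, with a hypothetical log() helper standing in for structlog:

```python
import json
import uuid
from contextvars import ContextVar

request_id_var: ContextVar[str] = ContextVar("request_id", default="")

def log(event: str, **fields) -> str:
    """Emit one JSON log line that automatically carries the current request_id."""
    line = json.dumps({"message": event, "request_id": request_id_var.get(), **fields})
    print(line)
    return line

# Each request sets its ID once at the boundary...
request_id_var.set(str(uuid.uuid4()))

# ...and every later call picks it up without threading it through arguments.
line = log("request_started", path="/api/users")
```

ContextVar is the important ingredient: unlike a global, each asyncio task or thread sees its own value, so concurrent requests don't clobber each other's IDs.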

Log Levels: Use Them Correctly

# DEBUG: Detailed diagnostic info (disabled in production)
logger.debug("cache_lookup", key=cache_key, hit=True)

# INFO: Normal operations worth recording
logger.info("user_login", user_id=user_id, method="oauth")

# WARNING: Something unexpected but handled
logger.warning("rate_limit_approaching", user_id=user_id, remaining=5)

# ERROR: Something failed but the system continues
logger.error("payment_failed", user_id=user_id, error="card_declined")

# CRITICAL: System is in trouble, immediate attention needed
logger.critical("database_unavailable", host=db_host, retry_count=10)

The rule: if you’d page someone for it, it’s CRITICAL. If it needs investigation tomorrow, it’s ERROR. If it’s just interesting, it’s INFO.

What to Log

Always Log

# Request boundaries
logger.info("request_started", path=path, method=method)
logger.info("request_completed", status=status, duration_ms=duration)

# Authentication events
logger.info("auth_success", user_id=user_id, method=method)
logger.warning("auth_failure", reason=reason, ip=ip)

# Business events
logger.info("order_placed", order_id=order_id, amount=amount)
logger.info("subscription_changed", user_id=user_id, plan=new_plan)

# Errors with context
logger.error("operation_failed", 
    operation="send_email",
    error=str(e),
    user_id=user_id,
    retry_count=retry_count,
)

Never Log

# Passwords, tokens, API keys
logger.info("login", password=password)  # NEVER

# Full credit card numbers
logger.info("payment", card_number=card)  # NEVER

# Personal data that could identify individuals
logger.info("user_data", ssn=ssn, medical_records=records)  # NEVER

# Large payloads that bloat storage
logger.debug("request_body", body=megabyte_payload)  # AVOID

Mask or hash sensitive data:

def mask_card(card: str) -> str:
    return f"****{card[-4:]}"

logger.info("payment_processed", card=mask_card(card_number))
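Hashing is the better choice when you need to correlate on a value without ever storing it. A stdlib sketch (for real PII, prefer a keyed hash such as HMAC so low-entropy values can't be brute-forced):

```python
import hashlib

def hash_identifier(value: str) -> str:
    """One-way hash: the raw value never reaches the logs, but equal values still correlate."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]

# Same input, same token: you can still group a user's logs without logging the email itself
token = hash_identifier("alice@example.com")
```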

Error Logging That Actually Helps

Bad:

except Exception as e:
    logger.error("Error occurred")

Good:

except Exception as e:
    logger.error("payment_processing_failed",
        error=str(e),
        error_type=type(e).__name__,
        user_id=user_id,
        order_id=order_id,
        amount=amount,
        payment_method=payment_method,
        exc_info=True,  # Include stack trace
    )

Include everything needed to reproduce and debug. The goal: someone reading this log at 3 AM can understand what happened without looking at code.

Performance Logging

Measure what matters:

import time
from contextlib import contextmanager

@contextmanager
def log_duration(operation: str, **context):
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        logger.info(f"{operation}_completed",
            duration_ms=round(duration_ms, 2),
            **context
        )

# Usage
with log_duration("database_query", table="users", query_type="select"):
    results = await db.fetch("SELECT * FROM users WHERE ...")

Output:

{"message": "database_query_completed", "duration_ms": 12.34, "table": "users", "query_type": "select"}
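When a whole function is the unit of work, the same timing packages neatly as a decorator. A stdlib-only sketch that prints JSON directly rather than going through structlog:

```python
import functools
import json
import time

def timed(operation: str):
    """Log the wrapped function's duration as a structured JSON event."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # The finally block ensures failures are timed too
                duration_ms = (time.perf_counter() - start) * 1000
                print(json.dumps({
                    "message": f"{operation}_completed",
                    "duration_ms": round(duration_ms, 2),
                }))
        return wrapper
    return decorator

@timed("fetch_users")
def fetch_users():
    return ["usr_123"]
```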

Correlation IDs: Tracing Across Services

When requests span multiple services, you need to trace them end-to-end:

# Service A: Generate correlation ID
correlation_id = str(uuid.uuid4())
logger.info("calling_service_b", correlation_id=correlation_id)
response = await http_client.post(
    "http://service-b/api/process",
    headers={"X-Correlation-ID": correlation_id},
    json=payload
)

# Service B: Extract and log correlation ID
@app.middleware("http")
async def correlation_middleware(request, call_next):
    correlation_id = request.headers.get("X-Correlation-ID", str(uuid.uuid4()))
    structlog.contextvars.bind_contextvars(correlation_id=correlation_id)
    return await call_next(request)

Search by correlation_id to see the entire distributed flow.
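A small helper keeps the propagation rule in one place: reuse an incoming ID, mint one otherwise. Hypothetical helper name, header as in the example above:

```python
import uuid

def outgoing_headers(incoming_headers: dict) -> dict:
    """Reuse the caller's correlation ID when present; otherwise start a new trace."""
    correlation_id = incoming_headers.get("X-Correlation-ID") or str(uuid.uuid4())
    return {"X-Correlation-ID": correlation_id}

# Mid-chain: the existing ID passes through unchanged
headers = outgoing_headers({"X-Correlation-ID": "abc-123"})
```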

Log Aggregation

Structured logs shine when aggregated. Common stacks:

ELK (Elasticsearch, Logstash, Kibana):

# Filebeat config
filebeat.inputs:
  - type: log
    paths: ["/var/log/app/*.log"]
    json.keys_under_root: true
    json.add_error_key: true

output.elasticsearch:
  hosts: ["elasticsearch:9200"]

Loki + Grafana:

# Promtail config
scrape_configs:
  - job_name: app
    static_configs:
      - targets: [localhost]
        labels:
          job: myapp
          __path__: /var/log/app/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            message: message

CloudWatch (AWS):

import logging

import watchtower

handler = watchtower.CloudWatchLogHandler(
    log_group="myapp",
    stream_name="production",
)
logging.getLogger().addHandler(handler)

Alerting on Logs

Structured logs enable precise alerts:

# Example: Alert on high error rate
- alert: HighErrorRate
  expr: |
    sum(rate(log_messages_total{level="error"}[5m])) 
    / sum(rate(log_messages_total[5m])) > 0.01
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Error rate above 1%"

Or search-based alerts in your log aggregator:

  • “Alert when level:error AND service:payments exceeds 10/minute”
  • “Alert when duration_ms > 5000 AND endpoint:/api/checkout”
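The quantity these alerts threshold is easy to sanity-check offline against a batch of parsed logs. A sketch, not any particular aggregator's API:

```python
def error_rate(logs: list[dict]) -> float:
    """Fraction of events at level 'error'; the quantity the alert rule above thresholds."""
    if not logs:
        return 0.0
    errors = sum(1 for log in logs if log.get("level") == "error")
    return errors / len(logs)

sample = [{"level": "info"}] * 98 + [{"level": "error"}] * 2
rate = error_rate(sample)  # 0.02, just above the 1% alert threshold
```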

The Checklist

Before shipping:

  • Logs are structured (JSON, not strings)
  • Request IDs included in all logs
  • Log levels used appropriately
  • Sensitive data masked or excluded
  • Errors include full context
  • Duration logged for slow operations
  • Correlation IDs for distributed tracing
  • Aggregation configured
  • Alerts defined for critical conditions

Structured logging is infrastructure that pays off every time something breaks. Invest the time upfront, and debugging becomes searching instead of guessing.