You’ve seen these logs:

2026-03-10 07:00:00 INFO Processing request
2026-03-10 07:00:01 ERROR Something went wrong
2026-03-10 07:00:01 INFO Retrying...

Good luck debugging that at 3 AM. Which request? What went wrong? Retrying what?

Structured logging transforms logs from human-readable strings into machine-parseable data. Same information, infinitely more useful.

The Shift: Strings to Data

Unstructured:

logger.info(f"User {user_id} purchased {item_id} for ${amount}")

Structured:

logger.info("purchase_completed", extra={
    "user_id": user_id,
    "item_id": item_id,
    "amount": amount,
    "currency": "USD",
})

Output (JSON):

{
  "timestamp": "2026-03-10T07:00:00Z",
  "level": "info",
  "message": "purchase_completed",
  "user_id": "usr_123",
  "item_id": "item_456",
  "amount": 29.99,
  "currency": "USD"
}

Now you can:

  • Search: user_id:usr_123 AND level:error
  • Aggregate: “How many purchases over $100 today?”
  • Alert: “Notify when error rate exceeds 1%”
  • Correlate: “Show all logs for this user_id”
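Even before any log platform is involved, plain JSON lines are queryable with a few lines of code. A sketch over some hypothetical events shaped like the output above:

```python
import json

# Hypothetical sample: JSON log lines, one object per line
raw_logs = """
{"timestamp": "2026-03-10T07:00:00Z", "level": "info", "message": "purchase_completed", "user_id": "usr_123", "amount": 29.99}
{"timestamp": "2026-03-10T07:00:01Z", "level": "error", "message": "payment_failed", "user_id": "usr_123", "amount": 120.00}
{"timestamp": "2026-03-10T07:00:02Z", "level": "info", "message": "purchase_completed", "user_id": "usr_456", "amount": 150.00}
""".strip()

logs = [json.loads(line) for line in raw_logs.splitlines()]

# Search: user_id:usr_123 AND level:error
errors = [log for log in logs if log["user_id"] == "usr_123" and log["level"] == "error"]

# Aggregate: how many purchases over $100?
big_purchases = sum(
    1 for log in logs
    if log["message"] == "purchase_completed" and log["amount"] > 100
)

print(len(errors), big_purchases)  # 1 1
```

Try doing either query against the garbled string logs at the top of this article.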

Python Setup

Using structlog for clean structured logging:

import structlog
import logging

# Configure structlog
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.format_exc_info,  # renders exc_info=True into a traceback string
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
    context_class=dict,
    logger_factory=structlog.PrintLoggerFactory(),
)

logger = structlog.get_logger()

# Usage
logger.info("request_started", path="/api/users", method="GET")
logger.error("database_error", error="connection refused", retry_count=3)
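If adding a dependency isn't an option, a small formatter over the standard library's logging module gets you most of the way. A minimal sketch (the "fields" key is an ad-hoc convention of this example, not part of the stdlib):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        # Merge fields attached via logging's `extra` mechanism
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("purchase_completed", extra={"fields": {"user_id": "usr_123", "amount": 29.99}})
```

structlog does the same job with less ceremony and bound context; this is the fallback, not the recommendation.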

Request Context: The Game Changer

Every log should be traceable to its source. Add context once, include it everywhere:

import structlog
from contextvars import ContextVar
import uuid

request_id_var: ContextVar[str] = ContextVar('request_id', default='')

def get_logger():
    return structlog.get_logger().bind(
        request_id=request_id_var.get(),
    )

# Middleware
async def request_context_middleware(request, call_next):
    request_id = request.headers.get('X-Request-ID', str(uuid.uuid4()))
    request_id_var.set(request_id)
    
    logger = get_logger()
    logger.info("request_started", 
        path=request.url.path,
        method=request.method,
        user_agent=request.headers.get('user-agent'),
    )
    
    try:
        response = await call_next(request)
        logger.info("request_completed", status=response.status_code)
        return response
    except Exception as e:
        logger.error("request_failed", error=str(e), error_type=type(e).__name__)
        raise

Now every log in the request automatically includes request_id. When something breaks, search by that ID to see the entire request flow.
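The pattern isn't tied to any web framework. Here is a dependency-free sketch of the same idea, with a hypothetical log() helper standing in for structlog:

```python
import json
import uuid
from contextvars import ContextVar

request_id_var: ContextVar[str] = ContextVar("request_id", default="")

def log(event: str, **fields) -> str:
    """Emit one JSON log line that automatically carries the current request_id."""
    line = json.dumps({"message": event, "request_id": request_id_var.get(), **fields})
    print(line)
    return line

# Each request sets its ID once at the boundary...
request_id_var.set(str(uuid.uuid4()))

# ...and every later call picks it up without threading it through arguments.
line = log("request_started", path="/api/users")
```

ContextVar is the important ingredient: unlike a global, each asyncio task or thread sees its own value, so concurrent requests don't clobber each other's IDs.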

Log Levels: Use Them Correctly

# DEBUG: Detailed diagnostic info (disabled in production)
logger.debug("cache_lookup", key=cache_key, hit=True)

# INFO: Normal operations worth recording
logger.info("user_login", user_id=user_id, method="oauth")

# WARNING: Something unexpected but handled
logger.warning("rate_limit_approaching", user_id=user_id, remaining=5)

# ERROR: Something failed but the system continues
logger.error("payment_failed", user_id=user_id, error="card_declined")

# CRITICAL: System is in trouble, immediate attention needed
logger.critical("database_unavailable", host=db_host, retry_count=10)

The rule: if you’d page someone for it, it’s CRITICAL. If it needs investigation tomorrow, it’s ERROR. If it’s just interesting, it’s INFO.

What to Log

Always Log

# Request boundaries
logger.info("request_started", path=path, method=method)
logger.info("request_completed", status=status, duration_ms=duration)

# Authentication events
logger.info("auth_success", user_id=user_id, method=method)
logger.warning("auth_failure", reason=reason, ip=ip)

# Business events
logger.info("order_placed", order_id=order_id, amount=amount)
logger.info("subscription_changed", user_id=user_id, plan=new_plan)

# Errors with context
logger.error("operation_failed", 
    operation="send_email",
    error=str(e),
    user_id=user_id,
    retry_count=retry_count,
)

Never Log

# Passwords, tokens, API keys
logger.info("login", password=password)  # NEVER

# Full credit card numbers
logger.info("payment", card_number=card)  # NEVER

# Personal data that could identify individuals
logger.info("user_data", ssn=ssn, medical_records=records)  # NEVER

# Large payloads that bloat storage
logger.debug("request_body", body=megabyte_payload)  # AVOID

Mask or hash sensitive data:

def mask_card(card: str) -> str:
    return f"****{card[-4:]}"

logger.info("payment_processed", card=mask_card(card_number))
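Hashing is the better choice when you need to correlate on a value without ever storing it. A stdlib sketch (for real PII, prefer a keyed hash such as HMAC so low-entropy values can't be brute-forced):

```python
import hashlib

def hash_identifier(value: str) -> str:
    """One-way hash: the raw value never reaches the logs, but equal values still correlate."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]

# Same input, same token: you can still group a user's logs without logging the email itself
token = hash_identifier("alice@example.com")
```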

Error Logging That Actually Helps

Bad:

except Exception as e:
    logger.error("Error occurred")

Good:

except Exception as e:
    logger.error("payment_processing_failed",
        error=str(e),
        error_type=type(e).__name__,
        user_id=user_id,
        order_id=order_id,
        amount=amount,
        payment_method=payment_method,
        exc_info=True,  # Include stack trace
    )

Include everything needed to reproduce and debug. The goal: someone reading this log at 3 AM can understand what happened without looking at code.

Performance Logging

Measure what matters:

import time
from contextlib import contextmanager

@contextmanager
def log_duration(operation: str, **context):
    start = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        logger.info(f"{operation}_completed",
            duration_ms=round(duration_ms, 2),
            **context
        )

# Usage
with log_duration("database_query", table="users", query_type="select"):
    results = await db.fetch("SELECT * FROM users WHERE ...")

Output:

{"message": "database_query_completed", "duration_ms": 12.34, "table": "users", "query_type": "select"}
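When a whole function is the unit of work, the same timing packages neatly as a decorator. A stdlib-only sketch that prints JSON directly rather than going through structlog:

```python
import functools
import json
import time

def timed(operation: str):
    """Log the wrapped function's duration as a structured JSON event."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # The finally block ensures failures are timed too
                duration_ms = (time.perf_counter() - start) * 1000
                print(json.dumps({
                    "message": f"{operation}_completed",
                    "duration_ms": round(duration_ms, 2),
                }))
        return wrapper
    return decorator

@timed("fetch_users")
def fetch_users():
    return ["usr_123"]
```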

Correlation IDs: Tracing Across Services

When requests span multiple services, you need to trace them end-to-end:

# Service A: Generate correlation ID
correlation_id = str(uuid.uuid4())
logger.info("calling_service_b", correlation_id=correlation_id)
response = await http_client.post(
    "http://service-b/api/process",
    headers={"X-Correlation-ID": correlation_id},
    json=payload
)

# Service B: Extract and log correlation ID
@app.middleware("http")
async def correlation_middleware(request, call_next):
    correlation_id = request.headers.get("X-Correlation-ID", str(uuid.uuid4()))
    structlog.contextvars.bind_contextvars(correlation_id=correlation_id)
    return await call_next(request)

Search by correlation_id to see the entire distributed flow.
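A small helper keeps the propagation rule in one place: reuse an incoming ID, mint one otherwise. Hypothetical helper name, header as in the example above:

```python
import uuid

def outgoing_headers(incoming_headers: dict) -> dict:
    """Reuse the caller's correlation ID when present; otherwise start a new trace."""
    correlation_id = incoming_headers.get("X-Correlation-ID") or str(uuid.uuid4())
    return {"X-Correlation-ID": correlation_id}

# Mid-chain: the existing ID passes through unchanged
headers = outgoing_headers({"X-Correlation-ID": "abc-123"})
```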

Log Aggregation

Structured logs shine when aggregated. Common stacks:

ELK (Elasticsearch, Logstash, Kibana):

# Filebeat config
filebeat.inputs:
  - type: log
    paths: ["/var/log/app/*.log"]
    json.keys_under_root: true
    json.add_error_key: true

output.elasticsearch:
  hosts: ["elasticsearch:9200"]

Loki + Grafana:

# Promtail config
scrape_configs:
  - job_name: app
    static_configs:
      - targets: [localhost]
        labels:
          job: myapp
          __path__: /var/log/app/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            message: message

CloudWatch (AWS):

import logging

import watchtower

handler = watchtower.CloudWatchLogHandler(
    log_group="myapp",
    stream_name="production",
)
logging.getLogger().addHandler(handler)

Alerting on Logs

Structured logs enable precise alerts:

# Example: Alert on high error rate
- alert: HighErrorRate
  expr: |
    sum(rate(log_messages_total{level="error"}[5m])) 
    / sum(rate(log_messages_total[5m])) > 0.01
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Error rate above 1%"

Or search-based alerts in your log aggregator:

  • “Alert when level:error AND service:payments exceeds 10/minute”
  • “Alert when duration_ms > 5000 AND endpoint:/api/checkout”
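The quantity these alerts threshold is easy to sanity-check offline against a batch of parsed logs. A sketch, not any particular aggregator's API:

```python
def error_rate(logs: list[dict]) -> float:
    """Fraction of events at level 'error'; the quantity the alert rule above thresholds."""
    if not logs:
        return 0.0
    errors = sum(1 for log in logs if log.get("level") == "error")
    return errors / len(logs)

sample = [{"level": "info"}] * 98 + [{"level": "error"}] * 2
rate = error_rate(sample)  # 0.02, just above the 1% alert threshold
```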

The Checklist

Before shipping:

  • Logs are structured (JSON, not strings)
  • Request IDs included in all logs
  • Log levels used appropriately
  • Sensitive data masked or excluded
  • Errors include full context
  • Duration logged for slow operations
  • Correlation IDs for distributed tracing
  • Aggregation configured
  • Alerts defined for critical conditions

Structured logging is infrastructure that pays off every time something breaks. Invest the time upfront, and debugging becomes searching instead of guessing.