Your logs are lying to you. Not because they’re wrong, but because they’re formatted for humans who will never read them.
That stack trace you carefully formatted? It’ll be searched by a machine. Those helpful debug messages? They’ll be filtered by a regex that breaks on the first edge case. The log line that would have saved you three hours of debugging? Buried in 10GB of unstructured text.
Structured logging fixes this. Here’s how to do it without making your codebase worse.
What Structured Logging Actually Means
Structured logging means emitting logs as data, not strings. Instead of interpolating values into a message, you emit a JSON object with named fields.
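A minimal before/after sketch (timestamps, field names, and values here are illustrative, not a standard):

```python
import json

# Before: one interpolated string. Getting the values back out means regex.
print("2024-05-01T12:00:00Z ERROR Connection timeout for order 12345 ($180.00)")

# After: the same event as data. Every field is directly queryable.
entry = {
    "timestamp": "2024-05-01T12:00:00Z",
    "level": "ERROR",
    "message": "connection_timeout",
    "order_id": "12345",
    "amount": 180.00,
}
print(json.dumps(entry))
```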
The difference isn’t cosmetic. The first format requires regex parsing that breaks when message formats change. The second format lets you query: “show me all connection timeouts for orders over $100 in the last hour.”
The Minimum Viable Log Entry
Every log entry should have:
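Sketched as JSON, a baseline entry might look like this (the field names are a common convention, not a standard):

```python
import json
from datetime import datetime, timezone

entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),  # always UTC, ISO 8601
    "level": "INFO",
    "message": "order_created",   # short, static string; variables go in their own fields
    "service": "checkout",
    "trace_id": "a1b2c3d4",       # propagated from the incoming request
}
print(json.dumps(entry))
```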
That’s the baseline. Everything else is context you add based on what you’re logging.
Timestamp: Always UTC, always ISO 8601. Your log aggregator will thank you.
Level: Be honest. INFO means “normal operation.” WARN means “something’s off but we’re handling it.” ERROR means “we failed to do what we were asked.”
Message: Short, static string. Not a template with variables. The variables go in separate fields.
Service: Which component. Essential when you have more than one thing running.
Trace ID: The key to distributed debugging. Pass this through every service in a request chain.
Context Is Everything
The log message tells you what happened. Context tells you why it matters.
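A sketch of the two versions — first the bare message, then the same event with context (field names and values are illustrative):

```python
import json

# First version: the message alone.
print(json.dumps({"level": "ERROR", "message": "payment failed"}))

# Second version: the same event, plus the context that makes it queryable.
rich = {
    "level": "ERROR",
    "message": "payment_failed",
    "user_id": "u_4821",
    "payment_provider": "stripe",
    "retries": 3,
    "amount": 149.99,
    "currency": "USD",
}
print(json.dumps(rich))
```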
The second version lets you:
- Query all failures for a specific user
- Calculate failure rates by payment provider
- Find orders that failed after N retries
- Correlate with amount thresholds
The first version lets you grep and hope.
What to Log at Each Level
DEBUG: Internal state useful for development. Request/response bodies (sanitized), cache hits/misses, branch decisions. Never in production unless you’re actively debugging.
INFO: Normal operations worth recording. Request received, job started, job completed. The audit trail of your system doing its job.
WARN: Something unexpected that we handled. Rate limit approaching, deprecated endpoint called, fallback activated. Things that deserve attention but didn’t break anything.
ERROR: We failed to complete the requested operation. User impact occurred or was narrowly avoided. These should trigger alerts.
The Performance Question
“But structured logging is slower than string logging!”
Measure it. In most applications, the difference is microseconds. If you’re logging so much that JSON serialization is your bottleneck, you’re logging too much.
That said, some practical guidelines:
- Don’t log in hot loops. Aggregate, then log once.
- Lazy evaluation for expensive context. Don’t compute debug info if debug logging is disabled.
- Sample high-volume events. Log 1% of health checks, 100% of errors.
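The last two guidelines can be sketched with the standard library (`expensive_dump` and the 1% rate are placeholders):

```python
import logging
import random

log = logging.getLogger("worker")

def expensive_dump() -> dict:
    # Stand-in for costly introspection you only want to pay for at DEBUG.
    return {"cache_entries": 10_000}

# Lazy evaluation: build expensive context only when DEBUG is actually on.
if log.isEnabledFor(logging.DEBUG):
    log.debug("cache_state", extra={"state": expensive_dump()})

# Sampling: ~1% of routine health checks, 100% of failures.
def log_health_check(ok: bool) -> None:
    if not ok:
        log.error("health_check_failed")
    elif random.random() < 0.01:
        log.info("health_check_ok")
```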
Sensitive Data
Structured logging makes it easy to accidentally log secrets. Some rules:
- Never log passwords, tokens, or API keys. Not even partially.
- Hash or truncate identifiers. Log `user_id: abc...xyz`, not full emails.
- Sanitize request bodies. Have an allowlist of fields that are safe to log.
- Review your log output. Grep your logs for patterns that look like secrets.
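An allowlist sanitizer is a few lines; the safe field names here are assumptions you'd replace with your own:

```python
# Allowlist, not blocklist: unknown fields are dropped by default,
# so a new secret field can never leak by omission.
SAFE_FIELDS = {"order_id", "amount", "currency", "status"}

def sanitize(body: dict) -> dict:
    """Keep only fields known to be safe to log; drop everything else."""
    return {k: v for k, v in body.items() if k in SAFE_FIELDS}

raw = {"order_id": "o_991", "card_number": "4242424242424242", "amount": 50}
print(sanitize(raw))  # card_number never reaches the logs
```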
Correlation Across Services
In a distributed system, a single user request touches multiple services. Without correlation, debugging is archaeology.
The pattern:
- Generate a trace ID at the edge (API gateway, load balancer)
- Pass it in headers through every internal call
- Include it in every log entry
- Include it in error responses to users
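A minimal sketch of the propagation step, assuming an `X-Trace-Id` header (a common choice, not a standard — W3C Trace Context uses `traceparent`):

```python
import json
import uuid

TRACE_HEADER = "X-Trace-Id"

def log(entry: dict) -> None:
    print(json.dumps(entry))

def handle_request(headers: dict) -> dict:
    # Reuse the ID from upstream, or mint one if we're the edge.
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex
    log({"level": "INFO", "message": "request_received", "trace_id": trace_id})
    # Forward the same ID on every internal call and in error responses.
    return {TRACE_HEADER: trace_id}
```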
Now when a user reports “something went wrong,” they can give you the trace ID from the error page, and you can find every log entry from every service that touched their request.
Log Aggregation
Structured logs are only useful if you can query them. The stack that works:
- Emit JSON to stdout. Let the infrastructure handle collection.
- Ship to a log aggregator. Loki, Elasticsearch, CloudWatch, Datadog — pick one.
- Build dashboards for common queries. Error rates by service, latency percentiles, top error messages.
- Set up alerts on patterns. Error rate spikes, new error types, specific high-severity events.
The key insight: your logs are a time-series database of events. Treat them that way.
Practical Migration
If you’re stuck with printf-style logging, migrate incrementally:
- Add a JSON handler alongside your existing one. Emit both formats temporarily.
- Start with error logs. These are most valuable and lowest volume.
- Add trace IDs to new code. Retrofit critical paths.
- Build one dashboard. Prove value before migrating everything.
- Kill the text logs. Once you’re querying JSON, you won’t go back.
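The first migration step — both formats side by side — can be sketched with Python's `logging` (the `JsonFormatter` and service name are illustrative; libraries like `python-json-logger` do this for you):

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
            "service": "checkout",  # illustrative service name
        })

log = logging.getLogger("app")
log.setLevel(logging.INFO)

text_handler = logging.StreamHandler(sys.stderr)   # existing text logs, unchanged
json_handler = logging.StreamHandler(sys.stdout)   # new JSON logs alongside them
json_handler.setFormatter(JsonFormatter())
log.addHandler(text_handler)
log.addHandler(json_handler)

log.info("order_created")  # emitted in both formats during the migration
```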
The Debugging Payoff
Here’s what good structured logging enables: filter by any field, follow one trace ID across every service, and compute error rates with a query instead of a grep.
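As an illustrative sketch, even a plain JSON-lines file answers questions a text log can't (entries and field names follow the earlier examples):

```python
import json

lines = [
    '{"level": "ERROR", "service": "payments", "trace_id": "a1", "message": "payment_failed"}',
    '{"level": "INFO",  "service": "payments", "trace_id": "a1", "message": "retry_scheduled"}',
    '{"level": "ERROR", "service": "orders",   "trace_id": "b2", "message": "db_timeout"}',
]
events = [json.loads(line) for line in lines]

# Every entry, from every service, for one user's request:
trace = [e for e in events if e["trace_id"] == "a1"]

# Error counts by service, no regex in sight:
errors: dict[str, int] = {}
for e in events:
    if e["level"] == "ERROR":
        errors[e["service"]] = errors.get(e["service"], 0) + 1
print(trace, errors)
```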
Compare that to grepping through files, guessing at timestamps, and manually correlating between services.
The investment in structured logging pays off the first time you debug a production incident in minutes instead of hours. Your logs stop being write-only archives and become queryable operational data.
That’s the goal: logs as a feature, not a formality.