Structured Logging Done Right: From printf to Production

You’ve seen these logs:
2026-03-10 07:00:00 INFO Processing request
2026-03-10 07:00:01 ERROR Something went wrong
2026-03-10 07:00:01 INFO Retrying...
Good luck debugging that at 3 AM. Which request? What went wrong? Retrying what? ...
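For contrast, here is a minimal Python sketch of similar events written as structured records. The helper `log_event` and the field names (`request_id`, `error`, `attempt`) are illustrative assumptions, not code from the post. The point is that each line carries the answers to "which request?", "what failed?", and "retrying what?" as separate fields.

```python
import json
from datetime import datetime, timezone


def log_event(level: str, message: str, **fields) -> str:
    """Render one event as a single JSON line (hypothetical helper).

    Context travels as key/value fields instead of being baked into
    the message text, so it can be filtered and joined later.
    """
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "msg": message,
        **fields,  # e.g. request_id, error, attempt
    }
    return json.dumps(record)


if __name__ == "__main__":
    print(log_event("INFO", "processing request", request_id="req_abc123", path="/orders"))
    print(log_event("ERROR", "payment call failed", request_id="req_abc123", error="timeout"))
    print(log_event("INFO", "retrying payment call", request_id="req_abc123", attempt=2))
```

Every line now shares one `request_id`, so a single search pulls the whole sequence together.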

March 10, 2026 · 6 min · 1098 words · Rob Washington

Observability vs Monitoring: The Distinction That Actually Matters

Monitoring and observability get used interchangeably. They shouldn’t. The distinction isn’t pedantic: it determines whether you can debug problems you’ve never seen before. Monitoring answers: “Is the thing I expected to break, broken?” Observability answers: “What is happening, even if I didn’t anticipate it?” One is verification. The other is exploration.
The Dashboard Trap
Most teams start with dashboards. CPU usage, memory, request latency, error rates. Green means good, red means bad. ...
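Here is a toy Python sketch of verification versus exploration. It assumes request events are available as wide records, and the `status` and `build` fields are made up for illustration. The monitoring check can only confirm a threshold someone picked in advance. The exploratory query slices failures by any field after the fact.

```python
from collections import Counter


def monitor(events: list[dict], max_error_rate: float = 0.05) -> bool:
    """Monitoring: check a condition chosen in advance.

    Returns True while the 5xx rate stays within the threshold,
    i.e. the dashboard stays green.
    """
    if not events:
        return True
    errors = sum(1 for e in events if e["status"] >= 500)
    return errors / len(events) <= max_error_rate


def explore(events: list[dict], field: str) -> Counter:
    """Observability: break failures down by any attribute,
    including ones nobody built a dashboard for."""
    return Counter(e.get(field) for e in events if e["status"] >= 500)


if __name__ == "__main__":
    # Hypothetical traffic: 96 healthy requests, 4 failures, all from one build.
    events = [{"status": 200, "build": "a1"}] * 96 + [{"status": 503, "build": "b7"}] * 4
    print(monitor(events))           # 4% errors: under the 5% threshold
    print(explore(events, "build"))  # every failure comes from build b7
```

With 4 failures in 100 requests the check stays green. Grouping by `build`, however, shows that every failure comes from a single build, and no dashboard was set up to answer that question.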

March 9, 2026 · 8 min · 1597 words · Rob Washington

Structured Logging That Actually Helps You Debug

Your logs are lying to you. Not because they’re wrong, but because they’re formatted for humans who will never read them. That stack trace you carefully formatted? It’ll be searched by a machine. Those helpful debug messages? They’ll be filtered by a regex that breaks on the first edge case. The log line that would have saved you three hours of debugging? Buried in 10GB of unstructured text. Structured logging fixes this. Here’s how to do it without making your codebase worse. ...
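With Python's standard `logging` module, one way to do this is a custom `Formatter` that emits one JSON object per line. This is a minimal sketch, not any particular library's API. The logger name `orders` and the convention of passing context through `extra={"fields": {...}}` are assumptions.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Structured context arrives via extra={"fields": {...}};
        # the attribute name "fields" is a convention of this sketch.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            # One string field instead of a multi-line traceback.
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(stream) -> logging.Logger:
    logger = logging.getLogger("orders")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace, so repeated calls don't duplicate output
    logger.propagate = False
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_logger(buf)
    log.info("order placed", extra={"fields": {"order_id": "ord_42", "amount_cents": 1999}})
    try:
        1 / 0
    except ZeroDivisionError:
        log.exception("pricing failed", extra={"fields": {"order_id": "ord_42"}})
    print(buf.getvalue())
```

Call sites still read like ordinary logging calls, and the traceback lands in a single `exc` field, so a machine can search it without regex gymnastics.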

March 8, 2026 · 7 min · 1369 words · Rob Washington

Kubernetes Troubleshooting: A Practical Field Guide

Kubernetes failures are rarely mysterious once you know where to look. The problem is knowing where to look. This guide covers the systematic approach to diagnosing common Kubernetes issues.
The Diagnostic Hierarchy
Start broad, drill down:
Cluster → Node → Pod → Container → Application
At each level, the same questions apply: ...
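One way to make the hierarchy concrete is to pair each level with a read-only `kubectl` command. This is a small Python sketch. The choice of command per level is an illustrative assumption, not the guide's own list, and the namespace, pod, node, and container names are hypothetical.

```python
def triage_commands(namespace: str, pod: str, node: str, container: str) -> list[tuple[str, str]]:
    """Pair each level of the hierarchy with one read-only kubectl command.

    The command choices are an illustrative assumption; each one only
    reads cluster state, so it is safe to run mid-incident.
    """
    return [
        ("Cluster", "kubectl get nodes"),
        ("Node", f"kubectl describe node {node}"),
        ("Pod", f"kubectl describe pod {pod} -n {namespace}"),
        # --previous shows output from the last terminated instance,
        # often where a crash-looping container's error is.
        ("Container", f"kubectl logs {pod} -c {container} -n {namespace} --previous"),
        ("Application", f"kubectl logs {pod} -c {container} -n {namespace}"),
    ]


if __name__ == "__main__":
    # Hypothetical names for a pod in the "shop" namespace.
    for level, cmd in triage_commands("shop", "api-7d9f8b6c4-x2k9p", "node-1", "api"):
        print(f"{level:<12} {cmd}")
```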

March 5, 2026 · 6 min · 1234 words · Rob Washington

Effective Logging: What to Log, How to Log It

Everyone logs. Few log well. The difference between “we have logs” and “we can debug with logs” comes down to discipline in what you capture, how you structure it, and where you send it.
The Logging Hierarchy
Not all log levels are created equal. Use them intentionally:
FATAL → The application cannot continue. Wake someone up.
ERROR → Something failed, but the app keeps running. Needs attention.
WARN  → Unexpected but handled. Might become a problem.
INFO  → Normal operation milestones. The heartbeat.
DEBUG → Detailed diagnostic info. Expensive, usually off in prod.
TRACE → Extremely verbose. Never in production.
The key insight: INFO should tell a story. If you read only INFO logs, you should understand what the application did. ...
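Python's standard `logging` module provides five of these six levels. Its canonical names are `WARNING` and `CRITICAL`, and `FATAL` is an alias for `CRITICAL`. TRACE has to be registered by hand. The sketch below also guards an expensive DEBUG payload so the cost is paid only when DEBUG is on. The TRACE value of 5, the `checkout` function, and `expensive_snapshot` are illustrative assumptions.

```python
import logging

# The stdlib has no TRACE level; registering one below DEBUG (10) is a
# common workaround. The value 5 is an assumption of this sketch.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

# The six levels mapped onto Python's numeric levels.
LEVELS = {
    "FATAL": logging.FATAL,   # alias of CRITICAL (50)
    "ERROR": logging.ERROR,   # 40
    "WARN": logging.WARNING,  # 30
    "INFO": logging.INFO,     # 20
    "DEBUG": logging.DEBUG,   # 10
    "TRACE": TRACE,           # 5
}

logger = logging.getLogger("checkout")


def expensive_snapshot() -> dict:
    """Stand-in for costly diagnostic work (hypothetical)."""
    return {"cart_items": 3, "cache_entries": 1024}


def checkout(order_id: str) -> None:
    # DEBUG is costly: skip building the payload unless DEBUG is enabled.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("cart state: %s", expensive_snapshot())
    # INFO tells the story: one milestone per meaningful step.
    logger.info("checkout completed for order %s", order_id)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    checkout("ord_42")
```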

March 5, 2026 · 7 min · 1336 words · Rob Washington

Logging Levels: A Practical Guide to What Goes Where

Logging seems simple until you’re debugging production at 2 AM, scrolling through millions of lines trying to find the one that matters. Good logging practices make that experience less painful. Here’s how to think about log levels.
The Levels
Most logging frameworks use these standard levels:
DEBUG < INFO < WARN < ERROR < FATAL
In production, you typically run at INFO or WARN. Setting a level emits that level and everything more severe (running at INFO also emits WARN, ERROR, and FATAL). ...
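A quick way to see this threshold behavior is Python's standard `logging` module, which spells the levels `WARNING` and `CRITICAL` (its equivalent of FATAL). The sketch logs one message at each level and records which ones get through. The logger names and messages are made up for the demo.

```python
import logging


class ListHandler(logging.Handler):
    """Collect emitted records in memory to show what passes the threshold."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def emitted_levels(threshold: int) -> list[str]:
    """Log one message at each level and report which ones got through."""
    logger = logging.getLogger(f"levels-demo.{threshold}")
    logger.setLevel(threshold)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = ListHandler()
    logger.handlers = [handler]
    logger.debug("cache miss for key %s", "user:42")
    logger.info("order placed")
    logger.warning("payment retry scheduled")
    logger.error("payment call failed")
    logger.critical("database unreachable")
    return [r.levelname for r in handler.records]


if __name__ == "__main__":
    print(emitted_levels(logging.INFO))
    print(emitted_levels(logging.WARNING))
```

At `logging.INFO` the result is `['INFO', 'WARNING', 'ERROR', 'CRITICAL']`. Raising the threshold to `logging.WARNING` drops INFO as well.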

March 1, 2026 · 4 min · 836 words · Rob Washington

API Error Handling That Helps Instead of Frustrates

Bad error handling wastes everyone’s time. A cryptic “Error 500” sends developers on a debugging odyssey. A well-designed error response tells them exactly what went wrong and how to fix it. Here’s how to build the latter.
The Anatomy of a Good Error
Every error response should answer three questions:
What happened? (error code/type)
Why? (human-readable message)
How do I fix it? (actionable guidance)
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Request validation failed",
    "details": [
      {
        "field": "email",
        "message": "Invalid email format",
        "received": "not-an-email"
      },
      {
        "field": "age",
        "message": "Must be a positive integer",
        "received": "-5"
      }
    ],
    "documentation_url": "https://api.example.com/docs/errors#VALIDATION_ERROR"
  },
  "request_id": "req_abc123"
}
Always include: ...
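As a rough sketch, this is how a Python service might build that envelope. The dataclasses, the `validate_signup` checks, and the deliberately naive email test are hypothetical. Only the field names and the documentation URL pattern come from the example response.

```python
import json
from dataclasses import asdict, dataclass, field

# Base URL taken from the example response; adjust for a real API.
DOCS_BASE = "https://api.example.com/docs/errors"


@dataclass
class FieldError:
    field: str
    message: str
    received: str


@dataclass
class ApiError:
    code: str
    message: str
    details: list[FieldError] = field(default_factory=list)

    def to_response(self, request_id: str) -> dict:
        """Build the what / why / how-to-fix envelope."""
        return {
            "error": {
                "code": self.code,
                "message": self.message,
                "details": [asdict(d) for d in self.details],
                "documentation_url": f"{DOCS_BASE}#{self.code}",
            },
            "request_id": request_id,
        }


def validate_signup(payload: dict) -> list[FieldError]:
    """Hypothetical validator mirroring the two checks in the example."""
    errors = []
    email = payload.get("email", "")
    if "@" not in str(email):  # deliberately naive email check
        errors.append(FieldError("email", "Invalid email format", str(email)))
    age = payload.get("age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not (isinstance(age, int) and not isinstance(age, bool) and age > 0):
        errors.append(FieldError("age", "Must be a positive integer", str(age)))
    return errors


if __name__ == "__main__":
    errs = validate_signup({"email": "not-an-email", "age": -5})
    body = ApiError("VALIDATION_ERROR", "Request validation failed", errs).to_response("req_abc123")
    print(json.dumps(body, indent=2))
```

On the sample input this reproduces the example response field for field. Collecting every field error before responding, rather than stopping at the first one, saves the client a round trip per mistake.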

March 1, 2026 · 6 min · 1214 words · Rob Washington

Kubernetes Troubleshooting: A Practical Field Guide

When a Kubernetes deployment goes sideways at 3am, you need a systematic approach. Here’s the troubleshooting playbook I’ve developed from watching countless production incidents.
The First Three Commands
Before diving deep, these three commands tell you 80% of what you need:
# What's not running?
kubectl get pods -A | grep -v Running | grep -v Completed
# What happened recently?
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
# Resource pressure?
kubectl top nodes
Run these first. Always. ...
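The first command's `grep -v Running` has a blind spot. A pod stuck at `0/1` READY still reports `Running`, so the grep filters it out. This small Python sketch parses the same `kubectl get pods -A` text and also flags under-ready Running pods. It assumes the default column layout, and the sample output and pod names are made up.

```python
SAMPLE = """\
NAMESPACE     NAME                       READY   STATUS             RESTARTS      AGE
kube-system   coredns-5d78c9869d-abcde   1/1     Running            0             3d
shop          api-7d9f8b6c4-x2k9p        0/1     Running            3 (2m ago)    10m
shop          worker-6b5f7d9c8-q8w7e     0/1     CrashLoopBackOff   7 (1m ago)    15m
batch         nightly-28456789-zz1xq     0/1     Completed          0             6h
"""


def pods_needing_attention(get_pods_output: str) -> list[str]:
    """Filter `kubectl get pods -A` output (default columns assumed).

    Like `grep -v Running | grep -v Completed`, it skips Completed pods,
    but it also flags Running pods whose READY count is short (e.g. 0/1),
    which that grep pipeline hides.
    """
    flagged = []
    for line in get_pods_output.strip().splitlines()[1:]:  # skip header
        namespace, name, ready, status = line.split()[:4]
        ready_now, ready_want = (int(n) for n in ready.split("/"))
        if status == "Completed":
            continue
        if status != "Running" or ready_now < ready_want:
            flagged.append(f"{namespace}/{name} {ready} {status}")
    return flagged


if __name__ == "__main__":
    for entry in pods_needing_attention(SAMPLE):
        print(entry)
```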

February 28, 2026 · 5 min · 995 words · Rob Washington

Debugging Production Issues Without Breaking Things

Production is sacred. When something breaks, you need to investigate without making it worse. Here’s how.
Rule Zero: Don’t Make It Worse
Before touching anything:
Don’t restart services until you understand the problem
Don’t deploy fixes without knowing the root cause
Don’t clear logs you might need for investigation
Don’t scale down what might be handling load
Stabilize first, investigate second, fix third.
Start With Observability
Check Dashboards
Before SSH-ing anywhere: ...
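“Don’t clear logs you might need” is easier to follow if copying them is the first, automatic step of any intervention. Below is a minimal Python sketch, assuming the logs are plain files you can read. The `preserve_evidence` helper and the `incident-<timestamp>` folder layout are hypothetical. Files that share a name but live in different directories would overwrite each other in this simple version.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def preserve_evidence(paths: list[Path], evidence_root: Path) -> Path:
    """Copy log files into a timestamped folder before any restart,
    rotation, or cleanup. The source files are only read, never moved."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = evidence_root / f"incident-{stamp}"
    dest.mkdir(parents=True, exist_ok=True)
    for src in paths:
        if src.is_file():
            # copy2 also preserves modification times, which matter for timelines.
            shutil.copy2(src, dest / src.name)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        app_log = Path(tmp) / "app.log"
        app_log.write_text("2026-03-10 07:00:01 ERROR payment call failed\n")
        saved = preserve_evidence([app_log], Path(tmp) / "evidence")
        print(sorted(p.name for p in saved.iterdir()))
```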

February 28, 2026 · 6 min · 1168 words · Rob Washington

YAML Gotchas: The Traps That Bite Every Developer

YAML looks simple until it isn’t. These gotchas have broken production configs and wasted countless debugging hours. Learn them once, avoid them forever.
The Norway Problem
# These are ALL booleans in YAML 1.1
country: NO    # false
answer: yes    # true
enabled: on    # true
disabled: off  # false
Fix: Always quote strings that could be interpreted as booleans.
country: "NO"
answer: "yes"
YAML 1.2 fixed this, but many parsers (including PyYAML) still use 1.1 rules. ...
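A cheap guard is to scan configs for unquoted values that a YAML 1.1 parser would read as booleans. The set below lists the literals from the YAML 1.1 boolean type. The line-matching regex is a deliberately naive assumption: it handles simple `key: value` lines only and ignores flow collections, multi-line scalars, and quoted `#` characters. A real linter should work on the parsed document instead.

```python
import re

# Boolean literals from the YAML 1.1 bool type. A 1.1 parser turns these
# unquoted scalars into true/false instead of strings.
YAML11_BOOLS = {
    "y", "Y", "yes", "Yes", "YES", "n", "N", "no", "No", "NO",
    "true", "True", "TRUE", "false", "False", "FALSE",
    "on", "On", "ON", "off", "Off", "OFF",
}

# Naive: simple `key: value` lines with an optional trailing comment.
LINE = re.compile(r"^\s*([\w.-]+):\s*([^#]*?)\s*(?:#.*)?$")


def norway_risks(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, key, value) for unquoted values that
    YAML 1.1 would read as booleans."""
    risks = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = LINE.match(line)
        if m and m.group(2) in YAML11_BOOLS:
            risks.append((lineno, m.group(1), m.group(2)))
    return risks


if __name__ == "__main__":
    config = 'country: NO  # false\nanswer: yes\nsafe: "NO"\n'
    for lineno, key, value in norway_risks(config):
        print(f"line {lineno}: {key}: {value} -> quote it")
```

Quoted values such as `"NO"` pass through untouched, which matches the fix of quoting anything that could be read as a boolean.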

February 25, 2026 · 5 min · 1032 words · Rob Washington