Kubernetes Troubleshooting Patterns for Production

Kubernetes hides complexity until something breaks. Then you need to know where to look. Here’s a systematic approach to debugging production issues. The Debugging Hierarchy Start broad, narrow down: Cluster level: Nodes healthy? Resources available? Namespace level: Deployments running? Services configured? Pod level: Containers starting? Logs clean? Container level: Process running? Resources sufficient? Quick Health Check 1 2 3 4 5 6 7 8 9 10 11 # Node status kubectl get nodes -o wide # All pods across namespaces kubectl get pods -A # Pods not running kubectl get pods -A | grep -v Running | grep -v Completed # Events (recent issues) kubectl get events -A --sort-by='.lastTimestamp' | tail -20 Pod Troubleshooting Pod States State Meaning Check Pending Can’t be scheduled Resources, node selectors, taints ContainerCreating Image pulling or volume mounting Image name, pull secrets, PVCs CrashLoopBackOff Container crashing repeatedly Logs, resource limits, probes ImagePullBackOff Can’t pull image Image name, registry auth Error Container exited with error Logs Pending Pods 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # Why is it pending? kubectl describe pod my-pod # Look for: # - Insufficient cpu/memory # - No nodes match nodeSelector # - Taints not tolerated # - PVC not bound # Check node resources kubectl describe nodes | grep -A5 "Allocated resources" # Check PVC status kubectl get pvc CrashLoopBackOff 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # Get logs from current container kubectl logs my-pod # Get logs from previous (crashed) container kubectl logs my-pod --previous # Get logs from specific container kubectl logs my-pod -c my-container # Follow logs kubectl logs -f my-pod # Last N lines kubectl logs --tail=100 my-pod Common causes: ...

February 28, 2026 · 6 min · 1185 words · Rob Washington

Kubernetes Troubleshooting: A Practical Field Guide

When a Kubernetes deployment goes sideways at 3am, you need a systematic approach. Here’s the troubleshooting playbook I’ve developed from watching countless production incidents. The First Three Commands Before diving deep, these three commands tell you 80% of what you need: 1 2 3 4 5 6 7 8 # What's not running? kubectl get pods -A | grep -v Running | grep -v Completed # What happened recently? kubectl get events -A --sort-by='.lastTimestamp' | tail -20 # Resource pressure? kubectl top nodes Run these first. Always. ...

February 28, 2026 · 5 min · 995 words · Rob Washington