Your pod won’t start. The service isn’t routing. Something’s wrong but kubectl isn’t telling you what. Here’s how to actually debug Kubernetes problems.
## The Debugging Hierarchy

Work from the outside in:

1. Cluster level — Is the cluster healthy?
2. Node level — Are nodes ready?
3. Pod level — Is the pod running?
4. Container level — Is the container healthy?
5. Application level — Is the app working?

Most problems are at levels 3-5. Start there.
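As a rough sketch of working from the outside in, here is a tiny (hypothetical) triage helper that maps a pod status, as shown by `kubectl get pods`, to the level worth checking first:

```shell
# Hypothetical helper: map a pod status to the debugging level to start at.
triage() {
  case "$1" in
    Pending)          echo "node/cluster level: scheduling, resources, PVCs" ;;
    ImagePullBackOff) echo "container level: image name, registry credentials" ;;
    CrashLoopBackOff) echo "container/app level: logs, config, startup" ;;
    Running)          echo "app level: logs, readiness, service routing" ;;
    *)                echo "describe the pod and read the Events section" ;;
  esac
}

triage Pending
```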
## Pod Won’t Start

### Check Pod Status

```bash
kubectl get pods -o wide
kubectl describe pod <pod-name>
```
The Events section at the bottom of the `describe` output tells you what’s wrong.
### Common States and Fixes

**Pending — no nodes available:**

```
Events:
  Warning  FailedScheduling  pod has unbound immediate PersistentVolumeClaims
```

Fix: Check PVC status, ensure the storage class exists.

```bash
kubectl get pvc
kubectl get storageclass
```
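If the claim is stuck, a minimal PVC that names an existing storage class looks roughly like this (names here are hypothetical; `storageClassName` must match something in `kubectl get storageclass`):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc               # hypothetical name
spec:
  storageClassName: standard   # must match an existing StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```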
**Pending — insufficient resources:**

```
Events:
  Warning  FailedScheduling  Insufficient cpu/memory
```

Fix: Reduce resource requests or add nodes.

```yaml
resources:
  requests:
    memory: "64Mi"  # Lower this
    cpu: "100m"     # Or this
```
**ImagePullBackOff:**

```
Events:
  Warning  Failed  Failed to pull image "myrepo/myimage:latest"
```

Fix: Check the image name and registry credentials.

```bash
# Verify image exists
docker pull myrepo/myimage:latest

# Check imagePullSecrets
kubectl get secret regcred -o yaml
```
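For a private registry, the secret also has to be referenced from the pod spec; a sketch (container name hypothetical, secret name as in the check above):

```yaml
spec:
  imagePullSecrets:
    - name: regcred                  # the secret checked above
  containers:
    - name: app                      # hypothetical container name
      image: myrepo/myimage:latest
```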
**CrashLoopBackOff:**

The container starts and immediately crashes. Check the logs:

```bash
kubectl logs <pod-name>
kubectl logs <pod-name> --previous  # Logs from the crashed container
```
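The exit code shown in `kubectl describe pod` often narrows a crash loop down. A rough decoder for the common values (128+N means the container was killed by signal N):

```shell
# Interpret common container exit codes (128+N = killed by signal N)
explain_exit() {
  case "$1" in
    0)   echo "clean exit - check why the process finished" ;;
    1)   echo "application error - check the logs" ;;
    137) echo "SIGKILL (128+9) - often OOMKilled" ;;
    143) echo "SIGTERM (128+15) - terminated during shutdown" ;;
    *)   echo "see the application's own exit-code conventions" ;;
  esac
}

explain_exit 137
```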
Common causes:

- Missing environment variables
- Bad config file
- Application error on startup
- Health check failing too fast

### Debug Containers

Run a debug container in the same pod:

```bash
# Kubernetes 1.25+
kubectl debug -it <pod-name> --image=busybox --target=<container-name>

# Or create a debug pod in the same namespace
kubectl run debug --rm -it --image=busybox -- sh
```
## Service Not Routing

### Verify the Chain

Client → Service → Endpoints → Pods

Check each link:

```bash
# 1. Service exists and has correct selector
kubectl get svc <service-name> -o yaml

# 2. Endpoints exist (pods matched by selector)
kubectl get endpoints <service-name>

# 3. Pods are running and ready
kubectl get pods -l app=<your-label>
```
### No Endpoints?

The service selector doesn’t match the pod labels:

```yaml
# Service
spec:
  selector:
    app: myapp  # Must match

# Pod
metadata:
  labels:
    app: myapp  # This label
```
### Endpoints Exist But No Traffic?

Check pod readiness:

```bash
kubectl get pods -o wide
# Look at the READY column: 1/1 means ready, 0/1 means not ready
```

If pods aren’t ready, check the readiness probe:

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```
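If the app can’t serve HTTP early in startup, `tcpSocket` and `exec` probes accept the same timing fields; a sketch (the readiness marker path is hypothetical):

```yaml
readinessProbe:
  exec:
    command: ["cat", "/tmp/ready"]  # hypothetical readiness marker file
  initialDelaySeconds: 5
  periodSeconds: 10
```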
### Test From Inside the Cluster

```bash
kubectl run test --rm -it --image=busybox -- sh

# Inside the pod:
wget -qO- http://<service-name>.<namespace>.svc.cluster.local
nslookup <service-name>
```
## Resource Issues

### Check Resource Usage

```bash
# Node resources
kubectl top nodes

# Pod resources
kubectl top pods

# Detailed pod resources
kubectl describe pod <pod-name> | grep -A5 "Limits\|Requests"
```
### OOMKilled

The container exceeded its memory limit:

```
State:   Terminated
Reason:  OOMKilled
```

Fix: Increase the memory limit or fix the memory leak.

```yaml
resources:
  limits:
    memory: "512Mi"  # Increase this
```
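Worth knowing: setting requests equal to limits for every container gives the pod the Guaranteed QoS class, which makes it the last candidate for eviction under memory pressure. A sketch:

```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"  # requests == limits -> Guaranteed QoS
    cpu: "500m"
```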
### CPU Throttling

The container is hitting its CPU limit. Check with:

```bash
kubectl top pod <pod-name>
```

If it’s consistently at the limit, increase it:

```yaml
resources:
  limits:
    cpu: "1000m"  # 1 full core
```
## Configuration Problems

### Environment Variables

```bash
kubectl exec <pod-name> -- env | grep MY_VAR
```
### ConfigMaps and Secrets

```bash
# Check ConfigMap exists and has expected data
kubectl get configmap <name> -o yaml

# Check Secret exists (values are base64 encoded)
kubectl get secret <name> -o yaml

# Decode a secret value
kubectl get secret <name> -o jsonpath='{.data.password}' | base64 -d
```
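Worth remembering: Secret values are only base64-encoded, not encrypted; the decode step above is the entire transformation. A quick round trip (the value is hypothetical):

```shell
# base64 round trip - what the Secret stores and what `base64 -d` undoes
encoded=$(printf 's3cr3t' | base64)
printf '%s' "$encoded" | base64 -d
```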
### Volume Mounts

```bash
# Check what's mounted
kubectl exec <pod-name> -- ls -la /path/to/mount

# Check mount details
kubectl describe pod <pod-name> | grep -A10 "Mounts:"
```
## Network Debugging

### DNS Resolution

```bash
kubectl run test --rm -it --image=busybox -- nslookup kubernetes.default
```

If DNS fails, check CoreDNS:

```bash
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns
```
### Network Policies

Network policies might be blocking traffic:

```bash
kubectl get networkpolicies
kubectl describe networkpolicy <name>
```
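A default-deny policy is a common silent culprit: if one like this (sketch, name hypothetical) selects your pods, all ingress is dropped unless another policy explicitly allows it:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress  # hypothetical name
spec:
  podSelector: {}             # empty selector = every pod in the namespace
  policyTypes:
    - Ingress
```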
Test connectivity:

```bash
kubectl run test --rm -it --image=nicolaka/netshoot -- bash

# Inside:
curl -v http://<service>:<port>
tcpdump -i any port 80
```
### Ingress Issues

```bash
# Check the Ingress resource
kubectl get ingress
kubectl describe ingress <name>

# Check the Ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller
```
## Essential Commands

```bash
# Quick status
kubectl get pods,svc,deploy,rs

# Recent events (often reveals the problem)
kubectl get events --sort-by='.lastTimestamp'

# Follow logs in real time
kubectl logs -f <pod-name>

# Multiple containers
kubectl logs <pod-name> -c <container-name>

# All pods with a label
kubectl logs -l app=myapp --all-containers

# Execute into a running container
kubectl exec -it <pod-name> -- /bin/sh

# Port forward for local testing
kubectl port-forward <pod-name> 8080:80
kubectl port-forward svc/<service-name> 8080:80
```
## Debug Images

When your container doesn’t have debugging tools:

```bash
# netshoot - networking tools
kubectl run debug --rm -it --image=nicolaka/netshoot -- bash

# busybox - basic unix tools
kubectl run debug --rm -it --image=busybox -- sh

# alpine with curl
kubectl run debug --rm -it --image=curlimages/curl -- sh
```
A few more one-liners for gathering state:

```bash
# Get all pod IPs
kubectl get pods -o jsonpath='{.items[*].status.podIP}'

# Get container image versions
kubectl get pods -o jsonpath='{.items[*].spec.containers[*].image}'

# Get events for a specific pod
kubectl get events --field-selector involvedObject.name=<pod-name>
```
## The Debugging Checklist

When something’s broken:

1. `kubectl get pods` — What state are the pods in?
2. `kubectl describe pod <name>` — What do the events say?
3. `kubectl logs <name>` — What does the app say?
4. `kubectl get events` — What happened recently?
5. `kubectl exec` — Can I get inside and poke around?
6. `kubectl get endpoints` — Is the service finding pods?
7. DNS test from inside the cluster — Can pods resolve names?

90% of problems are answered by steps 1-4. Start simple.
Kubernetes debugging is systematic elimination. Check the obvious first, use describe and logs religiously, and remember: the Events section is usually trying to tell you exactly what’s wrong.