Your pod won’t start. The service isn’t routing. Something’s wrong but kubectl isn’t telling you what. Here’s how to actually debug Kubernetes problems.

The Debugging Hierarchy

Work from the outside in:

  1. Cluster level — Is the cluster healthy?
  2. Node level — Are nodes ready?
  3. Pod level — Is the pod running?
  4. Container level — Is the container healthy?
  5. Application level — Is the app working?

Most problems are at levels 3-5. Start there.

Pod Won’t Start

Check Pod Status

kubectl get pods -o wide
kubectl describe pod <pod-name>

The Events section at the bottom of the describe output usually tells you exactly what's wrong.

Common States and Fixes

Pending — unbound PersistentVolumeClaim:

Events:
  Warning  FailedScheduling  pod has unbound immediate PersistentVolumeClaims

Fix: Check PVC status, ensure storage class exists.

kubectl get pvc
kubectl get storageclass

Pending — Insufficient resources:

Events:
  Warning  FailedScheduling  Insufficient cpu/memory

Fix: Reduce resource requests or add nodes.

resources:
  requests:
    memory: "64Mi"   # Lower this
    cpu: "100m"      # Or this

ImagePullBackOff:

Events:
  Warning  Failed  Failed to pull image "myrepo/myimage:latest"

Fix: Check the image name, tag, and registry credentials.

# Verify image exists
docker pull myrepo/myimage:latest

# Check imagePullSecrets
kubectl get secret regcred -o yaml
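If the registry requires authentication, the pod spec must actually reference the pull secret. A minimal sketch, assuming a docker-registry secret named regcred as above:

```yaml
spec:
  containers:
    - name: myapp
      image: myrepo/myimage:latest
  imagePullSecrets:
    - name: regcred   # must exist in the pod's namespace
```

A secret that exists but isn't referenced here (or lives in a different namespace) produces the same FailedPull events.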

CrashLoopBackOff:

The container starts and immediately crashes. Check logs:

kubectl logs <pod-name>
kubectl logs <pod-name> --previous  # Logs from crashed container

Common causes:

  • Missing environment variables
  • Bad config file
  • Application error on startup
  • Health check failing too fast
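The exit code shown under Last State in kubectl describe narrows these down. Codes above 128 mean the container was killed by a signal (code minus 128); a quick sketch of the arithmetic:

```shell
# Exit codes > 128 mean "killed by signal (code - 128)":
#   137 -> signal 9  (SIGKILL, often the OOM killer or a forced delete)
#   143 -> signal 15 (SIGTERM, an ordinary shutdown request)
code=137
sig=$((code - 128))
echo "exit $code = signal $sig ($(kill -l $sig))"
```

Codes below 128 come from the application itself, so go straight to the logs.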

Debug Containers

Run a debug container in the same pod:

# Kubernetes 1.25+
kubectl debug -it <pod-name> --image=busybox --target=<container-name>

# Or create a debug pod in the same namespace
kubectl run debug --rm -it --image=busybox -- sh

Service Not Routing

Verify the Chain

Client → Service → Endpoints → Pods

Check each link:

# 1. Service exists and has correct selector
kubectl get svc <service-name> -o yaml

# 2. Endpoints exist (pods matched by selector)
kubectl get endpoints <service-name>

# 3. Pods are running and ready
kubectl get pods -l app=<your-label>

No Endpoints?

Service selector doesn’t match pod labels:

# Service
spec:
  selector:
    app: myapp  # Must match

# Pod
metadata:
  labels:
    app: myapp  # This label

Endpoints Exist But No Traffic?

Check pod readiness:

kubectl get pods -o wide
# Look for READY column: 1/1 means ready, 0/1 means not ready

If pods aren’t ready, check readiness probe:

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
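If the app simply needs more time to boot, a startupProbe (available since Kubernetes 1.18; values here are illustrative) holds off the readiness and liveness probes until startup succeeds, which beats inflating initialDelaySeconds:

```yaml
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # allow up to 30 * 10s = 5 minutes to start
  periodSeconds: 10
```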

Test From Inside the Cluster

kubectl run test --rm -it --image=busybox -- sh

# Inside the pod:
wget -qO- http://<service-name>.<namespace>.svc.cluster.local
nslookup <service-name>

Resource Issues

Check Resource Usage

# Node resources
kubectl top nodes

# Pod resources
kubectl top pods

# Detailed pod resources
kubectl describe pod <pod-name> | grep -A5 "Limits\|Requests"

OOMKilled

The container exceeded its memory limit:

State:          Terminated
Reason:         OOMKilled

Fix: Increase memory limit or fix memory leak.

resources:
  limits:
    memory: "512Mi"  # Increase this

CPU Throttling

The container is hitting its CPU limit. Check with:

kubectl top pod <pod-name>

If consistently at limit, increase:

resources:
  limits:
    cpu: "1000m"  # 1 full core
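Keep the distinction in mind: requests decide where the pod can be scheduled, while limits are what the kernel throttles against. A sketch with both set (values are illustrative):

```yaml
resources:
  requests:
    cpu: "250m"    # the scheduler reserves this much on a node
  limits:
    cpu: "1000m"   # the container is throttled above this
```

Unlike memory, exceeding a CPU limit never kills the container; it just slows it down, which makes throttling easy to miss.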

Configuration Problems

Environment Variables

kubectl exec <pod-name> -- env | grep MY_VAR

ConfigMaps and Secrets

# Check ConfigMap exists and has expected data
kubectl get configmap <name> -o yaml

# Check Secret exists (values are base64 encoded)
kubectl get secret <name> -o yaml

# Decode a secret value
kubectl get secret <name> -o jsonpath='{.data.password}' | base64 -d
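Remember that base64 is encoding, not encryption — anyone who can read the Secret can decode it. The roundtrip, runnable anywhere:

```shell
# Encode a value the way Kubernetes stores it in a Secret...
encoded=$(printf '%s' 's3cr3t' | base64)
echo "$encoded"                      # czNjcjN0
# ...and decode it back
printf '%s' "$encoded" | base64 -d   # s3cr3t
```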

Volume Mounts

# Check what's mounted
kubectl exec <pod-name> -- ls -la /path/to/mount

# Check mount details
kubectl describe pod <pod-name> | grep -A10 "Mounts:"

Network Debugging

DNS Resolution

kubectl run test --rm -it --image=busybox -- nslookup kubernetes.default

If DNS fails, check CoreDNS:

kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns

Network Policies

Network policies might be blocking traffic:

kubectl get networkpolicies
kubectl describe networkpolicy <name>
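Note that as soon as any policy selects a pod, all ingress not explicitly allowed is denied. A minimal sketch that admits traffic only from pods labeled app: frontend (all names here are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
spec:
  podSelector:
    matchLabels:
      app: myapp          # the pods being protected
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend   # only these pods may connect
```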

Test connectivity:

kubectl run test --rm -it --image=nicolaka/netshoot -- bash

# Inside:
curl -v http://<service>:<port>
tcpdump -i any port 80

Ingress Issues

# Check Ingress resource
kubectl get ingress
kubectl describe ingress <name>

# Check Ingress controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller

The Debug Toolkit

Essential Commands

# Quick status
kubectl get pods,svc,deploy,rs

# Recent events (often reveals the problem)
kubectl get events --sort-by='.lastTimestamp'

# Follow logs in real-time
kubectl logs -f <pod-name>

# Multiple containers
kubectl logs <pod-name> -c <container-name>

# All pods with a label
kubectl logs -l app=myapp --all-containers

# Execute into running container
kubectl exec -it <pod-name> -- /bin/sh

# Port forward for local testing
kubectl port-forward <pod-name> 8080:80
kubectl port-forward svc/<service-name> 8080:80

Debug Images

When your container doesn’t have debugging tools:

# netshoot - networking tools
kubectl run debug --rm -it --image=nicolaka/netshoot -- bash

# busybox - basic unix tools  
kubectl run debug --rm -it --image=busybox -- sh

# alpine with curl
kubectl run debug --rm -it --image=curlimages/curl -- sh

JSONPath for Extraction

# Get all pod IPs
kubectl get pods -o jsonpath='{.items[*].status.podIP}'

# Get container image versions
kubectl get pods -o jsonpath='{.items[*].spec.containers[*].image}'

# Get events for a specific pod (a field selector, not JSONPath)
kubectl get events --field-selector involvedObject.name=<pod-name>

The Debugging Checklist

When something’s broken:

  1. kubectl get pods — What state are pods in?
  2. kubectl describe pod <name> — What do events say?
  3. kubectl logs <name> — What does the app say?
  4. kubectl get events — What happened recently?
  5. kubectl exec — Can I get inside and poke around?
  6. kubectl get endpoints — Is the service finding pods?
  7. DNS test from inside cluster — Can pods resolve names?

90% of problems are answered by steps 1-4. Start simple.


Kubernetes debugging is systematic elimination. Check the obvious first, use describe and logs religiously, and remember: the Events section is usually trying to tell you exactly what’s wrong.