Your pod got OOMKilled. Or throttled to 5% CPU. Or evicted because the node ran out of resources. The fix isn’t “add more resources” — it’s understanding how Kubernetes scheduling actually works.
Requests vs Limits
- Requests: What you're guaranteed. The scheduler uses requests to decide where a pod fits.
- Limits: The ceiling. Exceed a limit and bad things happen.
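Both live under `resources` in the container spec. A minimal sketch (the values here are illustrative, not recommendations):

```yaml
resources:
  requests:
    cpu: 250m        # guaranteed share; the scheduler packs pods by this
    memory: 256Mi
  limits:
    cpu: "1"         # ceiling; CFS throttles CPU usage above this
    memory: 512Mi    # ceiling; exceeding it gets the container OOMKilled
```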
What Happens When You Exceed Them
| Resource | Exceed Request | Exceed Limit |
|---|---|---|
| CPU | Throttled when node is busy | Hard throttled always |
| Memory | Fine if available | OOMKilled immediately |
CPU is compressible — you slow down but survive. Memory is not — you die.
The Common Mistakes
Mistake 1: No Limits Set
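This is what the anti-pattern looks like: a container spec with no `resources` block at all (names are illustrative):

```yaml
containers:
  - name: app              # illustrative name
    image: example/app
    # no resources block: no scheduling hint, no ceiling
```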
Without limits, one pod can consume an entire node's resources, starving its neighbors. And if requests are missing too, the scheduler has no idea what the pod needs and places it blindly.
Mistake 2: Requests = Limits
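Setting requests and limits to identical values looks safe but locks in the peak (values illustrative):

```yaml
resources:
  requests:
    cpu: "1"
    memory: 1Gi
  limits:
    cpu: "1"         # identical to the request: no bursting allowed
    memory: 1Gi
```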
This guarantees resources but prevents bursting. If your app only needs 1 CPU during startup and 100m at steady state, you’re paying for 900m of idle capacity.
Mistake 3: Limits Way Higher Than Requests
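For example, a large gap between memory request and limit (values illustrative):

```yaml
resources:
  requests:
    memory: 256Mi    # what the scheduler packs against
  limits:
    memory: 2Gi      # 8x the request: fine for one pod, dangerous in aggregate
```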
The scheduler packs pods based on requests. If every pod requests 256Mi but uses 2Gi, your nodes will OOM when multiple pods burst simultaneously.
Right-Sizing Process
Step 1: Measure Actual Usage
Deploy with generous limits, then observe:
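With metrics-server installed, `kubectl top` gives point-in-time usage (pod and namespace names are illustrative):

```shell
# Current CPU and memory usage per pod (requires metrics-server)
kubectl top pod -n my-namespace

# Watch a specific pod over time while it handles load
watch -n 30 kubectl top pod my-app-7d4f9c
```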
With Prometheus:
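One way to query the standard cAdvisor metrics for percentiles over a lookback window (the pod regex and 7-day window are assumptions; adjust to your labels):

```promql
# CPU: P95 of the 5m usage rate over the last 7 days
quantile_over_time(0.95,
  rate(container_cpu_usage_seconds_total{pod=~"my-app-.*"}[5m])[7d:5m])

# Memory: peak working set (what the OOM killer accounts against the limit)
max_over_time(container_memory_working_set_bytes{pod=~"my-app-.*"}[7d])
```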
Step 2: Set Requests to Typical Usage
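Requests should track what the app normally consumes, not its worst day (values illustrative):

```yaml
resources:
  requests:
    cpu: 200m        # roughly P50 of measured CPU usage
    memory: 300Mi    # roughly P75 of measured memory usage
```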
Step 3: Set Limits for Burst Headroom
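Limits add burst room on top of the requests from the previous step (values illustrative):

```yaml
resources:
  requests:
    cpu: 200m
    memory: 300Mi
  limits:
    cpu: 800m        # ~4x the request: room for startup and traffic spikes
    memory: 400Mi    # modest headroom only; memory is not compressible
```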
Step 4: Handle Memory Differently Than CPU
Memory limits should be tight — OOMKill is better than node instability:
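A tight memory configuration might look like this (values illustrative):

```yaml
resources:
  requests:
    memory: 512Mi
  limits:
    memory: 640Mi    # ~P99 + 20%: tight enough to protect the node
```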
CPU limits can be looser — throttling is survivable:
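For CPU, a generous limit (or none at all) is usually the better trade (values illustrative):

```yaml
resources:
  requests:
    cpu: 250m
  limits:
    cpu: "2"         # loose ceiling; omitting the CPU limit entirely is also common
```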
QoS Classes
Kubernetes assigns Quality of Service based on your resource config:
Guaranteed (Highest Priority)
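Guaranteed requires every container in the pod to set limits equal to requests for both CPU and memory (values illustrative):

```yaml
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m        # limits == requests for every container: Guaranteed
    memory: 512Mi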
- Last to be evicted
- Best for critical workloads
- Most expensive (no overcommit)
Burstable
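Burstable means requests are set but at least one limit differs from (or exceeds) its request (values illustrative):

```yaml
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: "1"         # limit above the request: Burstable
    memory: 512Mi
```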
- Evicted after BestEffort
- Good balance for most workloads
BestEffort (Lowest Priority)
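BestEffort is simply a pod with no requests or limits anywhere (names illustrative):

```yaml
containers:
  - name: scratch-job      # illustrative name
    image: example/batch
    # no requests, no limits on any container: BestEffort
```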
- First to be evicted
- Only for batch jobs you don’t care about
LimitRanges and ResourceQuotas
Enforce Defaults with LimitRange
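A sketch of a namespace-scoped LimitRange (names, namespace, and values are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults   # illustrative name
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container omits requests
        cpu: 100m
        memory: 128Mi
      default:               # applied when a container omits limits
        cpu: 500m
        memory: 512Mi
      max:                   # hard cap per container
        cpu: "2"
        memory: 2Gi
```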
Now pods without resource specs get sensible defaults, and no container can request more than the configured max.
Cap Namespace Usage with ResourceQuota
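A sketch of a ResourceQuota capping a namespace's aggregate requests and limits (names and values illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota         # illustrative name
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "50"
```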
Prevents one team from consuming the entire cluster.
Vertical Pod Autoscaler (VPA)
Let Kubernetes recommend or auto-adjust resources:
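A minimal VPA object in recommendation-only mode (the VPA CRDs must be installed; names are illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa           # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"        # recommend only; "Auto" lets VPA apply changes
```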
Check recommendations:
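Assuming the VPA object above (name illustrative):

```shell
kubectl describe vpa my-app-vpa
# Look for the Recommendation section: target, lower bound, upper bound
```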
Caveat: VPA and HPA don’t play well together for the same metric. Use VPA for memory, HPA for CPU-based scaling.
Java and Memory: Special Handling
JVM apps need careful memory config. Older JVMs (before 8u191 and 10) ignore container limits entirely, and even modern container-aware JVMs default the heap to only 25% of available memory.
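One way to wire this up, sizing the heap as a percentage of the container limit (values illustrative):

```yaml
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=75.0"   # heap sized off the container limit
resources:
  requests:
    memory: 768Mi
  limits:
    memory: 768Mi    # heap tops out around 576Mi; the rest is non-heap headroom
```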
MaxRAMPercentage=75 leaves headroom for non-heap memory (metaspace, threads, native code).
Debugging Resource Issues
Pod Stuck in Pending
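The scheduler's reason appears in the pod's events (pod name illustrative):

```shell
kubectl describe pod my-app-7d4f9c
# Events typically show something like:
#   FailedScheduling  0/3 nodes are available: 3 Insufficient cpu.
```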
Fix: Reduce requests or add nodes.
OOMKilled
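The kill reason for the previous container instance is recorded in the pod status (pod name illustrative; index 0 assumes a single-container pod):

```shell
# Prints "OOMKilled" if the last restart was an out-of-memory kill
kubectl get pod my-app-7d4f9c \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```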
Fix: Increase memory limit (or fix the memory leak).
CPU Throttling
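Throttling shows up in the cAdvisor CFS metrics; one way to compute the throttled fraction (pod regex is an assumption):

```promql
# Fraction of CFS periods in which the container was throttled
rate(container_cpu_cfs_throttled_periods_total{pod=~"my-app-.*"}[5m])
  /
rate(container_cpu_cfs_periods_total{pod=~"my-app-.*"}[5m])
```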
Fix: Increase CPU limit or remove it (let burstable scheduling handle it).
The Cheat Sheet
| Workload Type | CPU Request | CPU Limit | Memory Request | Memory Limit |
|---|---|---|---|---|
| Web API | P50 usage | 2-4x request | P75 usage | P99 + 20% |
| Background Worker | Low | None or high | P75 usage | P95 + 10% |
| Batch Job | Low | None | Expected peak | Expected peak |
| Database | P75 usage | 1.5x request | Expected + buffer | Same as request |
Summary
- Always set requests — Scheduler needs them
- Set memory limits tight — OOMKill beats node crash
- Set CPU limits loose or not at all — Throttling is fine
- Measure before setting — Guessing wastes money
- Use VPA for recommendations — Data beats intuition
- Match QoS to criticality — Guaranteed for critical, Burstable for most
Resource limits are how you tell Kubernetes what your app actually needs. Get them right and your cluster hums along. Get them wrong and you’re firefighting OOMKills at 3 AM.
Measure, configure, iterate. The cluster will thank you.