Kubernetes needs to know how much CPU and memory your containers need. Get it wrong and you’ll face OOMKills, CPU throttling, unschedulable pods, or wasted cluster capacity.
Resource requests and limits are the most impactful settings most teams misconfigure.
Requests vs Limits
- Requests: what you're guaranteed. Used for scheduling.
- Limits: what you can't exceed. Enforced at runtime.
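A minimal sketch of such a spec (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: app
    image: nginx:1.27
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
```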
This pod:
- Is scheduled on a node with at least 256Mi memory and 250m CPU available
- Can use up to 512Mi memory before being OOMKilled
- Can use up to 500m CPU before being throttled
CPU: Throttling, Not Killing
CPU is compressible. Exceeding limits causes throttling, not termination:
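For example (values illustrative):

```yaml
resources:
  requests:
    cpu: "100m"   # guaranteed share, used by the scheduler
  limits:
    cpu: "500m"   # hard ceiling, enforced via the kernel's CFS quota
```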
100m = 100 millicores = 10% of one CPU core.
When throttled, your application runs slower but keeps running. Signs of CPU throttling:
- Increased latency
- Timeouts
- Slow response times during load
Check throttling metrics:
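A PromQL sketch using cAdvisor's CFS counters, assuming they are scraped into Prometheus (the pod label is illustrative):

```promql
# Fraction of CFS periods in which the container was throttled
rate(container_cpu_cfs_throttled_periods_total{pod="my-pod"}[5m])
/
rate(container_cpu_cfs_periods_total{pod="my-pod"}[5m])
```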
Memory: OOMKill
Memory is incompressible. Exceed limits and you’re killed:
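For example (values illustrative):

```yaml
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"   # exceeding this gets the container OOMKilled
```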
OOMKilled pods show:
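In `kubectl describe pod` output, the terminated state looks roughly like this:

```text
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137      # 128 + signal 9 (SIGKILL)
```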
Set memory limits based on actual usage plus headroom for spikes.
Quality of Service Classes
Kubernetes assigns QoS based on your resource config:
Guaranteed (highest priority):
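Guaranteed requires requests equal to limits for every container (values illustrative):

```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"   # equal to requests
    cpu: "500m"       # equal to requests
```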
Burstable (medium priority):
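Burstable means requests are set but don't equal limits, for example:

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"   # higher than requests
    cpu: "500m"
```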
BestEffort (lowest priority, first to be evicted):
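BestEffort means no requests or limits at all:

```yaml
resources: {}   # nothing specified for any container
```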
Under memory pressure, Kubernetes evicts BestEffort pods first, then Burstable, then Guaranteed.
Right-Sizing: Finding the Right Numbers
Don’t guess. Measure:
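With metrics-server installed, `kubectl top` shows actual usage (namespace name illustrative):

```shell
# Current usage per pod
kubectl top pods -n my-namespace

# Break usage down per container
kubectl top pods -n my-namespace --containers
```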
Use Vertical Pod Autoscaler in recommend mode:
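A sketch of a VPA in recommend-only mode, assuming the VPA controller is installed and targeting a hypothetical Deployment `my-app`:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # recommend only, never evict or resize pods
```

Then read the `Recommendation` section of `kubectl describe vpa my-app-vpa` and use it to set requests by hand.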
Common Mistakes
Over-requesting
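The anti-pattern looks like this (numbers illustrative, for an app that actually uses ~100m CPU and ~200Mi memory):

```yaml
resources:
  requests:
    cpu: "2"        # ~20x observed usage
    memory: "4Gi"   # ~20x observed usage
```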
Over-requesting wastes cluster capacity. Pods are scheduled based on requests — if every pod requests 10x what it uses, you can only run 1/10th the workload.
Under-limiting Memory
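For example:

```yaml
resources:
  requests:
    memory: "256Mi"
  # no memory limit: a leak can grow until the node itself runs out
```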
Without limits, one runaway pod can OOMKill other pods on the same node.
CPU Limits on Latency-Sensitive Apps
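A tight CPU limit on a latency-sensitive service looks like this (values illustrative):

```yaml
resources:
  requests:
    cpu: "500m"
  limits:
    cpu: "500m"   # bursts get throttled, tail latency spikes
```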
Some teams remove CPU limits entirely for latency-sensitive workloads to avoid throttling. Keep requests for scheduling, remove limits to allow bursting:
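A sketch of that pattern (values illustrative):

```yaml
resources:
  requests:
    cpu: "500m"        # keeps the scheduling guarantee
    memory: "512Mi"
  limits:
    memory: "512Mi"    # keep the memory limit; only the CPU limit is dropped
```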
LimitRanges: Cluster Defaults
Set defaults so teams don’t forget:
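A LimitRange sketch (namespace and values illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults
  namespace: my-team
spec:
  limits:
  - type: Container
    defaultRequest:      # applied when requests are omitted
      cpu: "100m"
      memory: "128Mi"
    default:             # applied when limits are omitted
      cpu: "500m"
      memory: "512Mi"
```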
Pods without resource specs inherit these defaults.
ResourceQuotas: Namespace Limits
Prevent one team from consuming the whole cluster:
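A ResourceQuota sketch (namespace and values illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: my-team
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
```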
Pods exceeding quota won’t be scheduled.
Horizontal vs Vertical Scaling
Horizontal Pod Autoscaler (more replicas):
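An HPA sketch targeting a hypothetical Deployment `my-app`:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # measured relative to CPU requests
```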
Vertical Pod Autoscaler (bigger pods):
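The same target with a VPA applying its recommendations automatically (sketch; requires the VPA controller):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"   # applies recommendations by recreating pods
```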
HPA is generally preferred — more replicas means better fault tolerance. VPA requires pod restarts to apply changes.
Java/JVM Considerations
JVM apps need special attention:
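One common approach is to pass the flag via `JAVA_TOOL_OPTIONS` in the container spec (values illustrative):

```yaml
env:
- name: JAVA_TOOL_OPTIONS
  value: "-XX:MaxRAMPercentage=75.0"
resources:
  limits:
    memory: "1Gi"   # heap targets ~768Mi, leaving ~256Mi for everything else
```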
MaxRAMPercentage tells the JVM to use 75% of container memory for heap, leaving room for off-heap, metaspace, and OS overhead.
Without this, the JVM may try to use more memory than the container limit and get OOMKilled.
Monitoring Resource Usage
Essential Prometheus queries:
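Sketches of two useful queries over cAdvisor metrics (label filters omitted for brevity; pods without limits report a limit of 0 and should be filtered out):

```promql
# Memory usage as a fraction of the memory limit
container_memory_working_set_bytes{container!=""}
  / container_spec_memory_limit_bytes{container!=""} > 0.8

# CPU throttling ratio
rate(container_cpu_cfs_throttled_periods_total[5m])
  / rate(container_cpu_cfs_periods_total[5m])
```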
Alert when pods consistently use >80% of limits — they’re candidates for increased limits or optimization.
Resource management is capacity planning at the container level. Request what you need (for scheduling), limit what you can tolerate (for protection), and monitor actual usage to refine both.
Start conservative, measure real usage, adjust. The goal is efficient packing without OOMKills or throttling — it takes iteration to get right.