Your pod got OOMKilled. Or throttled to 5% CPU. Or evicted because the node ran out of resources. The fix isn’t “add more resources” — it’s understanding how Kubernetes scheduling actually works.

Requests vs Limits

Requests: What you’re guaranteed. Kubernetes uses requests to decide where a pod can be scheduled.

Limits: The ceiling. Exceed a limit and bad things happen.

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"      # 0.25 cores
  limits:
    memory: "512Mi"
    cpu: "500m"      # 0.5 cores
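Those unit suffixes follow Kubernetes quantity notation: `m` means millicores, and `Mi`/`Gi` are binary (powers-of-two) units. A minimal Python sketch of the conversion, purely for illustration (the real parsing lives in the Kubernetes API machinery and handles more suffixes and edge cases):

```python
# Minimal parser for the resource quantities used above.
# Hypothetical helper for illustration only.

def parse_cpu(q: str) -> float:
    """Return CPU cores: '250m' -> 0.25, '1' -> 1.0."""
    if q.endswith("m"):
        return int(q[:-1]) / 1000
    return float(q)

def parse_memory(q: str) -> int:
    """Return bytes: '256Mi' -> 268435456, '1Gi' -> 1073741824."""
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
             "K": 1000, "M": 1000**2, "G": 1000**3}
    for suffix, factor in units.items():
        if q.endswith(suffix):
            return int(q[:-len(suffix)]) * factor
    return int(q)

print(parse_cpu("250m"))      # 0.25
print(parse_memory("256Mi"))  # 268435456
```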

What Happens When You Exceed Them

Resource | Exceed Request              | Exceed Limit
---------|-----------------------------|----------------------
CPU      | Throttled when node is busy | Hard throttled always
Memory   | Fine if available           | OOMKilled immediately

CPU is compressible — you slow down but survive. Memory is not — you die.

The Common Mistakes

Mistake 1: No Limits Set

# DON'T DO THIS
resources: {}

Without limits, one pod can consume all node resources, starving others. The scheduler has no idea what you need.

Mistake 2: Requests = Limits

# OFTEN WASTEFUL
resources:
  requests:
    memory: "1Gi"
    cpu: "1000m"
  limits:
    memory: "1Gi"
    cpu: "1000m"

This guarantees resources but prevents bursting. If your app only needs 1 CPU during startup and 100m at steady state, you’re paying for 900m of idle capacity.

Mistake 3: Limits Way Higher Than Requests

# DANGEROUS
resources:
  requests:
    memory: "256Mi"
    cpu: "100m"
  limits:
    memory: "4Gi"
    cpu: "4000m"

The scheduler packs pods based on requests. If every pod requests 256Mi but uses 2Gi, your nodes will OOM when multiple pods burst simultaneously.
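You can put numbers on this risk. An illustrative sketch (the figures are made up, not from any real cluster):

```python
# Illustrative overcommit arithmetic: the scheduler bin-packs by
# requests, but worst-case usage is bounded by limits.
GI = 1024**3
MI = 1024**2

node_allocatable = 8 * GI
pods = 16                          # each requests 256Mi, limited to 4Gi
total_requests = pods * 256 * MI   # 4Gi  -> scheduler says "fits"
total_limits   = pods * 4 * GI     # 64Gi -> worst case if all burst

print(total_requests / node_allocatable)  # 0.5 (looks fine)
print(total_limits / node_allocatable)    # 8.0 (8x overcommitted)
```

By requests the node is half empty; by limits it is overcommitted 8x, so simultaneous bursts will OOM the node.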

Right-Sizing Process

Step 1: Measure Actual Usage

Deploy with generous limits, then observe:

# Get current usage
kubectl top pods -n production

# Historical metrics (requires metrics-server or Prometheus)
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/production/pods" | jq '.items[] | {name: .metadata.name, cpu: .containers[0].usage.cpu, memory: .containers[0].usage.memory}'

With Prometheus:

# P95 memory over 7 days
quantile_over_time(0.95, container_memory_usage_bytes{container="myapp"}[7d])

# P95 CPU over 7 days
quantile_over_time(0.95, rate(container_cpu_usage_seconds_total{container="myapp"}[5m])[7d:5m])

Step 2: Set Requests to Typical Usage

resources:
  requests:
    # Set to P50-P75 of actual usage
    memory: "384Mi"   # Typical usage: 350Mi
    cpu: "150m"       # Typical usage: 120m

Step 3: Set Limits for Burst Headroom

resources:
  limits:
    # Set to P99 + buffer
    memory: "512Mi"   # P99: 450Mi + 15% buffer
    cpu: "500m"       # P99: 400m + 25% buffer
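Steps 2 and 3 can be sketched as a tiny helper that turns observed percentiles into a request/limit pair (hypothetical helper; the buffer and the rounding to 64Mi multiples are illustrative choices, not a rule):

```python
import math

def suggest_memory(p50_mi, p99_mi, buffer=0.15):
    """Suggest a memory request/limit pair (in Mi) from observed usage.

    Request = typical (P50) usage; limit = P99 plus a buffer.
    Both are rounded UP to a multiple of 64Mi for tidiness.
    """
    request = math.ceil(p50_mi / 64) * 64
    limit = math.ceil(p99_mi * (1 + buffer) / 64) * 64
    return request, limit

print(suggest_memory(350, 450))  # (384, 576)
```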

Step 4: Handle Memory Differently Than CPU

Memory limits should be tight — OOMKill is better than node instability:

resources:
  requests:
    memory: "400Mi"
  limits:
    memory: "500Mi"   # Only 25% headroom

CPU limits can be looser — throttling is survivable:

resources:
  requests:
    cpu: "100m"
  limits:
    cpu: "1000m"      # 10x headroom for bursts

QoS Classes

Kubernetes assigns Quality of Service based on your resource config:

Guaranteed (Highest Priority)

# requests == limits for ALL containers
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
  • Last to be evicted
  • Best for critical workloads
  • Most expensive (no overcommit)

Burstable

# requests < limits OR only requests set
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
  • Evicted after BestEffort
  • Good balance for most workloads

BestEffort (Lowest Priority)

# No requests or limits
resources: {}
  • First to be evicted
  • Only for batch jobs you don’t care about
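The three classes follow mechanically from the resource spec. A simplified Python sketch of the classification rule (illustration only; the real logic also covers init containers, requests defaulting, and more edge cases):

```python
def qos_class(containers):
    """Classify a pod roughly the way Kubernetes assigns QoS.

    containers: list of dicts like
      {"requests": {"cpu": "250m", "memory": "256Mi"},
       "limits":   {"cpu": "500m", "memory": "512Mi"}}
    """
    # BestEffort: no container sets any requests or limits.
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"
    # Guaranteed: every container sets cpu+memory limits,
    # and its requests (if set) equal its limits.
    guaranteed = all(
        c.get("limits", {}).keys() >= {"cpu", "memory"}
        and c.get("requests", c["limits"]) == c["limits"]
        for c in containers
    )
    # Everything else is Burstable.
    return "Guaranteed" if guaranteed else "Burstable"

print(qos_class([{"requests": {}, "limits": {}}]))  # BestEffort
print(qos_class([{
    "requests": {"cpu": "500m", "memory": "512Mi"},
    "limits":   {"cpu": "500m", "memory": "512Mi"},
}]))  # Guaranteed
```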

LimitRanges and ResourceQuotas

Enforce Defaults with LimitRange

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - default:
      memory: "512Mi"
      cpu: "500m"
    defaultRequest:
      memory: "256Mi"
      cpu: "100m"
    max:
      memory: "2Gi"
      cpu: "2000m"
    min:
      memory: "64Mi"
      cpu: "50m"
    type: Container

Now containers without resource specs get sensible defaults, and no container can request more than max.
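Conceptually, admission fills in defaults and then enforces bounds. A simplified Python sketch of that behavior (illustration only; the real enforcement is the API server's LimitRanger admission plugin, which also handles CPU and ratio checks):

```python
def apply_limit_range(container, lr):
    """Apply LimitRange defaults, then enforce min/max.

    Memory only, values in Mi, for brevity. Illustrative sketch.
    """
    limits = dict(container.get("limits") or {})
    requests = dict(container.get("requests") or {})
    limits.setdefault("memory", lr["default"]["memory"])
    requests.setdefault("memory", lr["defaultRequest"]["memory"])
    if not (lr["min"]["memory"] <= limits["memory"] <= lr["max"]["memory"]):
        raise ValueError("memory limit outside LimitRange bounds")
    return {"requests": requests, "limits": limits}

lr = {"default": {"memory": 512}, "defaultRequest": {"memory": 256},
      "min": {"memory": 64}, "max": {"memory": 2048}}

print(apply_limit_range({}, lr))
# {'requests': {'memory': 256}, 'limits': {'memory': 512}}
```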

Cap Namespace Usage with ResourceQuota

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"

Prevents one team from consuming the entire cluster.

Vertical Pod Autoscaler (VPA)

Let Kubernetes recommend or auto-adjust resources:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"  # Or "Off" for recommendations only
  resourcePolicy:
    containerPolicies:
    - containerName: myapp
      minAllowed:
        memory: "128Mi"
        cpu: "50m"
      maxAllowed:
        memory: "2Gi"
        cpu: "2000m"

Check recommendations:

kubectl describe vpa myapp-vpa

# Output includes:
# Recommendation:
#   Container Recommendations:
#     Container Name: myapp
#     Lower Bound:    Cpu: 100m, Memory: 256Mi
#     Target:         Cpu: 250m, Memory: 512Mi
#     Upper Bound:    Cpu: 500m, Memory: 1Gi

Caveat: VPA and HPA don’t play well together for the same metric. Use VPA for memory, HPA for CPU-based scaling.

Java and Memory: Special Handling

JVM apps need careful memory config. Older JVMs (before JDK 10, or JDK 8u191) size the heap from host memory rather than the container limit, and even container-aware JVMs only give the heap a fraction of it — you still have to budget for non-heap memory yourself.

resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "1Gi"
env:
- name: JAVA_OPTS
  value: "-XX:MaxRAMPercentage=75.0 -XX:+UseContainerSupport"

MaxRAMPercentage=75 leaves headroom for non-heap memory (metaspace, threads, native code).
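The arithmetic behind that 75% figure, for the 1Gi limit above:

```python
# Heap sizing under -XX:MaxRAMPercentage=75.0 with a 1Gi limit.
MI = 1024**2
limit = 1024 * MI            # container memory limit: 1Gi
heap = int(limit * 0.75)     # max JVM heap
non_heap = limit - heap      # metaspace, thread stacks, native code

print(heap // MI)      # 768 (max heap in Mi)
print(non_heap // MI)  # 256 (non-heap headroom in Mi)
```

If non-heap usage outgrows that 256Mi, the container is OOMKilled even though the heap is within bounds — which is why the percentage should not be pushed too close to 100.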

Debugging Resource Issues

Pod Stuck in Pending

kubectl describe pod myapp-xxx

# Look for:
# Events:
#   Warning  FailedScheduling  No nodes are available that match all of the following predicates: Insufficient cpu, Insufficient memory

Fix: Reduce requests or add nodes.

OOMKilled

kubectl describe pod myapp-xxx

# Look for:
# Last State: Terminated
#   Reason: OOMKilled
#   Exit Code: 137

Fix: Increase memory limit (or fix the memory leak).

CPU Throttling

# Check throttling in Prometheus
container_cpu_cfs_throttled_seconds_total{container="myapp"}

Fix: Increase CPU limit or remove it (let burstable scheduling handle it).
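To judge how severe throttling is, compare throttled periods to total CFS periods — the counters behind that Prometheus metric come from the cgroup's `cpu.stat`. Illustrative arithmetic with made-up numbers:

```python
# Counters as exposed in the container's cgroup cpu.stat
# (nr_periods / nr_throttled) or via cAdvisor in Prometheus.
# Numbers below are illustrative, not from a real workload.
nr_periods = 120_000    # total CFS scheduling periods
nr_throttled = 30_000   # periods where the container hit its quota

throttle_ratio = nr_throttled / nr_periods
print(f"{throttle_ratio:.0%}")  # 25%
```

As a rough rule, a sustained ratio in the double digits means the limit is actively hurting latency.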

The Cheat Sheet

Workload Type     | CPU Request | CPU Limit    | Memory Request    | Memory Limit
------------------|-------------|--------------|-------------------|-----------------
Web API           | P50 usage   | 2-4x request | P75 usage         | P99 + 20%
Background Worker | Low         | None or high | P75 usage         | P95 + 10%
Batch Job         | Low         | None         | Expected peak     | Expected peak
Database          | P75 usage   | 1.5x request | Expected + buffer | Same as request

Summary

  1. Always set requests — Scheduler needs them
  2. Set memory limits tight — OOMKill beats node crash
  3. Set CPU limits loose or not at all — Throttling is fine
  4. Measure before setting — Guessing wastes money
  5. Use VPA for recommendations — Data beats intuition
  6. Match QoS to criticality — Guaranteed for critical, Burstable for most

Resource limits are how you tell Kubernetes what your app actually needs. Get them right and your cluster hums along. Get them wrong and you’re firefighting OOMKills at 3 AM.

Measure, configure, iterate. The cluster will thank you.