Kubernetes needs to know how much CPU and memory your containers need. Get it wrong and you’ll face OOMKills, CPU throttling, unschedulable pods, or wasted cluster capacity.

Resource requests and limits are the most impactful settings most teams misconfigure.

Requests vs Limits

  • Requests: what you’re guaranteed. The scheduler uses requests to decide where a pod fits.
  • Limits: what you can’t exceed. Enforced at runtime.

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

This pod:

  • Is scheduled on a node with at least 256Mi memory and 250m CPU available
  • Can use up to 512Mi memory before being OOMKilled
  • Can use up to 500m CPU before being throttled

CPU: Throttling, Not Killing

CPU is compressible. Exceeding limits causes throttling, not termination:

resources:
  requests:
    cpu: "100m"    # 0.1 cores guaranteed
  limits:
    cpu: "200m"    # 0.2 cores max

100m = 100 millicores = 10% of one CPU core.

When throttled, your application runs slower but keeps running. Signs of CPU throttling:

  • Increased latency
  • Timeouts
  • Slow response times during load

Check throttling metrics:

kubectl top pods
# Or check container_cpu_cfs_throttled_seconds_total in Prometheus

Memory: OOMKill

Memory is incompressible. Exceed the limit and the container is killed:

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "256Mi"  # Same as request (Guaranteed QoS also needs matching CPU requests/limits)

OOMKilled pods show:

kubectl describe pod mypod
# State: Terminated
# Reason: OOMKilled

Set memory limits based on actual usage plus headroom for spikes.
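For example, if monitoring shows steady-state usage around 300Mi with occasional spikes to 400Mi, a sizing like this (illustrative numbers) leaves room for spikes without over-provisioning:

```yaml
resources:
  requests:
    memory: "320Mi"   # slightly above steady-state usage
  limits:
    memory: "512Mi"   # observed spike plus ~25% headroom
```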

Quality of Service Classes

Kubernetes assigns QoS based on your resource config:

Guaranteed (highest priority):

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"    # Same as request
    cpu: "250m"        # Same as request

Burstable (medium priority):

resources:
  requests:
    memory: "128Mi"
  limits:
    memory: "256Mi"    # Different from request

BestEffort (lowest priority, first to be evicted):

# No resources specified at all

Under memory pressure, Kubernetes evicts BestEffort pods first, then Burstable, then Guaranteed.
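You can verify which class Kubernetes assigned by inspecting the pod’s status (for example with `kubectl get pod mypod -o yaml`):

```yaml
# Excerpt of pod status; the value depends on your resource config
status:
  qosClass: Guaranteed   # or Burstable / BestEffort
```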

Right-Sizing: Finding the Right Numbers

Don’t guess. Measure:

# Current usage (requires metrics-server)
kubectl top pods

# Raw metrics API (point-in-time; for historical usage use Prometheus)
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods"

Use Vertical Pod Autoscaler in recommend mode:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  updatePolicy:
    updateMode: "Off"  # Recommend only, don't auto-apply

Then read the recommendations:

kubectl describe vpa my-vpa
# Shows recommended requests based on actual usage
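The recommendation lands in the VPA’s status. The shape looks roughly like this (container name and values are illustrative):

```yaml
status:
  recommendation:
    containerRecommendations:
    - containerName: my-container
      lowerBound:
        cpu: 100m
        memory: 256Mi
      target:            # suggested request
        cpu: 250m
        memory: 384Mi
      upperBound:
        cpu: 500m
        memory: 512Mi
```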

Common Mistakes

Over-requesting

# Wasteful: requesting 2GB but using 200MB
resources:
  requests:
    memory: "2Gi"

Over-requesting wastes cluster capacity. Pods are scheduled based on requests — if every pod requests 10x what it uses, you can only run 1/10th the workload.

Under-limiting Memory

# Dangerous: no limit, can consume entire node memory
resources:
  requests:
    memory: "256Mi"
  # No limit specified

Without limits, one runaway pod can OOMKill other pods on the same node.

CPU Limits on Latency-Sensitive Apps

# Potentially problematic for APIs
resources:
  limits:
    cpu: "100m"  # Will throttle under any load

Some teams remove CPU limits entirely for latency-sensitive workloads to avoid throttling. Keep requests for scheduling, remove limits to allow bursting:

resources:
  requests:
    cpu: "100m"
  # No CPU limit - can burst to available capacity
  limits:
    memory: "512Mi"  # Still limit memory

LimitRanges: Cluster Defaults

Set defaults so teams don’t forget:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: default
spec:
  limits:
  - default:
      memory: "512Mi"
      cpu: "500m"
    defaultRequest:
      memory: "256Mi"
      cpu: "100m"
    type: Container

Pods without resource specs inherit these defaults.
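A LimitRange can also enforce hard per-container bounds, rejecting pods that request too little or too much (a sketch; adjust values to your cluster):

```yaml
spec:
  limits:
  - max:
      memory: "2Gi"
      cpu: "2"
    min:
      memory: "64Mi"
      cpu: "50m"
    type: Container
```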

ResourceQuotas: Namespace Limits

Prevent one team from consuming the whole cluster:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"

Pods that would push the namespace over quota are rejected at creation time.

Horizontal vs Vertical Scaling

Horizontal Pod Autoscaler (more replicas):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Vertical Pod Autoscaler (bigger pods):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
spec:
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      maxAllowed:
        memory: "2Gi"
        cpu: "2"

HPA is generally preferred — more replicas means better fault tolerance. VPA requires pod restarts to apply changes.

Java/JVM Considerations

JVM apps need special attention:

resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "1Gi"
env:
  - name: JAVA_OPTS
    value: "-XX:MaxRAMPercentage=75.0"

MaxRAMPercentage tells the JVM to use 75% of container memory for heap, leaving room for off-heap, metaspace, and OS overhead.

Without container-aware heap settings, the JVM may size its heap from the host’s memory rather than the container limit and get OOMKilled.
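Concretely, with a 1Gi container limit, MaxRAMPercentage=75 caps the heap at about 768MiB:

```
heap max   = 1024 MiB × 0.75 = 768 MiB
remainder  ≈ 256 MiB for metaspace, thread stacks, direct buffers, and JVM overhead
```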

Monitoring Resource Usage

Essential Prometheus queries:

# Memory usage vs request
sum by (pod) (container_memory_working_set_bytes) / sum by (pod) (kube_pod_container_resource_requests{resource="memory"})

# CPU throttling
rate(container_cpu_cfs_throttled_seconds_total[5m])

# Pods near memory limit
(container_memory_working_set_bytes / container_spec_memory_limit_bytes) > 0.9

Alert when pods consistently use >80% of limits — they’re candidates for increased limits or optimization.
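That alert can be expressed as a Prometheus alerting rule, roughly like this (alert name, threshold, and duration are illustrative):

```yaml
groups:
- name: resource-alerts
  rules:
  - alert: PodNearMemoryLimit
    expr: (container_memory_working_set_bytes / container_spec_memory_limit_bytes) > 0.8
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.pod }} is using >80% of its memory limit"
```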


Resource management is capacity planning at the container level. Request what you need (for scheduling), limit what you can tolerate (for protection), and monitor actual usage to refine both.

Start conservative, measure real usage, adjust. The goal is efficient packing without OOMKills or throttling — it takes iteration to get right.