Cold starts are serverless’s original sin. Your function spins up, downloads dependencies, initializes connections, and finally runs your code — all while your user waits. The P99 latency spikes. The SLA teeters.

Here’s what actually works, ranked by effectiveness and cost.

Understanding the Cold Start

A cold start happens when there’s no warm instance available to handle a request. The platform must:

  1. Provision a container — 50-500ms depending on runtime size
  2. Initialize the runtime — 10-100ms (Python) to 500ms+ (JVM without optimization)
  3. Run your initialization code — depends on what you do at module level
  4. Execute the handler — your actual function
# Everything at module level runs during cold start
import boto3  # ~100ms
import pandas  # ~500ms
import torch  # ~2000ms

# Connection initialization during cold start
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')

def handler(event, context):
    # Only this runs on warm invocations
    return table.get_item(Key={'id': event['user_id']})

Measured cold start times for AWS Lambda (1024MB, us-east-1):

Runtime                  Minimal Function   Typical App
Python 3.12              150ms              400-800ms
Node.js 20               120ms              300-600ms
Go                       80ms               100-200ms
Java (no optimization)   800ms              2000-4000ms
Java (SnapStart)         100ms              200-400ms
.NET 8                   200ms              400-800ms

Pattern 1: Provisioned Concurrency (The Nuclear Option)

Keep instances warm permanently. Guaranteed no cold starts. Also guaranteed higher bills.

resource "aws_lambda_function" "api" {
  function_name = "api-handler"
  runtime       = "python3.12"
  handler       = "main.handler"
  memory_size   = 1024
  publish       = true  # provisioned concurrency requires a published version

  # ... other config
}

resource "aws_lambda_provisioned_concurrency_config" "api" {
  function_name                     = aws_lambda_function.api.function_name
  provisioned_concurrent_executions = 10
  qualifier                         = aws_lambda_function.api.version
}

Cost math:

  • Provisioned concurrency: $0.000004463/GB-second
  • 10 instances × 1GB × 86400 seconds/day = $3.86/day
  • That’s $116/month just to keep 10 instances warm
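The math above is easy to reproduce; a quick sketch (the GB-second price is region-dependent, so plug in your own):

```python
# Price per GB-second for provisioned concurrency, from the figures above
PRICE_PER_GB_SECOND = 0.000004463

def monthly_warm_cost(instances, memory_gb, price=PRICE_PER_GB_SECOND, days=30):
    """Cost of keeping `instances` warm around the clock for a month."""
    per_day = instances * memory_gb * 86_400 * price
    return per_day * days

cost = monthly_warm_cost(10, 1.0)  # ~$116/month for 10 × 1GB instances
```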

When to use:

  • Latency SLAs under 200ms
  • Predictable traffic patterns
  • Cost is justified by business value

Optimization: Use auto-scaling for provisioned concurrency:

resource "aws_appautoscaling_target" "lambda" {
  max_capacity       = 100
  min_capacity       = 5
  resource_id        = "function:${aws_lambda_function.api.function_name}:${aws_lambda_function.api.version}"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  service_namespace  = "lambda"
}

resource "aws_appautoscaling_policy" "lambda" {
  name               = "lambda-provisioned-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.lambda.resource_id
  scalable_dimension = aws_appautoscaling_target.lambda.scalable_dimension
  service_namespace  = aws_appautoscaling_target.lambda.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = 0.7  # Keep 70% utilization
    
    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }
    
    scale_in_cooldown  = 60
    scale_out_cooldown = 0
  }
}

Pattern 2: Scheduled Warming (The Budget Option)

Ping your function periodically to keep instances warm. Nearly free, but unreliable.

# warmer.py
import boto3
import json

lambda_client = boto3.client('lambda')

FUNCTIONS_TO_WARM = [
    'api-handler',
    'user-service',
    'payment-processor'
]

def handler(event, context):
    for func in FUNCTIONS_TO_WARM:
        # Invoke with special warming payload
        lambda_client.invoke(
            FunctionName=func,
            InvocationType='Event',  # Async
            Payload=json.dumps({'warming': True})
        )
    
    return {'warmed': len(FUNCTIONS_TO_WARM)}

Your actual function checks for warming requests:

def handler(event, context):
    # Skip processing for warming pings
    if event.get('warming'):
        return {'statusCode': 200, 'body': 'warm'}
    
    # Normal processing
    return process_request(event)

Schedule with EventBridge:

resource "aws_cloudwatch_event_rule" "warmer" {
  name                = "lambda-warmer"
  schedule_expression = "rate(5 minutes)"
}

resource "aws_cloudwatch_event_target" "warmer" {
  rule = aws_cloudwatch_event_rule.warmer.name
  arn  = aws_lambda_function.warmer.arn
}

# EventBridge needs explicit permission to invoke the function
resource "aws_lambda_permission" "warmer" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.warmer.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.warmer.arn
}

Problems:

  • Only keeps one instance warm per function
  • Traffic bursts still cause cold starts
  • Adds invocation costs (minimal but not zero)

Enhancement — warm multiple instances:

import concurrent.futures
import json

import boto3

lambda_client = boto3.client('lambda')

def warm_function(func_name, concurrency):
    """Warm multiple instances in parallel."""
    # Note: the target handler should sleep briefly on warming pings,
    # otherwise the parallel invokes may all land on the same instance
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as executor:
        futures = [
            executor.submit(
                lambda_client.invoke,
                FunctionName=func_name,
                InvocationType='RequestResponse',  # Sync to hold instance
                Payload=json.dumps({'warming': True, 'instance': i})
            )
            for i in range(concurrency)
        ]
        concurrent.futures.wait(futures)

Pattern 3: Minimize Package Size

Smaller deployment = faster cold start; package size correlates strongly with cold start duration.

Measure your package:

# Check deployed size
aws lambda get-function --function-name my-function \
  --query 'Configuration.CodeSize' --output text

# Check layer sizes
aws lambda get-layer-version --layer-name my-layer --version-number 1 \
  --query 'Content.CodeSize' --output text
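Package size isn't the only factor; import cost matters too. A minimal sketch for timing individual imports (stdlib only; already-imported modules are evicted from the cache first so the measurement is a genuine import):

```python
import importlib
import sys
import time

def time_import(module_name):
    """Time a module's import in milliseconds."""
    # Drop any cached copy so we measure a real import, not a cache hit
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return (time.perf_counter() - start) * 1000

# Stdlib modules import in a few ms; heavy third-party imports
# (pandas, torch) will show much larger numbers
for name in ("json", "csv"):
    print(f"{name}: {time_import(name):.2f}ms")
```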

Techniques:

  1. Use Lambda layers strategically — common dependencies in layers, function code stays small

  2. Prune unused dependencies:

# Python - use pip-autoremove
pip install pip-autoremove
pip-autoremove unused-package -y

# Node - use depcheck
npx depcheck

  3. Tree-shake aggressively:

// ❌ Imports entire SDK
const AWS = require('aws-sdk');

// ✅ Imports only what's needed
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');

  4. Use lightweight alternatives:

# ❌ pandas for simple CSV (adds 50MB+)
import pandas as pd
df = pd.read_csv('data.csv')

# ✅ stdlib csv module
import csv
with open('data.csv') as f:
    reader = csv.DictReader(f)
    data = list(reader)

  5. Binary dependencies — compile for Lambda:

# Build numpy/scipy for Lambda's Amazon Linux 2
FROM public.ecr.aws/lambda/python:3.12

RUN pip install numpy scipy -t /var/task/
# Then copy /var/task/* to your deployment package

Pattern 4: Lazy Initialization

Don’t load what you don’t need. Initialize on first use.

# Global but not initialized
_dynamodb_table = None
_s3_client = None

def get_table():
    global _dynamodb_table
    if _dynamodb_table is None:
        import boto3
        dynamodb = boto3.resource('dynamodb')
        _dynamodb_table = dynamodb.Table('users')
    return _dynamodb_table

def get_s3():
    global _s3_client
    if _s3_client is None:
        import boto3
        _s3_client = boto3.client('s3')
    return _s3_client

def handler(event, context):
    if event['action'] == 'get_user':
        # Only initializes DynamoDB
        return get_table().get_item(Key={'id': event['id']})
    elif event['action'] == 'get_file':
        # Only initializes S3
        return get_s3().get_object(Bucket='my-bucket', Key=event['key'])

Caveat: If most requests need all resources, lazy loading just moves latency from cold start to first request. Profile to verify it helps.
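To check, time the first and subsequent calls of a lazy getter. A self-contained sketch, using a sleep as a stand-in for SDK/connection setup:

```python
import time

_client = None

def get_client():
    """Lazy getter; the sleep stands in for expensive SDK initialization."""
    global _client
    if _client is None:
        time.sleep(0.05)  # simulate ~50ms of client/connection setup
        _client = object()
    return _client

def timed_ms(fn):
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000

first = timed_ms(get_client)   # pays the initialization cost
second = timed_ms(get_client)  # near-zero on the warm path
```

If nearly every request ends up paying `first`, the lazy indirection buys you nothing; move the initialization back to module level.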

Pattern 5: Runtime Selection

Choose your runtime strategically:

Go or Rust — Fastest cold starts, smallest binaries. Best for latency-critical paths.

package main

import (
    "context"
    "github.com/aws/aws-lambda-go/lambda"
)

func handler(ctx context.Context) (string, error) {
    return "Hello", nil
}

func main() {
    lambda.Start(handler)
}

Java with SnapStart — AWS caches initialized snapshots. Can cut cold starts by up to 90%.

// Enable in SAM template
// SnapStart:
//   ApplyOn: PublishedVersions

public class Handler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {
    
    // Static initialization still happens at snapshot time
    private static final DynamoDbClient dynamoDb = DynamoDbClient.builder().build();
    
    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent event, Context context) {
        // Fast execution — state restored from snapshot
        return new APIGatewayProxyResponseEvent()
            .withStatusCode(200)
            .withBody("Hello");
    }
}

Python with Lambda Web Adapter — Run FastAPI/Flask directly, keep connections warm:

# Plain Python base image — the Lambda base image's entrypoint expects a
# handler, which would conflict with running a web server via CMD
FROM public.ecr.aws/docker/library/python:3.12-slim

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY app.py .

# Lambda Web Adapter handles HTTP (forwards to port 8080 by default)
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.7.1 /lambda-adapter /opt/extensions/lambda-adapter

CMD ["python", "app.py"]

Pattern 6: Architecture Redesign

Sometimes the answer is to not use Lambda for that path.

Hybrid architecture:

[Diagram: clients hit API Gateway, which routes latency-critical paths to ECS/Fargate and event-driven paths (reports, webhooks) to Lambda]

ECS for hot paths:

resource "aws_ecs_service" "api" {
  name            = "api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 2  # Always running

  # Keep full capacity during deployments
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100
}

When to move off Lambda:

  • Consistent traffic (no scale-to-zero benefit)
  • Strict latency SLAs (<100ms P99)
  • Long-running connections (WebSockets)
  • Heavy initialization (ML models, large frameworks)

Pattern 7: Monolithic Lambda

Counterintuitive: fewer functions can mean fewer cold starts.

# Instead of 10 functions for 10 endpoints,
# one function with routing

def handler(event, context):
    # Match on the resource template ('/users/{id}'), not the raw path,
    # so parameterized routes work (API Gateway REST proxy integration)
    path = event.get('resource', event['path'])
    method = event['httpMethod']
    
    routes = {
        ('GET', '/users'): list_users,
        ('POST', '/users'): create_user,
        ('GET', '/users/{id}'): get_user,
        ('GET', '/orders'): list_orders,
        # ... more routes
    }
    
    handler_func = routes.get((method, path))
    if handler_func:
        return handler_func(event)
    return {'statusCode': 404}

Why this works:

  • One function gets more traffic = more warm instances
  • Shared initialization amortized across endpoints
  • Simpler deployment

Trade-off: Larger package size, longer individual cold start. Profile to see if the reduced cold start frequency outweighs the increased duration.

Measurement Is Everything

You can’t optimize what you don’t measure.

import time

# Module-level state persists across warm invocations
IS_COLD_START = True

def handler(event, context):
    global IS_COLD_START
    
    start = time.time()
    
    if IS_COLD_START:
        # Lambda reports the actual init time as "Init Duration" in the
        # REPORT log line; here we just tag the invocation as cold
        print("COLD_START")
        IS_COLD_START = False
    
    # ... handle request
    
    duration = (time.time() - start) * 1000
    print(f"REQUEST duration={duration:.2f}ms")

CloudWatch Insights query:

fields @timestamp, @message
| filter @type = "REPORT" and @initDuration > 0
| stats count() as cold_starts,
        avg(@initDuration) as avg_init,
        pct(@initDuration, 99) as p99_init
  by bin(1h)

The Pragmatic Approach

  1. Measure first — Know your current cold start frequency and duration
  2. Start with free — Package optimization, lazy loading, runtime selection
  3. Add scheduled warming — Gets you 80% of the benefit at minimal cost
  4. Use provisioned concurrency — Only for paths that justify the cost
  5. Consider hybrid — ECS for hot paths, Lambda for everything else

Cold starts aren’t a bug; they’re a trade-off. You get scale-to-zero and pay-per-use. The cost is occasional latency spikes. The patterns above help you minimize that cost without giving up the benefits.

The goal isn’t zero cold starts. It’s cold starts that don’t matter.