Serverless Cold Start Mitigation: Practical Patterns That Actually Work
Concrete strategies for reducing serverless cold starts, from provisioned concurrency to architecture redesign.
February 19, 2026 · 9 min · 1876 words · Rob Washington
Cold starts are serverless’s original sin. Your function spins up, downloads dependencies, initializes connections, and finally runs your code — all while your user waits. The P99 latency spikes. The SLA teeters.
Here’s what actually works, ranked by effectiveness and cost.
A cold start happens when there's no warm instance available to handle a request. The platform must:

1. **Provision a container** — 50-500ms depending on runtime size
2. **Initialize the runtime** — 10-100ms (Python) to 500ms+ (JVM without optimization)
3. **Run your initialization code** — depends on what you do at module level
4. **Execute the handler** — your actual function
```python
# Everything at module level runs during cold start
import boto3    # ~100ms
import pandas   # ~500ms
import torch    # ~2000ms

# Connection initialization during cold start
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')

def handler(event, context):
    # Only this runs on warm invocations
    return table.get_item(Key={'id': event['user_id']})
```
Measured cold start times for AWS Lambda (1024MB, us-east-1):
| Runtime | Minimal Function | Typical App |
| --- | --- | --- |
| Python 3.12 | 150ms | 400-800ms |
| Node.js 20 | 120ms | 300-600ms |
| Go | 80ms | 100-200ms |
| Java (no optimization) | 800ms | 2000-4000ms |
| Java (SnapStart) | 100ms | 200-400ms |
| .NET 8 | 200ms | 400-800ms |
## Pattern 1: Provisioned Concurrency (The Nuclear Option)
Keep instances warm permanently. Guaranteed no cold starts. Also guaranteed higher bills.
```hcl
resource "aws_lambda_function" "api" {
  function_name = "api-handler"
  runtime       = "python3.12"
  handler       = "main.handler"
  memory_size   = 1024
  # ... other config
}

resource "aws_lambda_provisioned_concurrency_config" "api" {
  function_name                     = aws_lambda_function.api.function_name
  provisioned_concurrent_executions = 10
  qualifier                         = aws_lambda_function.api.version
}
```
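To put a number on "higher bills", here is a back-of-envelope sketch. The per-GB-second rate below is an assumption (roughly us-east-1 at the time of writing); check current AWS pricing before relying on it.

```python
# Illustrative provisioned-concurrency rate (assumption; verify against
# current AWS pricing): ~$0.0000041667 per GB-second held warm
PROVISIONED_RATE_GB_S = 0.0000041667

def monthly_provisioned_cost(instances: int, memory_mb: int) -> float:
    """Cost of holding `instances` warm for a 30-day month,
    before any duration or request charges."""
    gb = memory_mb / 1024
    seconds = 30 * 24 * 3600
    return instances * gb * seconds * PROVISIONED_RATE_GB_S

print(f"${monthly_provisioned_cost(10, 1024):.2f}/month")  # ~$108 for the config above
```

Ten always-on 1GB instances cost real money every hour of the month, which is why this pattern belongs on latency-critical paths only.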
## Pattern 2: Scheduled Warming

A cheaper alternative: ping your functions on a schedule (an EventBridge rule every few minutes is typical) so instances stay resident.

```python
# warmer.py
import boto3
import json

lambda_client = boto3.client('lambda')

FUNCTIONS_TO_WARM = ['api-handler', 'user-service', 'payment-processor']

def handler(event, context):
    for func in FUNCTIONS_TO_WARM:
        # Invoke with special warming payload
        lambda_client.invoke(
            FunctionName=func,
            InvocationType='Event',  # Async
            Payload=json.dumps({'warming': True})
        )
    return {'warmed': len(FUNCTIONS_TO_WARM)}
```
Your actual function checks for warming requests:
```python
def handler(event, context):
    # Skip processing for warming pings
    if event.get('warming'):
        return {'statusCode': 200, 'body': 'warm'}

    # Normal processing
    return process_request(event)
```
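Rather than repeating that check in every function, it can be factored into a decorator. A minimal sketch (the decorator name is mine):

```python
import functools

def skip_warming(handler):
    """Short-circuit warming pings before the real handler runs."""
    @functools.wraps(handler)
    def wrapper(event, context):
        if isinstance(event, dict) and event.get('warming'):
            return {'statusCode': 200, 'body': 'warm'}
        return handler(event, context)
    return wrapper

@skip_warming
def handler(event, context):
    # Normal processing only; warming pings never reach this line
    return {'statusCode': 200, 'body': 'real work'}
```

The same decorator can wrap every warmed function, so the warming protocol lives in one place.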
A single async ping only keeps one instance warm. To hold several, invoke synchronously in parallel so each in-flight request pins a separate instance:

```python
import concurrent.futures

def warm_function(func_name, concurrency):
    """Warm multiple instances in parallel"""
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as executor:
        futures = [
            executor.submit(
                lambda_client.invoke,
                FunctionName=func_name,
                InvocationType='RequestResponse',  # Sync to hold instance
                Payload=json.dumps({'warming': True, 'instance': i})
            )
            for i in range(concurrency)
        ]
        concurrent.futures.wait(futures)
```
## Pattern 3: Package Optimization

Smaller packages download and initialize faster, and native dependencies should be compiled against Lambda's own base image so the wheels actually load:

```dockerfile
# Build numpy/scipy against Lambda's Amazon Linux base image
FROM public.ecr.aws/lambda/python:3.12
RUN pip install numpy scipy -t /var/task/
# Then copy /var/task/* to your deployment package
```
## Pattern 4: Lazy Loading

Defer expensive initialization until a code path actually needs it:

```python
# Global but not initialized
_dynamodb_table = None
_s3_client = None

def get_table():
    global _dynamodb_table
    if _dynamodb_table is None:
        import boto3
        dynamodb = boto3.resource('dynamodb')
        _dynamodb_table = dynamodb.Table('users')
    return _dynamodb_table

def get_s3():
    global _s3_client
    if _s3_client is None:
        import boto3
        _s3_client = boto3.client('s3')
    return _s3_client

def handler(event, context):
    if event['action'] == 'get_user':
        # Only initializes DynamoDB
        return get_table().get_item(Key={'id': event['id']})
    elif event['action'] == 'get_file':
        # Only initializes S3
        return get_s3().get_object(Bucket='my-bucket', Key=event['key'])
```
Caveat: If most requests need all resources, lazy loading just moves latency from cold start to first request. Profile to verify it helps.
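One way to profile it: memoize each factory behind a wrapper that records first-call latency, so logs show exactly how much time lazy loading shifted onto which request. A sketch (helper names are mine):

```python
import functools
import time

def timed_lazy(factory):
    """Memoize a zero-arg factory and record how long the first init takes."""
    state = {}
    @functools.wraps(factory)
    def get():
        if 'value' not in state:
            start = time.perf_counter()
            state['value'] = factory()
            state['init_ms'] = (time.perf_counter() - start) * 1000
        return state['value']
    get.init_ms = lambda: state.get('init_ms')
    return get

@timed_lazy
def expensive_client():
    time.sleep(0.02)  # stand-in for e.g. boto3.client('s3') setup
    return object()

first = expensive_client()
second = expensive_client()  # cached: no re-init, no extra latency
```

If `init_ms` shows up on nearly every invocation's first request, lazy loading is just relabeling the latency and you should initialize eagerly instead.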
## Pattern 5: Runtime-Specific Optimizations

**Java with SnapStart** — AWS snapshots the initialized runtime at publish time and restores it on cold start, cutting cold start time by up to 90%.
```java
// Enable in SAM template:
// SnapStart:
//   ApplyOn: PublishedVersions
public class Handler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {

    // Static initialization still happens at snapshot time
    private static final DynamoDbClient dynamoDb = DynamoDbClient.builder().build();

    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent event, Context context) {
        // Fast execution: state restored from snapshot
        return new APIGatewayProxyResponseEvent()
            .withStatusCode(200)
            .withBody("Hello");
    }
}
```
**Python with Lambda Web Adapter** — run FastAPI/Flask directly and keep connections warm:
```dockerfile
FROM public.ecr.aws/lambda/python:3.12
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py .

# Lambda Web Adapter handles HTTP
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.7.1 /lambda-adapter /opt/extensions/lambda-adapter

CMD ["python", "app.py"]
```
## Pattern 6: One Function, Many Routes

Counterintuitive: fewer functions can mean fewer cold starts.
```python
# Instead of 10 functions for 10 endpoints,
# one function with routing
def handler(event, context):
    # Match on the route template ('/users/{id}'), not the concrete
    # path ('/users/123'), so parameterized routes can match
    path = event.get('resource', event['path'])
    method = event['httpMethod']

    routes = {
        ('GET', '/users'): list_users,
        ('POST', '/users'): create_user,
        ('GET', '/users/{id}'): get_user,
        ('GET', '/orders'): list_orders,
        # ... more routes
    }

    handler_func = routes.get((method, path))
    if handler_func:
        return handler_func(event)
    return {'statusCode': 404}
```
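When the event only carries the concrete path (a Lambda function URL, say, rather than API Gateway's resource template), templated segments like `{id}` have to be matched by hand. A minimal sketch (helper names are mine):

```python
import re

def compile_route(template):
    """Turn '/users/{id}' into a regex that captures path parameters."""
    pattern = re.sub(r'\{(\w+)\}', r'(?P<\1>[^/]+)', template)
    return re.compile(f'^{pattern}$')

def match_route(routes, method, path):
    """Return (handler, path_params) for the first matching route."""
    for (route_method, template), func in routes.items():
        if route_method != method:
            continue
        m = compile_route(template).match(path)
        if m:
            return func, m.groupdict()
    return None, {}

routes = {('GET', '/users/{id}'): lambda event: event}
func, params = match_route(routes, 'GET', '/users/42')
# params == {'id': '42'}
```

Compiling each template once at module level (another thing that amortizes nicely in a monolithic function) avoids re-running `re.compile` per request.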
Why this works:

- One function gets all the traffic, so more instances stay warm
- Shared initialization is amortized across endpoints
- Simpler deployment
Trade-off: Larger package size, longer individual cold start. Profile to see if the reduced cold start frequency outweighs the increased duration.
1. **Measure first** — know your current cold start frequency and duration
2. **Start with free** — package optimization, lazy loading, runtime selection
3. **Add scheduled warming** — gets you 80% of the benefit at minimal cost
4. **Use provisioned concurrency** — only for paths that justify the cost
5. **Consider hybrid** — ECS for hot paths, Lambda for everything else
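For step 1, Lambda's `REPORT` log lines include `@initDuration` only on cold starts, so a CloudWatch Logs Insights query can count and characterize them. A sketch (the log group name is a placeholder):

```python
def cold_start_query() -> str:
    """Logs Insights query: cold-start count and init-duration stats.
    @initDuration only appears on REPORT lines for cold starts."""
    return (
        'filter @type = "REPORT" '
        '| stats count(@initDuration) as coldStarts, '
        'avg(@initDuration) as avgInitMs, '
        'pct(@initDuration, 99) as p99InitMs'
    )

if __name__ == "__main__":
    import time
    import boto3  # assumes AWS credentials are configured
    logs = boto3.client("logs")
    result = logs.start_query(
        logGroupName="/aws/lambda/api-handler",  # placeholder log group
        startTime=int(time.time()) - 24 * 3600,
        endTime=int(time.time()),
        queryString=cold_start_query(),
    )
    print("query id:", result["queryId"])
```

Dividing `coldStarts` by total invocations over the same window tells you whether cold starts are a P99 nuisance or a real problem worth paying for.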
Cold starts aren’t a bug; they’re a trade-off. You get scale-to-zero and pay-per-use. The cost is occasional latency spikes. The patterns above help you minimize that cost without giving up the benefits.
The goal isn’t zero cold starts. It’s cold starts that don’t matter.