Every deployment is a potential outage if your application doesn’t shut down gracefully. Here’s how to do it right.

The Problem

  1. Kubernetes sends SIGTERM
  2. Your process exits immediately
  3. In-flight requests are dropped mid-request
  4. Users see errors
  5. So much for "zero-downtime" deploys

The fix: handle SIGTERM, finish existing work, then exit.

Python Example

import signal
import sys
from threading import Event

shutdown_event = Event()

def handle_sigterm(signum, frame):
    print("SIGTERM received, shutting down gracefully...")
    shutdown_event.set()

signal.signal(signal.SIGTERM, handle_sigterm)

# In your main loop or server
while not shutdown_event.is_set():
    process_next_item()

# Cleanup
finish_pending_work()
close_connections()
sys.exit(0)

FastAPI/Uvicorn

from fastapi import FastAPI
from contextlib import asynccontextmanager

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup
    await init_db_pool()
    yield
    # Shutdown (runs on SIGTERM)
    await close_db_pool()
    await flush_metrics()

app = FastAPI(lifespan=lifespan)

Run with:

uvicorn app:app --timeout-graceful-shutdown 30

Node.js

const server = app.listen(3000);

const shutdown = async () => {
  console.log('SIGTERM received, shutting down...');

  // Stop accepting new connections; resolves once
  // existing connections have finished
  const closed = new Promise((resolve) => {
    server.close(() => {
      console.log('HTTP server closed');
      resolve();
    });
  });

  // Wait for existing requests, but give up after 10s
  await Promise.race([
    closed,
    new Promise((resolve) => setTimeout(resolve, 10000)),
  ]);
  
  // Cleanup
  await db.close();
  process.exit(0);
};

process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);

Go

func main() {
    srv := &http.Server{Addr: ":8080"}
    
    go func() {
        if err := srv.ListenAndServe(); err != http.ErrServerClosed {
            log.Fatal(err)
        }
    }()
    
    // Wait for SIGTERM
    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
    <-quit
    
    // Graceful shutdown with timeout
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    
    if err := srv.Shutdown(ctx); err != nil {
        log.Printf("Server forced to shutdown: %v", err)
    }
}

Kubernetes Configuration

spec:
  terminationGracePeriodSeconds: 30  # Pod-level field, not per-container
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]  # Wait for endpoint removal

The preStop sleep is crucial. There’s a race between:

  1. Pod receiving SIGTERM
  2. Endpoints controller removing pod from service

The sleep ensures traffic stops arriving before your app starts refusing connections.

The Timeline

0s    - Pod marked Terminating; Endpoints controller removes it from the Service
0s    - preStop hook runs (sleep 5)
5s    - SIGTERM sent to the app
5-30s - App fails readiness, finishes in-flight requests, graceful shutdown
30s   - SIGKILL if the process is still alive (grace period expired)

Health Checks During Shutdown

import signal

from fastapi import FastAPI, Response

app = FastAPI()
shutdown_in_progress = False

@app.get("/health/ready")
def readiness():
    if shutdown_in_progress:
        return Response(status_code=503)
    return {"status": "ok"}

def handle_sigterm(signum, frame):
    global shutdown_in_progress
    shutdown_in_progress = True
    # Continue serving existing requests;
    # the failing readiness probe stops new traffic

signal.signal(signal.SIGTERM, handle_sigterm)

Kubernetes stops sending traffic when readiness fails.
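Wiring the probe into the pod spec might look like this (the path, port, and timings are illustrative, not prescriptive):

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  periodSeconds: 2        # probe often, so traffic stops within seconds
  failureThreshold: 1     # a single 503 marks the pod not-ready

A short period and low failure threshold matter here: they bound how long new requests keep arriving after you flip the flag.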

Database Connections

async def shutdown():
    # 1. Stop accepting new work
    shutdown_flag.set()
    
    # 2. Wait for in-flight queries
    await asyncio.sleep(5)
    
    # 3. Close connection pool
    await db_pool.close()

Don’t close the pool before queries finish.

Queue Workers

from queue import Empty

def worker():
    while not shutdown_event.is_set():
        try:
            # Short timeout so the shutdown flag is re-checked every second
            job = queue.get(timeout=1)
        except Empty:
            continue
        process(job)
        queue.task_done()
    # Loop exited: the current job is finished and no new ones are taken
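A minimal, runnable harness for this pattern, to see the clean exit end to end (`job_queue`, `process`, and the timings are stand-ins for your real setup):

```python
import queue
import threading

shutdown_event = threading.Event()
job_queue = queue.Queue()
results = []

def process(job):
    results.append(job)

def worker():
    while not shutdown_event.is_set():
        try:
            job = job_queue.get(timeout=0.1)  # re-check the flag regularly
        except queue.Empty:
            continue
        process(job)
        job_queue.task_done()

t = threading.Thread(target=worker)
t.start()

for i in range(3):
    job_queue.put(i)

job_queue.join()       # all submitted jobs have been processed
shutdown_event.set()   # now tell the worker to stop
t.join()               # worker exits after finishing its current job
print(results)         # [0, 1, 2]
```

Note that `join()` on the queue belongs to the producer side: it waits until every submitted job has been `task_done()`-ed, after which it is safe to signal shutdown.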

Testing Graceful Shutdown

# Start your app
./myapp &
APP_PID=$!

# Start a slow request
curl http://localhost:8080/slow &

# Send SIGTERM
kill -TERM $APP_PID

# Did the slow request complete?
wait

If the request completes with a valid response, you’re handling shutdown correctly.

The Checklist

  1. ☐ Handle SIGTERM signal
  2. ☐ Stop accepting new connections
  3. ☐ Wait for in-flight requests to complete
  4. ☐ Close database connections cleanly
  5. ☐ Flush metrics/logs
  6. ☐ Add preStop hook in Kubernetes
  7. ☐ Set appropriate terminationGracePeriodSeconds
  8. ☐ Fail readiness probe during shutdown

Zero-downtime deployments require graceful shutdowns. No exceptions.