Your deploy shouldn’t kill requests mid-flight. Every dropped connection is a failed payment, a lost form submission, or a frustrated user. Graceful shutdown ensures your application finishes what it started before dying.

Here’s how to do it right.

The Problem

Without graceful shutdown:

13:00:00 - Request starts (expected 2 second response)
13:00:01 - Deploy starts
13:00:01 - Process killed mid-execution
13:00:01 - Client receives connection error
13:00:01 - User sees error, retries, maybe gives up

With graceful shutdown:

13:00:00 - Request starts
13:00:01 - Deploy starts
13:00:01 - Server stops accepting new connections
13:00:02 - In-flight request completes
13:00:02 - Process exits cleanly

Signal Handling 101

Unix signals tell your process to do something. The important ones:

Signal    Default      Meaning
------    -------      -------
SIGTERM   Terminate    "Please shut down gracefully"
SIGINT    Terminate    Ctrl+C, "Please stop"
SIGKILL   Kill         "Die immediately" (can't catch)
SIGHUP    Terminate    "Terminal closed" or "reload config"

Kubernetes, Docker, and most orchestrators send SIGTERM first, wait, then send SIGKILL.
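You can see the catchable/uncatchable distinction directly from Python. This is a quick illustrative check, separate from the server code that follows:

```python
import signal

# SIGTERM is catchable: installing a handler succeeds
signal.signal(signal.SIGTERM, lambda signum, frame: None)
print("SIGTERM handler installed")

# SIGKILL is not: the kernel refuses to let any process catch it
try:
    signal.signal(signal.SIGKILL, lambda signum, frame: None)
except OSError as err:
    print(f"SIGKILL cannot be caught: {err}")
```

That refusal is exactly why the orchestrator's SIGTERM-then-SIGKILL sequence works: you get one chance to clean up, then the kernel takes over.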

Python Implementation

import signal
import sys
import threading
import time

MAX_CONCURRENT = 100

# Track in-flight requests with an explicit counter and lock
# (avoids reaching into Semaphore internals via the private _value)
active_count = 0
active_lock = threading.Lock()
shutdown_event = threading.Event()

class GracefulServer:
    def __init__(self, server):
        self.server = server
        self.setup_signal_handlers()
    
    def setup_signal_handlers(self):
        signal.signal(signal.SIGTERM, self.handle_shutdown)
        signal.signal(signal.SIGINT, self.handle_shutdown)
    
    def handle_shutdown(self, signum, frame):
        print(f"Received signal {signum}, starting graceful shutdown...")
        shutdown_event.set()
        
        # Stop accepting new connections. Note: shutdown() blocks until
        # serve_forever() returns, so the accept loop must run in a
        # different thread than this handler (Python delivers signals
        # to the main thread).
        self.server.shutdown()
        
        # Wait for in-flight requests (max 30 seconds)
        deadline = time.time() + 30
        while time.time() < deadline:
            with active_lock:
                if active_count == 0:
                    break
            time.sleep(0.1)
        
        with active_lock:
            remaining = active_count
        if remaining > 0:
            print(f"Warning: {remaining} requests still in flight at shutdown")
        
        sys.exit(0)

# Decorator for request handlers
def tracked_request(func):
    def wrapper(*args, **kwargs):
        global active_count
        if shutdown_event.is_set():
            return {"error": "Server shutting down"}, 503
        
        with active_lock:
            if active_count >= MAX_CONCURRENT:
                return {"error": "Too many requests"}, 429
            active_count += 1
        
        try:
            return func(*args, **kwargs)
        finally:
            with active_lock:
                active_count -= 1
    
    return wrapper
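The threading caveat is worth proving to yourself: Python delivers signals to the main thread, and `shutdown()` deadlocks if called from the thread running `serve_forever()`. Here is a self-contained miniature of the correct wiring, using only the standard library (the handler and loopback port are illustrative):

```python
import threading
import urllib.request
from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"OK")

    def log_message(self, *args):
        pass  # keep the demo quiet

# Bind to port 0 so the OS picks a free port
httpd = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
port = httpd.server_address[1]

# Run the accept loop off the main thread, leaving the main thread
# free to receive signals and call shutdown() safely
t = threading.Thread(target=httpd.serve_forever)
t.start()

body = urllib.request.urlopen(f"http://127.0.0.1:{port}/").read()
print(body)

httpd.shutdown()  # stop accepting connections; returns once the loop exits
t.join()
print("clean exit")
```

In production the `httpd.shutdown()` call would live inside the signal handler, as in the `GracefulServer` sketch above.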

Node.js Implementation

const http = require('http');

let isShuttingDown = false;
let activeConnections = new Set();

const server = http.createServer((req, res) => {
  if (isShuttingDown) {
    res.writeHead(503);
    res.end('Server is shutting down');
    return;
  }
  
  // Track the response; drop it on finish or if the client disconnects
  activeConnections.add(res);
  res.on('finish', () => activeConnections.delete(res));
  res.on('close', () => activeConnections.delete(res));
  
  // Handle request (handleRequest is your application's router)
  handleRequest(req, res);
});

function gracefulShutdown(signal) {
  console.log(`Received ${signal}, starting graceful shutdown...`);
  isShuttingDown = true;
  
  // Stop accepting new connections
  server.close(() => {
    console.log('Server closed, no new connections');
  });
  
  // Wait for existing connections to finish
  const shutdownTimeout = setTimeout(() => {
    console.log(`Forcing shutdown with ${activeConnections.size} connections remaining`);
    process.exit(1);
  }, 30000);
  
  // Check periodically if all connections are done
  const checkInterval = setInterval(() => {
    if (activeConnections.size === 0) {
      clearInterval(checkInterval);
      clearTimeout(shutdownTimeout);
      console.log('All connections closed, exiting cleanly');
      process.exit(0);
    }
  }, 100);
}

process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));

server.listen(8080);

Go Implementation

Go’s http.Server has built-in graceful shutdown:

package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    server := &http.Server{
        Addr:    ":8080",
        Handler: http.HandlerFunc(handler),
    }
    
    // Channel to listen for shutdown signals
    shutdown := make(chan os.Signal, 1)
    signal.Notify(shutdown, syscall.SIGTERM, syscall.SIGINT)
    
    // Start server in goroutine
    go func() {
        log.Println("Server starting on :8080")
        if err := server.ListenAndServe(); err != http.ErrServerClosed {
            log.Fatalf("Server error: %v", err)
        }
    }()
    
    // Wait for shutdown signal
    sig := <-shutdown
    log.Printf("Received %v, starting graceful shutdown...", sig)
    
    // Create deadline context
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    
    // Shutdown gracefully
    if err := server.Shutdown(ctx); err != nil {
        log.Printf("Graceful shutdown failed: %v", err)
        server.Close()  // Force close
    }
    
    log.Println("Server stopped")
}

func handler(w http.ResponseWriter, r *http.Request) {
    // Simulate work
    time.Sleep(2 * time.Second)
    w.Write([]byte("OK"))
}

Kubernetes Coordination

Kubernetes sends SIGTERM, waits terminationGracePeriodSeconds (default 30), then SIGKILL. But there’s a race condition: the pod might still receive traffic after SIGTERM.

The Problem

T+0.0s - Kubernetes sends SIGTERM to the pod
T+0.0s - SIGTERM received, app starts draining
T+0.0s - Pod removal from Endpoints begins (asynchronous)
T+0.1s - Your app stops accepting new connections
T+0.2s - Endpoint update still propagating to kube-proxy on each node (traffic still coming!)

The Solution: PreStop Hook

Add a delay before shutdown to let endpoints update:

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: app
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]
        # Or use an HTTP endpoint
        # preStop:
        #   httpGet:
        #     path: /prestop
        #     port: 8080

Timeline with preStop:

T+0s  - Kubernetes marks pod terminating
T+0s  - preStop hook starts (sleep 10)
T+0s  - Endpoint removal begins propagating
T+1s  - New traffic stops arriving
T+10s - preStop hook completes
T+10s - SIGTERM sent to the app
T+10s - App drains in-flight requests (no new traffic)
T+12s - App exits cleanly
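The numbers have to add up: `terminationGracePeriodSeconds` must cover the preStop sleep plus your application's drain timeout, with some headroom. A back-of-the-envelope check using illustrative values:

```python
# Illustrative budget check (values are examples, not universal defaults)
pre_stop_sleep = 10   # preStop hook: sleep 10
drain_timeout = 30    # how long the app waits for in-flight requests
headroom = 5          # process exit, log/metric flushing

required = pre_stop_sleep + drain_timeout + headroom
print(f"terminationGracePeriodSeconds must be >= {required}")
assert required <= 60  # e.g. a manifest setting terminationGracePeriodSeconds: 60
```

If the grace period is smaller than this sum, the kubelet's SIGKILL will land while requests are still draining.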

Health Check Coordination

Fail your readiness probe during shutdown to stop traffic faster:

import signal

shutdown_requested = False

@app.route('/health/ready')
def readiness():
    if shutdown_requested:
        return 'Shutting down', 503
    
    # Other readiness checks...
    return 'OK', 200

def handle_sigterm(signum, frame):
    global shutdown_requested
    shutdown_requested = True
    # Continue handling in-flight requests...

signal.signal(signal.SIGTERM, handle_sigterm)
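
The flag flip is easy to verify in isolation. A self-contained sketch, where the plain `readiness` function stands in for the route above:

```python
import os
import signal
import time

shutdown_requested = False

def readiness():
    # Mirrors what /health/ready would return
    return ("Shutting down", 503) if shutdown_requested else ("OK", 200)

def handle_sigterm(signum, frame):
    global shutdown_requested
    shutdown_requested = True

signal.signal(signal.SIGTERM, handle_sigterm)

print(readiness())                    # healthy before the signal
os.kill(os.getpid(), signal.SIGTERM)  # simulate the orchestrator's SIGTERM
time.sleep(0.05)                      # let the handler run
print(readiness())                    # now failing, so probes pull traffic away
```

After a few failed probes, Kubernetes stops routing new requests to the pod while your in-flight work finishes.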

Connection Draining

For load balancers outside Kubernetes:

AWS ALB

resource "aws_lb_target_group" "app" {
  # ...
  
  deregistration_delay = 30  # Seconds to drain connections
  
  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 10
  }
}

Nginx

upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

# During deploy, mark server as "down" and wait
# server 10.0.0.1:8080 down;

HAProxy

backend app
    option httpchk GET /health
    server app1 10.0.0.1:8080 check
    server app2 10.0.0.2:8080 check
    
    # Health-check cadence: probe every 3s, down after 3 failures, up after 2
    default-server inter 3s fall 3 rise 2

# To drain a server during a deploy, use the runtime API:
#   echo "set server app/app1 state drain" | socat stdio /var/run/haproxy.sock

Database and Queue Cleanup

Don’t just handle HTTP — clean up all resources:

import logging
import signal
import sys

class Application:
    def __init__(self):
        self.http_server = create_http_server()  # factories defined elsewhere
        self.db_pool = create_db_pool()
        self.redis = create_redis_client()
        self.background_workers = []
    
    def start_background_workers(self):
        for i in range(4):
            worker = BackgroundWorker()
            worker.start()
            self.background_workers.append(worker)
    
    def shutdown(self, signum, frame):
        print("Shutdown initiated...")
        
        # 1. Stop accepting new work
        self.http_server.stop_accepting()
        
        # 2. Stop background workers (let them finish current job)
        for worker in self.background_workers:
            worker.stop_gracefully()
        
        # 3. Wait for HTTP requests to drain
        self.http_server.wait_for_drain(timeout=15)
        
        # 4. Wait for background workers
        for worker in self.background_workers:
            worker.join(timeout=10)
        
        # 5. Close database connections
        self.db_pool.close()
        
        # 6. Close Redis
        self.redis.close()
        
        # 7. Flush any buffered logs/metrics
        logging.shutdown()
        
        print("Clean shutdown complete")
        sys.exit(0)

app = Application()
signal.signal(signal.SIGTERM, app.shutdown)

Testing Graceful Shutdown

#!/bin/bash
# test-graceful-shutdown.sh

# Start server
./myserver &
SERVER_PID=$!
sleep 2

# Start a slow request in background
curl -s "http://localhost:8080/slow?delay=5" &
CURL_PID=$!

# Wait for request to start
sleep 1

# Send SIGTERM
kill -TERM $SERVER_PID

# Wait for curl to complete
wait $CURL_PID
CURL_EXIT=$?

# Check result
if [ $CURL_EXIT -eq 0 ]; then
    echo "✓ Request completed successfully during shutdown"
else
    echo "✗ Request failed during shutdown (exit code: $CURL_EXIT)"
    exit 1
fi

The Checklist

  1. Handle SIGTERM — Don’t rely on SIGKILL
  2. Stop accepting new connections — Return 503 immediately
  3. Wait for in-flight requests — With a reasonable timeout
  4. Coordinate with load balancers — PreStop hooks, health checks
  5. Clean up resources — Database connections, file handles, workers
  6. Set appropriate timeouts — terminationGracePeriodSeconds > your drain time
  7. Test it — Actually verify requests complete during deploys

Graceful shutdown is invisible when it works. Users never see the deploys. That's the goal: infrastructure reliable enough to go unnoticed.

Deploy with confidence. Shut down with grace.