Kubernetes

Graceful Shutdown: Finishing What You Started

When Kubernetes scales down your deployment or you push a new release, your running containers receive SIGTERM. Then, after a grace period, SIGKILL. The difference between graceful and chaotic shutdown is what happens in those seconds between the two signals. A request half-processed, a database transaction uncommitted, a file partially written — these are the artifacts of ungraceful shutdown. They create inconsistent state, failed requests, and debugging nightmares. The Signal Sequence 1 2 3 4 5 6 7 . . . . . . . S G P P P P I I r r r r r f G a o o o o T c c c c c s E e e e e e t R s s s s i M p s s s s l e l s r s s s e e i h h h x r n o o o o i u t d u u u t n l l l s n t c d d d i o o w n u s f c i g p n t i l t : r t o n o h o d p i s S c o s e c I e w a h o G s n c c d K s c i o e I b e n n L e p - n 0 L g t f e i i l c ( n n i t n s g g i o h o ( n t n m d e s e e w w r f o c c a w r l y u o k e ) l r a t k n : l y 3 0 s i n K u b e r n e t e s ) Basic Signal Handling 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 import signal import sys shutdown_requested = False def handle_sigterm(signum, frame): global shutdown_requested print("SIGTERM received, initiating graceful shutdown...") shutdown_requested = True signal.signal(signal.SIGTERM, handle_sigterm) signal.signal(signal.SIGINT, handle_sigterm) # Ctrl+C # Main loop checks shutdown flag while not shutdown_requested: process_next_item() # Cleanup after loop exits cleanup() sys.exit(0) Web Servers: Stop Accepting, Finish Processing Most web frameworks have built-in graceful shutdown. The pattern: ...

Health Check Endpoints: More Than Just 200 OK

Every container orchestrator, load balancer, and monitoring system asks the same question: is this service healthy? The answer you provide determines whether traffic gets routed, containers get replaced, and alerts get fired. A health check that lies — always returning 200 even when the database is down — is worse than no health check at all. It creates false confidence while your users experience failures. The Three Types of Health Checks Liveness: “Is the process alive?” Liveness checks answer: should this container be killed and restarted? ...

GitOps Workflow Patterns: Infrastructure as Pull Requests

GitOps sounds simple: put your infrastructure in Git, let a controller sync it to your cluster. In practice, there are a dozen ways to get it wrong. Here’s what works. The Core Principle Git is the source of truth. Not the cluster. Not a dashboard. Not someone’s kubectl session. D e v e l o p e r s → i n G g i l t ↑ e → s o C u o r n c t e r o l l e r → C l u s t e r If the cluster state doesn’t match Git, the controller fixes it. If someone manually changes the cluster, the controller reverts it. This is the contract. ...

Zero-Downtime Deployments: Strategies That Actually Work

“We’re deploying, please hold” is not an acceptable user experience. Whether you’re running a startup or enterprise infrastructure, users expect services to just work. Here’s how to ship code without the maintenance windows. The Goal: Invisible Deploys A zero-downtime deployment means users never notice you’re deploying. No error pages, no dropped connections, no “please refresh” messages. The old version serves traffic until the new version is proven healthy. Strategy 1: Rolling Deployments The simplest approach. Replace instances one at a time: ...

Health Check Patterns: Liveness, Readiness, and Startup Probes

Your load balancer routes traffic to a pod that’s crashed. Or kills a pod that’s just slow. Or restarts a pod that’s still initializing. Health checks prevent these failures — when configured correctly. Most teams get them wrong. Here’s how to get them right. The Three Probe Types Kubernetes offers three distinct probes, each with a different purpose: Probe Question Failure Action Liveness Is the process alive? Restart container Readiness Can it handle traffic? Remove from Service Startup Has it finished starting? Delay other probes Liveness: “Should I restart this?” Detects when your app is stuck — deadlocked, infinite loop, unrecoverable state. ...

Kubernetes Resource Limits: Right-Sizing Containers for Stability

Your pod got OOMKilled. Or throttled to 5% CPU. Or evicted because the node ran out of resources. The fix isn’t “add more resources” — it’s understanding how Kubernetes scheduling actually works. Requests vs Limits Requests: What you’re guaranteed. Kubernetes uses this for scheduling. Limits: The ceiling. Exceed this and bad things happen. 1 2 3 4 5 6 7 resources: requests: memory: "256Mi" cpu: "250m" # 0.25 cores limits: memory: "512Mi" cpu: "500m" # 0.5 cores What Happens When You Exceed Them Resource Exceed Request Exceed Limit CPU Throttled when node is busy Hard throttled always Memory Fine if available OOMKilled immediately CPU is compressible — you slow down but survive. Memory is not — you die. ...

Graceful Shutdown: Don't Drop Requests on Deploy

Your deploy shouldn’t kill requests mid-flight. Every dropped connection is a failed payment, a lost form submission, or a frustrated user. Graceful shutdown ensures your application finishes what it started before dying. Here’s how to do it right. The Problem Without graceful shutdown: 1 1 1 1 1 3 3 3 3 3 : : : : : 0 0 0 0 0 0 0 0 0 0 : : : : : 0 0 0 0 0 0 1 1 1 1 - - - - - R D P C U e e r l s q p o i e u l c e r e o e n s y s t s t s e s g e s i k e s t g i t a n l s e r a l r t l e c r s d o o r n r ( e i n , e c m e x e m c r p i e t e e v d i t c e i o r t d a n i e t e d e r s l e , 2 y s e m s t a e y c b o e n d g i r v e e s s p o u n p s e ) With graceful shutdown: ...

Container Security Best Practices: Hardening Your Docker Images

A container is only as secure as its weakest layer. Most security breaches don’t exploit exotic vulnerabilities — they walk through doors left open by default configurations, bloated images, and running as root. Here’s how to actually secure your containers. Start with Minimal Base Images Every package in your image is attack surface. Alpine Linux images are ~5MB compared to Ubuntu’s ~70MB. Fewer packages means fewer CVEs to patch. 1 2 3 4 5 6 7 8 9 10 11 12 # ❌ Don't do this FROM ubuntu:latest RUN apt-get update && apt-get install -y python3 python3-pip COPY . /app CMD ["python3", "/app/main.py"] # ✅ Do this FROM python:3.12-alpine COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . /app CMD ["python3", "/app/main.py"] For compiled languages, use multi-stage builds to ship only the binary: ...

GitOps Workflows: Infrastructure Changes Through Pull Requests

Git isn’t just for code anymore. In a GitOps workflow, your entire infrastructure lives in version control, and changes happen through pull requests, not SSH sessions. The principle is simple: the desired state of your system is declared in Git, and automated processes continuously reconcile actual state with desired state. No more “just SSH in and fix it.” No more tribal knowledge about what’s running where. The Core Loop GitOps operates on a continuous reconciliation loop: ...

Blue-Green Deployments: Zero-Downtime Releases with Instant Rollback

What if you could deploy with a safety net? Blue-green deployments give you exactly that: two identical production environments, one serving traffic while the other waits in the wings. Deploy to the idle environment, test it, then switch traffic instantly. If something breaks, switch back. No rollback procedure—just flip. The Core Concept You maintain two identical environments: Blue: Currently serving production traffic Green: Idle, ready for the next release Deployment flow: ...