Health Checks: Readiness, Liveness, and Startup Probes Explained

Your application says it's running. But is it actually working? Health checks answer that question. They're the difference between "process exists" and "service is functional." Get them wrong, and your orchestrator will either route traffic to broken instances or restart healthy ones.

Three Types of Probes

Liveness: "Is this process stuck?" Liveness probes detect deadlocks, infinite loops, and zombie processes. If liveness fails, the container gets killed and restarted.

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
```

What to check: ...
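The endpoint behind a probe like the one above can be a very small handler. A rough Python sketch, assuming an HTTP server on the probe's port; the lock-timeout check and all names are illustrative placeholders, not the article's implementation:

```python
# Minimal /healthz handler sketch: return 200 only while the process
# can still make progress, 503 otherwise.
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading

def run_liveness_checks() -> bool:
    """Placeholder liveness check: anything that proves the process isn't
    deadlocked, e.g. taking a shared lock with a timeout."""
    lock = threading.Lock()  # stand-in for a real shared lock
    acquired = lock.acquire(timeout=1)
    if acquired:
        lock.release()
    return acquired

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            status = 200 if run_liveness_checks() else 503
            self.send_response(status)
            self.end_headers()
            self.wfile.write(b"ok" if status == 200 else b"unhealthy")
        else:
            self.send_response(404)
            self.end_headers()

# To serve on the probe's port:
# HTTPServer(("", 8080), HealthHandler).serve_forever()
```

Point the probe's httpGet at this path and port; anything that can hang in production (locks, event loops, worker heartbeats) is a candidate for the check.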

February 16, 2026 · 6 min · 1131 words · Rob Washington

Graceful Shutdown: Zero-Downtime Deployments Done Right

Kill -9 is violence. Your application deserves a dignified death. Graceful shutdown means finishing in-flight work before terminating. Without it, deployments cause dropped requests, broken connections, and data corruption. With it, users never notice you restarted.

The Problem

When a process is terminated:

1. Kubernetes/Docker sends SIGTERM
2. Your app has a grace period (default 30s)
3. After the grace period, SIGKILL terminates forcefully

If your app doesn't handle SIGTERM, in-flight requests get dropped. Database transactions abort. WebSocket connections die mid-message. ...

February 16, 2026 · 6 min · 1202 words · Rob Washington

Graceful Shutdown: The Art of Dying Well in Production

Your container is about to die. It has 30 seconds to live. What happens next determines whether your users see a clean transition or a wall of 502 errors. Graceful shutdown is one of those things that seems obvious until you realize most applications do it wrong.

The Problem

When Kubernetes (or Docker, or systemd) decides to stop your application, it sends a SIGTERM signal. Your application has a grace period, usually 30 seconds, to finish what it's doing and exit cleanly. After that, it gets SIGKILL. No negotiation. ...
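On the Kubernetes side, the grace period itself is configurable per pod. A sketch of the relevant fields (the values here are illustrative, not from the article):

```yaml
spec:
  terminationGracePeriodSeconds: 45   # raise the 30s default if draining takes longer
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]   # runs before SIGTERM; lets endpoint removal propagate
```

The preStop delay matters because endpoint removal and SIGTERM happen concurrently: without it, a pod can receive SIGTERM while load balancers are still sending it traffic.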

February 12, 2026 · 6 min · 1203 words · Rob Washington

Policy as Code: Enforcing Standards with OPA and Gatekeeper

Manual policy enforcement doesn't scale. Security reviews become bottlenecks. Compliance audits are painful. Policy as code solves this: define policies once, enforce them everywhere, automatically.

Open Policy Agent Basics

OPA uses Rego, a declarative language for expressing policies.

Simple Policy

```rego
# policy/authz.rego
package authz

default allow = false

# Allow if user is admin
allow {
    input.user.role == "admin"
}

# Allow if user owns the resource
allow {
    input.user.id == input.resource.owner_id
}

# Allow read access to public resources
allow {
    input.action == "read"
    input.resource.public == true
}
```

Test the Policy

```json
// input.json
{
  "user": {"id": "user-123", "role": "member"},
  "resource": {"owner_id": "user-123", "public": false},
  "action": "read"
}
```

```shell
# Run OPA
opa eval -i input.json -d policy/ "data.authz.allow"
# Result: true (user owns the resource)
```

Policy Testing

```rego
# policy/authz_test.rego
package authz

test_admin_allowed {
    allow with input as {
        "user": {"role": "admin"},
        "action": "delete",
        "resource": {"owner_id": "other"}
    }
}

test_owner_allowed {
    allow with input as {
        "user": {"id": "user-1", "role": "member"},
        "action": "update",
        "resource": {"owner_id": "user-1"}
    }
}

test_non_owner_denied {
    not allow with input as {
        "user": {"id": "user-1", "role": "member"},
        "action": "update",
        "resource": {"owner_id": "user-2", "public": false}
    }
}
```

```shell
# Run tests
opa test policy/ -v
```

Kubernetes Gatekeeper

Enforce policies on Kubernetes resources at admission time. ...
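As a hedged sketch of what the Gatekeeper side can look like (the template name and required-labels rule are illustrative, not the article's example): a ConstraintTemplate pairs a parameter schema with Rego that reports violations at admission time.

```yaml
# constraint-template.yaml - rejects resources missing required labels
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg}] {
          required := input.parameters.labels[_]
          not input.review.object.metadata.labels[required]
          msg := sprintf("missing required label: %v", [required])
        }
```

A separate K8sRequiredLabels constraint then binds this template to resource kinds and supplies the label list as parameters.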

February 12, 2026 · 7 min · 1333 words · Rob Washington

Service Mesh: Traffic Management, Security, and Observability with Istio

When you have dozens of microservices talking to each other, managing traffic, security, and observability becomes complex. A service mesh handles this at the infrastructure layer, so your applications don't have to.

What Problems Does a Service Mesh Solve?

Without a mesh, every service needs to implement:

- Retries and timeouts
- Circuit breakers
- Load balancing
- TLS certificates
- Metrics and tracing
- Access control

With a mesh, the sidecar proxy handles all of this: ...
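For instance, the retries and timeouts that every service would otherwise reimplement become a few lines of mesh config. A hedged Istio sketch (the `reviews` service name and the specific values are illustrative):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
      timeout: 2s              # overall deadline per request
      retries:
        attempts: 3
        perTryTimeout: 500ms
        retryOn: 5xx,connect-failure
```

The application code contains none of this; the sidecar enforces it for every caller uniformly.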

February 11, 2026 · 6 min · 1277 words · Rob Washington

Container Security: Hardening Your Docker Images and Kubernetes Deployments

Containers aren't inherently secure. A default Docker image runs as root, includes unnecessary packages, and exposes more attack surface than needed. Let's fix that.

Secure Dockerfile Practices

Use Minimal Base Images

```dockerfile
# BAD: Full OS with unnecessary packages
FROM ubuntu:latest

# BETTER: Slim variant
FROM python:3.11-slim

# BEST: Distroless (no shell, no package manager)
FROM gcr.io/distroless/python3-debian11
```

Size comparison:

- ubuntu:latest: ~77MB
- python:3.11-slim: ~45MB
- distroless/python3: ~16MB

Smaller image = smaller attack surface. ...
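The root-by-default problem mentioned above is also a Dockerfile fix. A minimal sketch, assuming a Debian-based image; the username, UID, and app entrypoint are placeholders:

```dockerfile
FROM python:3.11-slim

# Create an unprivileged user with a fixed UID (useful for
# runAsUser checks in Kubernetes securityContext)
RUN useradd --create-home --uid 10001 appuser

WORKDIR /app
COPY --chown=appuser:appuser . .

# Everything from here on runs without root
USER appuser
CMD ["python", "app.py"]
```

With a distroless base the same effect comes from the `:nonroot` image variants, since there is no shell to run useradd in.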

February 11, 2026 · 5 min · 1030 words · Rob Washington

GitOps with ArgoCD: Your Infrastructure as Code, Actually

GitOps takes "Infrastructure as Code" literally: your Git repository becomes the single source of truth for what should be running. ArgoCD watches your repo and automatically synchronizes your cluster to match. No more kubectl apply from laptops, no more "what's actually deployed?" mysteries.

GitOps Principles

- Declarative: Describe the desired state, not the steps to get there
- Versioned: All changes go through Git (audit trail, rollback)
- Automated: Changes are applied automatically when Git changes
- Self-healing: Drift from desired state is automatically corrected

Installing ArgoCD

```shell
# Create namespace
kubectl create namespace argocd

# Install ArgoCD
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Wait for pods
kubectl wait --for=condition=Ready pods --all -n argocd --timeout=300s

# Get initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d

# Port forward to access UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
```

Access the UI at https://localhost:8080 with username admin. ...
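Once installed, ArgoCD is driven by Application resources that point it at a repo. A minimal sketch (the repo URL, path, and names are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-repo.git
    targetRevision: main
    path: k8s/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift in the cluster
```

The `automated` sync policy is what delivers the self-healing principle above: any change applied outside Git gets reverted.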

February 11, 2026 · 8 min · 1579 words · Rob Washington

Zero-Downtime Deployments: Blue-Green and Canary Strategies

Deploying new code shouldn't mean crossing your fingers. Blue-green and canary deployments let you release changes with confidence, validate them with real traffic, and roll back in seconds if something goes wrong.

Blue-Green Deployments

Blue-green maintains two identical production environments. One serves traffic (blue), while the other stands ready (green). To deploy, you push to green, test it, then switch traffic over.

[Diagram: users reach a router that sends all traffic to Blue (v1.0, active) while Green sits on standby; after deploying v1.1 to Green, the router switches traffic to Green.]

Kubernetes Implementation

```yaml
# blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-blue
  labels:
    app: myapp
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: app
          image: myapp:1.0.0
          ports:
            - containerPort: 8080
---
# green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-green
  labels:
    app: myapp
    version: green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
        - name: app
          image: myapp:1.1.0
          ports:
            - containerPort: 8080
---
# service.yaml - switch by changing selector
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # Change to 'green' to switch
  ports:
    - port: 80
      targetPort: 8080
```

Deployment Script

```shell
#!/bin/bash
set -e

NEW_VERSION=$1
CURRENT=$(kubectl get svc myapp -o jsonpath='{.spec.selector.version}')

if [ "$CURRENT" == "blue" ]; then
  TARGET="green"
else
  TARGET="blue"
fi

echo "Current: $CURRENT, Deploying to: $TARGET"

# Update the standby deployment
kubectl set image deployment/app-$TARGET app=myapp:$NEW_VERSION

# Wait for rollout
kubectl rollout status deployment/app-$TARGET --timeout=300s

# Run smoke tests against standby
kubectl run smoke-test --rm -it --image=curlimages/curl \
  --restart=Never -- curl -f http://app-$TARGET:8080/health

# Switch traffic
kubectl patch svc myapp -p "{\"spec\":{\"selector\":{\"version\":\"$TARGET\"}}}"

echo "Switched to $TARGET (v$NEW_VERSION)"
```

Instant Rollback

```shell
#!/bin/bash
# rollback.sh - switch back to previous version
CURRENT=$(kubectl get svc myapp -o jsonpath='{.spec.selector.version}')

if [ "$CURRENT" == "blue" ]; then
  PREVIOUS="green"
else
  PREVIOUS="blue"
fi

kubectl patch svc myapp -p "{\"spec\":{\"selector\":{\"version\":\"$PREVIOUS\"}}}"
echo "Rolled back to $PREVIOUS"
```

Canary Deployments

Canary deployments route a small percentage of traffic to the new version, gradually increasing if metrics look good. ...
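One low-tech way to approximate that percentage split, before reaching for a service mesh, is replica ratios: run a canary Deployment behind the same Service as the stable one. This sketch assumes a Service that selects only on `app: myapp` (not on a version label); the names and counts are illustrative.

```yaml
# canary-deployment.yaml - shares the Service's app: myapp selector,
# so traffic splits roughly by replica count: 9 stable + 1 canary = ~10%
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-canary
spec:
  replicas: 1          # versus 9 stable replicas
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp     # picked up by the shared Service
        track: canary
    spec:
      containers:
        - name: app
          image: myapp:1.1.0
```

The split is coarse (you can't do 1%) and per-connection rather than per-request, which is why mesh-based weighted routing is the usual next step.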

February 11, 2026 · 8 min · 1669 words · Rob Washington

Health Checks Done Right: Liveness, Readiness, and Startup Probes

A health check that always returns 200 OK is worse than no health check at all. It gives you false confidence while your application silently fails. Let's build health checks that actually tell you when something's wrong.

The Three Types of Probes

Kubernetes defines three probe types, each serving a distinct purpose:

- Liveness Probe: "Is this process stuck?" If it fails, Kubernetes kills and restarts the container.
- Readiness Probe: "Can this instance handle traffic?" If it fails, the instance is removed from load balancing but keeps running. ...
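A readiness probe that gates traffic on the instance's actual ability to serve might look like this sketch (the path and thresholds are illustrative values, not the article's):

```yaml
readinessProbe:
  httpGet:
    path: /ready       # should verify dependencies: DB, cache, queues
    port: 8080
  periodSeconds: 5
  failureThreshold: 2  # removed from endpoints after ~10s of failures
```

Keeping the readiness path separate from the liveness path is the key design choice: a dependency outage should pull an instance out of rotation, not restart it.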

February 11, 2026 · 6 min · 1174 words · Rob Washington

Kubernetes: Container Orchestration for the Rest of Us

A practical introduction to Kubernetes: what it does, why you might need it, and how to get started without drowning in complexity.

February 10, 2026 · 5 min · 967 words · Rob Washington