Devops

Environment Configuration Patterns: From Dev to Production

Configuration management sounds simple until you’re debugging why production is reading from the staging database at 3am. Here’s how to structure configuration so environments stay isolated and secrets stay secret. The Twelve-Factor Baseline The Twelve-Factor App got it right: store config in environment variables. But that’s just the starting point. Real systems need layers. ┌ │ ├ │ ├ │ ├ │ └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ R ─ S ─ E ─ D ─ ─ u ─ e ─ n ─ e ─ ─ n ─ c ─ v ─ f ─ ─ t ─ r ─ i ─ a ─ ─ i ─ e ─ r ─ u ─ ─ m ─ t ─ o ─ l ─ ─ e ─ s ─ n ─ t ─ ─ ─ ─ m ─ ─ ─ E ─ M ─ e ─ c ─ ─ n ─ a ─ n ─ o ─ ─ v ─ n ─ t ─ n ─ ─ i ─ a ─ - ─ f ─ ─ r ─ g ─ s ─ i ─ ─ o ─ e ─ p ─ g ─ ─ n ─ r ─ e ─ ─ ─ m ─ ─ c ─ i ─ ─ e ─ ( ─ i ─ n ─ ─ n ─ V ─ f ─ ─ ─ t ─ a ─ i ─ c ─ ─ ─ u ─ c ─ o ─ ─ V ─ l ─ ─ d ─ ─ a ─ t ─ c ─ e ─ ─ r ─ , ─ o ─ ─ ─ i ─ ─ n ─ ─ ─ a ─ A ─ f ─ ─ ─ b ─ W ─ i ─ ─ ─ l ─ S ─ g ─ ─ ─ e ─ ─ ─ ─ ─ s ─ S ─ f ─ ─ ─ ─ S ─ i ─ ─ ─ ─ M ─ l ─ ─ ─ ─ , ─ e ─ ─ ─ ─ ─ s ─ ─ ─ ─ e ─ ─ ─ ─ ─ t ─ ─ ─ ─ ─ c ─ ─ ─ ─ ─ . ─ ─ ─ ─ ─ ) ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ │ ┤ │ ┤ │ ┤ │ ┘ ← ← H L i o g w h e e s s t t p p r r i i o o r r i i t t y y Each layer overrides the one below it. Defaults live in code, environment-specific values in config files, secrets in a secrets manager, and runtime overrides in environment variables. ...

Ansible Role Design Patterns: Building Reusable Automation

Ansible playbooks get messy fast. Roles fix this by packaging related tasks, variables, and handlers into reusable units. But bad roles are worse than no roles — they hide complexity without managing it. Here’s how to design roles that actually help. The Basic Structure r o l m e y s _ t h d v t f m / r a a e a e i e o s m n m f m r m m c l s t m l k a d a a a s a p o e c a a e s i l i u i / i l n s r / i / / n e n l n n a f / i n . r . t . . t i p . y s y s y y e g t y m / m / m m s . . m l l l l j s l 2 h # # # # # # # E E D R J S R n v e o i t o t e f l n a l r n a e j t e y t u a i - l v 2 c m p t t a e o r r t f t i i v i e i a n g a a m l d t g r b p e a e i l l s t r a e a a e b s t d l e a e ( s n t s h d a i s ( g d k l h e s o e p w r e e n s p d t r e e n p c c r e i e d e c e s e n d c e e n ) c e ) Only create directories you need. An empty handlers/ is noise. ...

LLM Prompt Engineering for DevOps Automation

LLMs are becoming infrastructure. Not just chatbots — actual components in automation pipelines. But getting reliable, parseable output requires disciplined prompt engineering. Here’s what works for DevOps use cases. The Core Problem LLMs are probabilistic. Ask the same question twice, get different answers. That’s fine for chat. It’s terrible for automation that needs to parse structured output. The solution: constrain the output format and validate aggressively. Pattern 1: Structured Output with JSON Mode Most LLM APIs now support JSON mode. Use it. ...

Health Check Patterns: Liveness, Readiness, and Startup Probes

Your load balancer routes traffic to a pod that’s crashed. Or kills a pod that’s just slow. Or restarts a pod that’s still initializing. Health checks prevent these failures — when configured correctly. Most teams get them wrong. Here’s how to get them right. The Three Probe Types Kubernetes offers three distinct probes, each with a different purpose: Probe Question Failure Action Liveness Is the process alive? Restart container Readiness Can it handle traffic? Remove from Service Startup Has it finished starting? Delay other probes Liveness: “Should I restart this?” Detects when your app is stuck — deadlocked, infinite loop, unrecoverable state. ...

Kubernetes Resource Limits: Right-Sizing Containers for Stability

Your pod got OOMKilled. Or throttled to 5% CPU. Or evicted because the node ran out of resources. The fix isn’t “add more resources” — it’s understanding how Kubernetes scheduling actually works. Requests vs Limits Requests: What you’re guaranteed. Kubernetes uses this for scheduling. Limits: The ceiling. Exceed this and bad things happen. 1 2 3 4 5 6 7 resources: requests: memory: "256Mi" cpu: "250m" # 0.25 cores limits: memory: "512Mi" cpu: "500m" # 0.5 cores What Happens When You Exceed Them Resource Exceed Request Exceed Limit CPU Throttled when node is busy Hard throttled always Memory Fine if available OOMKilled immediately CPU is compressible — you slow down but survive. Memory is not — you die. ...

Secrets Rotation Automation: Stop Letting Credentials Rot

That database password hasn’t changed in three years. The API key in your config was committed by someone who left two jobs ago. The SSL certificate expires next Tuesday and nobody knows. Secrets rot. Rotation automation fixes this. Why Rotate? Static credentials are liability: Leaked credentials stay valid until someone notices Compliance requires it (PCI-DSS, SOC2, HIPAA) Blast radius grows the longer a secret lives Offboarded employees may still have access Automated rotation means: ...

Graceful Shutdown: Don't Drop Requests on Deploy

Your deploy shouldn’t kill requests mid-flight. Every dropped connection is a failed payment, a lost form submission, or a frustrated user. Graceful shutdown ensures your application finishes what it started before dying. Here’s how to do it right. The Problem Without graceful shutdown: 1 1 1 1 1 3 3 3 3 3 : : : : : 0 0 0 0 0 0 0 0 0 0 : : : : : 0 0 0 0 0 0 1 1 1 1 - - - - - R D P C U e e r l s q p o i e u l c e r e o e n s y s t s t s e s g e s i k e s t g i t a n l s e r a l r t l e c r s d o o r n r ( e i n , e c m e x e m c r p i e t e e v d i t c e i o r t d a n i e t e d e r s l e , 2 y s e m s t a e y c b o e n d g i r v e e s s p o u n p s e ) With graceful shutdown: ...

Feature Flags for Progressive Delivery: Beyond Simple Toggles

Feature flags started as if (ENABLE_NEW_UI) { ... }. They’ve evolved into a deployment strategy that separates code deployment from feature release. Ship your code Tuesday. Release to 1% of users Wednesday. Roll back without deploying Thursday. Here’s how to implement feature flags that scale from simple toggles to sophisticated progressive delivery. The Basic Pattern At its core, a feature flag is a runtime conditional: 1 2 3 4 5 def get_recommendations(user_id: str) -> list: if feature_flags.is_enabled("new_recommendation_algo", user_id): return new_algorithm(user_id) else: return legacy_algorithm(user_id) The magic is in how is_enabled works — and how you manage the flag lifecycle. ...

Container Security Best Practices: Hardening Your Docker Images

A container is only as secure as its weakest layer. Most security breaches don’t exploit exotic vulnerabilities — they walk through doors left open by default configurations, bloated images, and running as root. Here’s how to actually secure your containers. Start with Minimal Base Images Every package in your image is attack surface. Alpine Linux images are ~5MB compared to Ubuntu’s ~70MB. Fewer packages means fewer CVEs to patch. 1 2 3 4 5 6 7 8 9 10 11 12 # ❌ Don't do this FROM ubuntu:latest RUN apt-get update && apt-get install -y python3 python3-pip COPY . /app CMD ["python3", "/app/main.py"] # ✅ Do this FROM python:3.12-alpine COPY requirements.txt . RUN pip install --no-cache-dir -r requirements.txt COPY . /app CMD ["python3", "/app/main.py"] For compiled languages, use multi-stage builds to ship only the binary: ...

GitOps Workflows: Infrastructure Changes Through Pull Requests

Git isn’t just for code anymore. In a GitOps workflow, your entire infrastructure lives in version control, and changes happen through pull requests, not SSH sessions. The principle is simple: the desired state of your system is declared in Git, and automated processes continuously reconcile actual state with desired state. No more “just SSH in and fix it.” No more tribal knowledge about what’s running where. The Core Loop GitOps operates on a continuous reconciliation loop: ...