Ansible Playbook Patterns: Writing Automation That Doesn't Break

Ansible’s simplicity is seductive. YAML tasks, SSH connections, no agents. But simple playbooks become complex fast, and poorly structured automation creates more problems than it solves. These patterns help you write Ansible that scales with your infrastructure.

Idempotency: Safe to Run Twice

Every task should be safe to run repeatedly with the same result:

    # Idempotent - creates file if missing, no-op if exists
    - name: Create config directory
      file:
        path: /etc/myapp
        state: directory
        mode: '0755'

    # Not idempotent - appends every run
    - name: Add config line
      shell: echo "setting=value" >> /etc/myapp/config

    # Idempotent version
    - name: Add config line
      lineinfile:
        path: /etc/myapp/config
        line: "setting=value"

Use Ansible modules over shell commands. Modules are designed for idempotency. ...
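
The gap between the appending shell task and its `lineinfile` replacement comes down to checking before changing. The Python sketch below illustrates that idea only; it is not how Ansible implements the module, and `ensure_line` and the scratch file are hypothetical names chosen for the example.

```python
import os
import tempfile


def ensure_line(path: str, line: str) -> bool:
    """Make sure `line` is present in `path`, appending it only if missing.

    Returns True when the file was changed and False when it was already
    in the desired state, so running it twice is safe.
    """
    content = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            content = fh.read()

    if line in content.splitlines():
        return False  # desired state already holds: no-op

    with open(path, "a", encoding="utf-8") as fh:
        # Avoid gluing the new line onto an unterminated last line.
        if content and not content.endswith("\n"):
            fh.write("\n")
        fh.write(line + "\n")
    return True


if __name__ == "__main__":
    # Hypothetical scratch file standing in for /etc/myapp/config.
    with tempfile.TemporaryDirectory() as tmp:
        config = os.path.join(tmp, "config")
        print(ensure_line(config, "setting=value"))  # first run changes the file
        print(ensure_line(config, "setting=value"))  # second run changes nothing
```

The boolean return value plays the role of a "changed" flag: the second call reports that nothing needed doing, which is the behaviour the `echo >>` version cannot offer.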

February 23, 2026 · 6 min · 1235 words · Rob Washington

Kubernetes Resource Management: Requests, Limits, and Not Getting OOMKilled

Kubernetes needs to know how much CPU and memory your containers need. Get it wrong and you’ll face OOMKills, CPU throttling, unschedulable pods, or wasted cluster capacity. Resource requests and limits are the most impactful settings most teams misconfigure.

Requests vs Limits

Requests: What you’re guaranteed. Used for scheduling.
Limits: What you can’t exceed. Enforced at runtime.

    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"

This pod: ...
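
To make the units in that manifest concrete, here is a small Python sketch that converts the quantity strings into plain numbers. It handles only the millicore suffix `m` and the binary suffixes `Ki`, `Mi`, and `Gi`; the function names are hypothetical, and the full Kubernetes quantity grammar accepts more forms than this.

```python
# Binary suffixes used for memory quantities (powers of 1024).
MEMORY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_to_cores(quantity: str) -> float:
    """Convert a CPU quantity such as "250m" (millicores) or "2" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "256Mi" to bytes."""
    for suffix, factor in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a bare number is already in bytes


# Values copied from the manifest above.
requests = {"memory": "256Mi", "cpu": "250m"}
limits = {"memory": "512Mi", "cpu": "500m"}

print(cpu_to_cores(requests["cpu"]), "cores requested")
print(memory_to_bytes(requests["memory"]), "bytes requested")
print(memory_to_bytes(limits["memory"]) / memory_to_bytes(requests["memory"]), "x memory headroom")
```

So `250m` is a quarter of a core, and the memory limit leaves twice the requested amount before the container hits its ceiling.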

February 23, 2026 · 5 min · 944 words · Rob Washington

CI/CD Pipelines: From Commit to Production Safely

Continuous Integration and Continuous Deployment transform code changes into running software automatically. Done well, you push code and forget about it: the pipeline handles testing, building, and deploying. Done poorly, you spend more time fighting the pipeline than writing code.

The Pipeline Stages

    Commit → Build → Test → Security → Artifact → Deploy → Verify

Each stage is a gate. Fail any stage, stop the pipeline. ...
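
The gate behaviour can be sketched in a few lines of Python. This is an illustration under assumptions, not any particular CI system's API: `run_pipeline` is a hypothetical helper, and each stage is reduced to a callable that returns True on success.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that fails.

    Returns (succeeded, names of the stages that actually ran).
    """
    executed = []
    for name, step in stages:
        executed.append(name)
        if not step():
            print(f"{name} failed, stopping pipeline")
            return False, executed
    return True, executed


# Hypothetical stage outcomes: everything passes except Test.
stages = [
    ("Commit", lambda: True),
    ("Build", lambda: True),
    ("Test", lambda: False),
    ("Security", lambda: True),
    ("Artifact", lambda: True),
    ("Deploy", lambda: True),
    ("Verify", lambda: True),
]

ok, ran = run_pipeline(stages)
print(ok, ran)
```

Because the loop returns as soon as a stage fails, nothing after Test runs, so a broken build can never reach Deploy.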

February 23, 2026 · 6 min · 1182 words · Rob Washington

Observability: Logs, Metrics, and Traces Working Together

Monitoring answers “is it working?” Observability answers “why isn’t it working?” The difference matters when you’re debugging a production incident at 3am.

The three pillars of observability (logs, metrics, and traces) each provide different perspectives. Together, they create a complete picture of system behavior.

Logs: The Narrative

Logs tell you what happened, in order:

    {"timestamp": "2026-02-23T13:00:01Z", "level": "info", "event": "request_started", "request_id": "abc123", "path": "/api/users"}
    {"timestamp": "2026-02-23T13:00:01Z", "level": "info", "event": "db_query", "request_id": "abc123", "duration_ms": 45}
    {"timestamp": "2026-02-23T13:00:02Z", "level": "error", "event": "request_failed", "request_id": "abc123", "error": "connection timeout"}

Good for:

- Debugging specific requests
- Understanding error context
- Audit trails
- Ad-hoc investigation

Challenges: ...
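
Structured logs like the ones above are what make "debugging specific requests" practical, because one request can be pulled out by its `request_id`. A minimal Python sketch follows, assuming one JSON object per line; `events_for_request` is a hypothetical helper, and the fourth log line is invented purely to give the filter something to exclude.

```python
import json
from typing import Dict, Iterable, List

LOG_LINES = [
    '{"timestamp": "2026-02-23T13:00:01Z", "level": "info", "event": "request_started", "request_id": "abc123", "path": "/api/users"}',
    '{"timestamp": "2026-02-23T13:00:01Z", "level": "info", "event": "db_query", "request_id": "abc123", "duration_ms": 45}',
    '{"timestamp": "2026-02-23T13:00:02Z", "level": "error", "event": "request_failed", "request_id": "abc123", "error": "connection timeout"}',
    # Hypothetical line from a different request (not in the article).
    '{"timestamp": "2026-02-23T13:00:02Z", "level": "info", "event": "request_started", "request_id": "zzz999", "path": "/health"}',
]


def events_for_request(lines: Iterable[str], request_id: str) -> List[Dict]:
    """Return the parsed log records for one request, in their original order."""
    records = []
    for raw in lines:
        record = json.loads(raw)
        if record.get("request_id") == request_id:
            records.append(record)
    return records


trail = events_for_request(LOG_LINES, "abc123")
for record in trail:
    print(record["timestamp"], record["level"], record["event"])
```

Reading the filtered trail top to bottom gives the narrative for that one request: it started, ran a query, then failed with a connection timeout.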

February 23, 2026 · 6 min · 1172 words · Rob Washington

Infrastructure as Code: Principles That Actually Matter

Infrastructure as Code (IaC) means your servers, networks, and services are defined in version-controlled files rather than clicked into existence through consoles. The benefits are obvious: reproducibility, auditability, collaboration.

But IaC done poorly creates its own problems: state drift, copy-paste sprawl, untestable configurations. The principles matter more than the tools.

Declarative Over Imperative

Describe what you want, not how to get there:

    # Declarative (Terraform) - what
    resource "aws_instance" "web" {
      ami           = "ami-0c55b159cbfafe1f0"
      instance_type = "t3.micro"

      tags = {
        Name = "web-server"
      }
    }

    # Imperative (script) - how
    aws ec2 run-instances \
      --image-id ami-0c55b159cbfafe1f0 \
      --instance-type t3.micro \
      --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=web-server}]'

Declarative code is idempotent: run it ten times, get the same result. Imperative scripts need guards against re-running. ...

February 23, 2026 · 6 min · 1251 words · Rob Washington

Docker Multi-Stage Builds: Smaller Images, Cleaner Dockerfiles

A typical Go application compiles to a single binary. Yet Docker images for Go apps often weigh hundreds of megabytes. Why? Because the image includes the entire Go toolchain used to build it.

Multi-stage builds solve this: use one stage to build, another to run. The final image contains only what’s needed at runtime.

The Problem: Fat Images

    # Single-stage - includes everything
    FROM golang:1.21

    WORKDIR /app
    COPY . .
    RUN go build -o server .

    CMD ["./server"]

This image includes: ...

February 23, 2026 · 5 min · 1034 words · Rob Washington

Service Discovery: Finding Services in a Dynamic World

In static infrastructure, services live at known addresses. Database at 10.0.1.5, cache at 10.0.1.6. Simple, predictable, fragile.

In dynamic infrastructure (containers, auto-scaling, cloud) services appear and disappear constantly. IP addresses change. Instances multiply and vanish. Hardcoded addresses become a liability.

Service discovery solves this: how do services find each other when everything is moving?

The Problem

    # Hardcoded - works until it doesn't
    DATABASE_URL = "postgres://10.0.1.5:5432/mydb"

    # What happens when:
    # - Database moves to a new server?
    # - You add read replicas?
    # - The IP changes after maintenance?

DNS-Based Discovery

The simplest approach: use DNS names instead of IPs. ...
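
As a rough illustration of the DNS approach, the Python sketch below resolves a name at connection time instead of baking an IP into configuration. It uses only the standard library's `socket.getaddrinfo`; `resolve_endpoints` is a hypothetical helper name, and a real client would also need to think about caching and record TTLs, which this sketch ignores.

```python
import socket
from typing import List, Tuple


def resolve_endpoints(host: str, port: int) -> List[Tuple[str, int]]:
    """Resolve a host name to the (address, port) pairs it currently maps to.

    Looking the name up each time means a moved database or an added
    replica is picked up without editing the caller's configuration.
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    endpoints: List[Tuple[str, int]] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        endpoint = (sockaddr[0], sockaddr[1])
        if endpoint not in endpoints:  # drop duplicates, keep resolver order
            endpoints.append(endpoint)
    return endpoints
```

A caller would then pass a name rather than an address, for example `resolve_endpoints("db.internal", 5432)`, where `db.internal` is a made-up name used only for illustration. If the record behind that name changes, the next lookup returns the new address.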

February 23, 2026 · 7 min · 1393 words · Rob Washington

Load Balancing: Distributing Traffic Without Playing Favorites

You’ve scaled horizontally: multiple servers ready to handle requests. Now you need something to decide which server handles each request. That’s load balancing, and the strategy you choose affects latency, reliability, and resource utilization.

Round Robin: The Default

Each server gets requests in rotation: Server 1, Server 2, Server 3, Server 1, Server 2…

    upstream backend {
        server app1:8080;
        server app2:8080;
        server app3:8080;
    }

Pros:

- Simple to understand and implement
- Even distribution over time
- No state to maintain

Cons: ...
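
The rotation itself is easy to sketch in Python. This shows the selection order only and makes no claim about how nginx implements its balancer; `round_robin` is a hypothetical helper, and the server names are the ones from the upstream block.

```python
from itertools import cycle
from typing import Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in a fixed rotation, wrapping around indefinitely."""
    return cycle(servers)


backend = round_robin(["app1:8080", "app2:8080", "app3:8080"])

# Route five requests and record which server each one lands on.
picks = [next(backend) for _ in range(5)]
print(picks)
# ['app1:8080', 'app2:8080', 'app3:8080', 'app1:8080', 'app2:8080']
```

The picker knows nothing about how busy each server is; it simply takes the next one in line.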

February 23, 2026 · 6 min · 1268 words · Rob Washington

Caching Strategies: The Two Hardest Problems in Computer Science

Phil Karlton’s famous quote about hard problems in computer science exists because caching is genuinely difficult. Not the mechanics: putting data in Redis is easy. The hard part is knowing when that data is wrong.

Get caching right and your application feels instant. Get it wrong and users see stale data, inconsistent state, or worse, data that was never supposed to be visible to them.

The Cache-Aside Pattern (Lazy Loading)

The most common pattern: check cache first, fall back to database, populate cache on miss. ...
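
The three steps (check cache, fall back to database, populate on miss) can be sketched as follows. This is a minimal illustration under assumptions: an in-process dict stands in for Redis, `CacheAside` and `load_from_db` are hypothetical names, and the TTL is an added assumption that bounds how long a stale entry can survive.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside (lazy loading) over any loader function."""

    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float = 60.0):
        self._load = load_from_db
        self._ttl = ttl_seconds
        # key -> (expiry on the monotonic clock, cached value)
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is not None:
            expires_at, value = entry
            if time.monotonic() < expires_at:
                return value  # cache hit
        # Miss or expired: fall back to the database, then populate the cache.
        value = self._load(key)
        self._store[key] = (time.monotonic() + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key, e.g. after the underlying row has been updated."""
        self._store.pop(key, None)


if __name__ == "__main__":
    def fake_db(key: str) -> dict:
        print(f"db lookup for {key}")  # visible only on a miss
        return {"id": key}

    cache = CacheAside(fake_db)
    cache.get("user:1")  # miss: hits the fake database
    cache.get("user:1")  # hit: served from the cache
```

The `invalidate` method is where the hard part lives: the pattern only stays correct if something calls it (or the TTL expires) whenever the source data changes.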

February 23, 2026 · 6 min · 1261 words · Rob Washington

Graceful Shutdown: Finishing What You Started

When Kubernetes scales down your deployment or you push a new release, your running containers receive SIGTERM. Then, after a grace period, SIGKILL. The difference between graceful and chaotic shutdown is what happens in those seconds between the two signals.

A request half-processed, a database transaction uncommitted, a file partially written: these are the artifacts of ungraceful shutdown. They create inconsistent state, failed requests, and debugging nightmares.

The Signal Sequence

    1. SIGTERM sent to process
    2. Grace period countdown begins (default: 30s in Kubernetes)
    3. Process should stop accepting new work
    4. Process should finish in-flight work
    5. Process should close connections cleanly
    6. Process exits with code 0
    7. If still running: SIGKILL (no mercy)

Basic Signal Handling

    import signal
    import sys

    shutdown_requested = False

    def handle_sigterm(signum, frame):
        global shutdown_requested
        print("SIGTERM received, initiating graceful shutdown...")
        shutdown_requested = True

    signal.signal(signal.SIGTERM, handle_sigterm)
    signal.signal(signal.SIGINT, handle_sigterm)  # Ctrl+C

    # Main loop checks shutdown flag
    while not shutdown_requested:
        process_next_item()

    # Cleanup after loop exits
    cleanup()
    sys.exit(0)

Web Servers: Stop Accepting, Finish Processing

Most web frameworks have built-in graceful shutdown. The pattern: ...

February 23, 2026 · 6 min · 1112 words · Rob Washington