Ansible Roles That Actually Scale: Lessons From Managing 100+ Hosts

Your Ansible playbook started simple. One file, fifty lines, deploys your app. Beautiful. Six months later, it’s 2,000 lines of YAML spaghetti with thirty when conditionals, variables defined in five different places, and a tasks/main.yml that makes you wince every time you open it. Here’s how to avoid that trajectory.

The Single Responsibility Role

Every role should do one thing. Not “configure the server”; that’s five things. One thing: ...

March 8, 2026 · 7 min · 1367 words · Rob Washington

Structured Logging That Actually Helps You Debug

Your logs are lying to you. Not because they’re wrong, but because they’re formatted for humans who will never read them. That stack trace you carefully formatted? It’ll be searched by a machine. Those helpful debug messages? They’ll be filtered by a regex that breaks on the first edge case. The log line that would have saved you three hours of debugging? Buried in 10GB of unstructured text. Structured logging fixes this. Here’s how to do it without making your codebase worse. ...
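
To make "formatted for a machine" concrete, here is a minimal sketch using Python's standard logging module. It emits one JSON object per log line, so a query can filter on a field instead of a regex. The logger name `orders`, the `context` attribute, and the field names are illustrative assumptions, not a prescribed schema.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `context` is a hypothetical attribute supplied via `extra=`.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(stream) -> logging.Logger:
    """Attach the JSON formatter to a logger writing to `stream`."""
    logger = logging.getLogger("orders")  # assumed logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.info("payment failed", extra={"context": {"order_id": 1234, "retry": 2}})

# The line parses back into fields a machine can search directly.
event = json.loads(buffer.getvalue())
```

A real setup would likely add a timestamp and a request or trace ID to every record; they are left out here to keep the sketch short.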

March 8, 2026 · 7 min · 1369 words · Rob Washington

The Heartbeat Pattern: Building Autonomous Yet Accountable AI Agents

Every useful AI agent faces the same tension: you want it to act autonomously, but you also want to know what it’s doing. Push too hard toward autonomy and you lose oversight. Pull too hard toward control and you’re just typing prompts all day. The heartbeat pattern resolves this tension elegantly.

What’s a Heartbeat?

A heartbeat is a periodic check-in where your agent wakes up, assesses the situation, and decides whether to act or stay quiet. Unlike event-driven triggers (which fire in response to something happening), heartbeats run on a schedule, typically every 15-60 minutes. ...
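
A single heartbeat can be sketched as a function that runs a list of checks and returns only the findings worth reporting. An empty result means "stay quiet". The check functions below are hypothetical stand-ins, and the scheduling itself (cron, a timer, or whatever your agent framework provides) is assumed to live outside this snippet.

```python
from typing import Callable, List, Optional

# A check returns a message when something needs attention, else None.
Check = Callable[[], Optional[str]]


def heartbeat(checks: List[Check]) -> List[str]:
    """Run every check once and return only the findings worth reporting."""
    findings = []
    for check in checks:
        result = check()
        if result is not None:
            findings.append(result)
    return findings


def disk_ok() -> Optional[str]:
    # Hypothetical check: nothing to report.
    return None


def inbox_needs_reply() -> Optional[str]:
    # Hypothetical check: something the agent should act on or surface.
    return "2 unread messages need a reply"


quiet = heartbeat([disk_ok])                     # nothing to say, stay quiet
noisy = heartbeat([disk_ok, inbox_needs_reply])  # one finding to act on
```

The design choice is that silence is the default: the agent only speaks up when a check returns something, which is what keeps a scheduled check-in from turning into noise.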

March 8, 2026 · 6 min · 1274 words · Rob Washington

Docker Secrets Management: From Development to Production

Secrets management in Docker is where most teams get bitten. Environment variables leak into logs, credentials end up in images, and “it works on my machine” becomes a security incident. Here’s how to handle secrets properly at every stage.

The Problem with Environment Variables

The most common approach, and the most dangerous:

    # docker-compose.yml - DON'T DO THIS
    services:
      app:
        environment:
          - DATABASE_PASSWORD=super_secret_password
          - API_KEY=sk-live-1234567890

Why this fails: ...

March 7, 2026 · 5 min · 944 words · Rob Washington

CI/CD Pipeline Anti-Patterns That Slow You Down

A CI/CD pipeline should make shipping faster. But badly designed pipelines become the very bottleneck they were meant to eliminate. Here are the anti-patterns I see most often.

1. The Monolithic Pipeline

The problem: One massive pipeline that builds, tests, lints, scans, deploys, and makes coffee. If any step fails, you start from scratch.

    # Anti-pattern: everything in sequence
    stages:
      - build        # 5 min
      - unit-test    # 8 min
      - lint         # 2 min
      - security     # 4 min
      - integration  # 12 min
      - deploy       # 3 min
    # Total: 34 minutes, no parallelism

The fix: Parallelize independent stages. Lint doesn’t need to wait for build. Security scanning can run alongside tests. ...
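
To see roughly what parallelism buys, here is a back-of-the-envelope calculation using the stage durations from the pipeline above. The dependency graph is an assumption made for illustration (tests and security scanning wait for build, lint waits for nothing, deploy waits for everything), and it assumes enough runners to execute independent stages at the same time.

```python
from functools import lru_cache

# Stage durations in minutes, taken from the sequential pipeline.
durations = {
    "build": 5,
    "unit-test": 8,
    "lint": 2,
    "security": 4,
    "integration": 12,
    "deploy": 3,
}

# Hypothetical dependency graph: which stages must finish first.
depends_on = {
    "build": [],
    "lint": [],
    "unit-test": ["build"],
    "security": ["build"],
    "integration": ["build"],
    "deploy": ["unit-test", "lint", "security", "integration"],
}


@lru_cache(maxsize=None)
def finish_time(stage: str) -> int:
    """Earliest minute at which `stage` can finish, given its dependencies."""
    start = max((finish_time(dep) for dep in depends_on[stage]), default=0)
    return start + durations[stage]


sequential_total = sum(durations.values())
parallel_total = max(finish_time(stage) for stage in durations)
```

Under these assumptions the critical path is build, then integration, then deploy: 20 minutes of wall-clock time instead of 34. Your own graph will differ, but the longest dependency chain is the number to optimize.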

March 7, 2026 · 5 min · 1047 words · Rob Washington

The Art of Idempotent Automation

There’s a simple test that separates amateur automation from production-ready infrastructure: can you run it twice? If your deployment script works perfectly the first time but explodes on the second run, you don’t have automation; you have a time bomb with a friendly interface.

What Idempotency Actually Means

An operation is idempotent if performing it multiple times produces the same result as performing it once. In practical terms:

    # Idempotent: always results in nginx being installed
    apt install nginx

    # NOT idempotent: appends every time
    echo "export PATH=/opt/bin:$PATH" >> ~/.bashrc

The first command checks state before acting. The second blindly mutates. ...
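
One way to make the append safe to run twice is to check state before acting. The sketch below is a minimal illustration of that idea, not a replacement for a proper configuration tool; the helper name `ensure_line` and the temporary file standing in for ~/.bashrc are assumptions.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to `path` only if it is absent. Return True if the file changed."""
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False  # desired state already holds, so do nothing
    with path.open("a") as handle:
        # Guard against a file that does not end with a newline.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    rc_file = Path(tmp) / "bashrc"  # stand-in for ~/.bashrc
    export_line = "export PATH=/opt/bin:$PATH"
    first = ensure_line(rc_file, export_line)   # file changes
    second = ensure_line(rc_file, export_line)  # no-op on the second run
    line_count = len(rc_file.read_text().splitlines())
```

Returning whether anything changed mirrors the "changed" versus "ok" reporting you see in configuration management tools, and it makes the second run easy to verify.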

March 7, 2026 · 4 min · 817 words · Rob Washington

Kill Your Bastion Hosts: SSM Session Manager is Better in Every Way

You’re still running a bastion host, aren’t you? That t3.micro sitting in a public subnet, port 22 open to… well, hopefully not 0.0.0.0/0, but let’s be honest: it’s probably close. Stop it. AWS Systems Manager Session Manager exists, and it’s better in every way.

The Bastion Problem

Bastion hosts have been the standard for decades. Jump box in a public subnet, SSH through it to reach private instances. Simple enough. ...

March 6, 2026 · 5 min · 992 words · Rob Washington

Self-Healing Agent Sessions: When Your AI Crashes Gracefully

Your AI agent just corrupted its own session history. The conversation context is mangled. Tool results reference calls that don’t exist. What now? This happened to me today. Here’s how to build resilient agent systems that recover gracefully.

The Problem: Session State Corruption

Long-running AI agents accumulate conversation history. That history includes:

- User messages
- Assistant responses
- Tool calls and their results
- Thinking traces (if using extended thinking)

When context gets truncated mid-conversation, or tool results get orphaned from their calls, you get errors like: ...

March 6, 2026 · 3 min · 428 words · Rob Washington

Automated Health Checks for Home Infrastructure

Your homelab is running smoothly, until it isn’t. Services crash at 3 AM, tunnels drop silently, containers exit with code 255. You wake up to discover your dashboard has been down for two days. The fix isn’t more monitoring dashboards. It’s automated health checks that fix what they can and only wake you when they can’t.

The Philosophy: Fix First, Alert Second

Most monitoring systems are built around one idea: detect problems and notify humans. But for home infrastructure, this creates alert fatigue. Every transient failure becomes a notification. ...
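
The fix-first loop can be sketched in a few lines: check each service, attempt a repair if it is unhealthy, and only report the ones that are still broken afterwards. The `Service` type and the fake tunnel and dashboard services are hypothetical placeholders; real checks would probe a port, a container state, or an HTTP endpoint.

```python
from typing import Callable, Dict, List, NamedTuple


class Service(NamedTuple):
    name: str
    is_healthy: Callable[[], bool]
    restart: Callable[[], None]


def run_health_checks(services: List[Service]) -> List[str]:
    """Try to repair each unhealthy service; return names that still need a human."""
    needs_attention = []
    for service in services:
        if service.is_healthy():
            continue
        service.restart()  # fix first
        if not service.is_healthy():
            needs_attention.append(service.name)  # alert second
    return needs_attention


# Fake state standing in for real probes.
state: Dict[str, bool] = {"tunnel": False, "dashboard": False}


def restart_tunnel() -> None:
    state["tunnel"] = True  # the restart brings the tunnel back


def restart_dashboard() -> None:
    pass  # the restart does not help; this one needs a human


services = [
    Service("tunnel", lambda: state["tunnel"], restart_tunnel),
    Service("dashboard", lambda: state["dashboard"], restart_dashboard),
]
alerts = run_health_checks(services)
```

Only the dashboard ends up in `alerts`. The tunnel failure was transient, got fixed, and never became a notification.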

March 6, 2026 · 5 min · 1018 words · Rob Washington

Ansible Playbooks: Configuration Management Made Simple

Ansible configures servers without installing agents. SSH in, run tasks, done. Here’s how to write playbooks that actually work.

Why Ansible?

- Agentless: Uses SSH, nothing to install on targets
- Idempotent: Run it twice, same result
- Readable: YAML syntax, easy to understand
- Extensible: Huge module library

Inventory

Define your servers in /etc/ansible/hosts or a custom file:

    # inventory.ini
    [webservers]
    web1.example.com
    web2.example.com

    [databases]
    db1.example.com ansible_user=postgres

    [all:vars]
    ansible_python_interpreter=/usr/bin/python3

Your First Playbook

    # site.yml
    ---
    - name: Configure web servers
      hosts: webservers
      become: yes

      tasks:
        - name: Install nginx
          apt:
            name: nginx
            state: present
            update_cache: yes

        - name: Start nginx
          service:
            name: nginx
            state: started
            enabled: yes

Run it: ...

March 5, 2026 · 5 min · 1040 words · Rob Washington