Ansible Idempotency Patterns: Write Playbooks That Don't Break Things

The promise of Ansible is simple: describe your desired state, run the playbook, and the system converges to that state. Run it again, nothing changes. That’s idempotency—and it’s harder to achieve than it sounds. Here’s how to write playbooks that won’t surprise you on the second run. The Problem: Commands That Lie The command and shell modules are where idempotency goes to die: 1 2 3 # ❌ BAD: Always reports "changed", even when nothing changed - name: Create database command: createdb myapp This fails on the second run because the database already exists. Worse, it always shows as “changed” even when it shouldn’t run at all. ...

February 21, 2026 Â· 5 min Â· 998 words Â· Rob Washington

Ansible Role Design Patterns: Building Reusable Automation

Ansible playbooks get messy fast. Roles fix this by packaging related tasks, variables, and handlers into reusable units. But bad roles are worse than no roles — they hide complexity without managing it. Here’s how to design roles that actually help. The Basic Structure r o l m e y s _ t h d v t f m / r a a e a e i e o s m n m f m r m m c l s t m l k a d a a a s a p o e c a a e s i l i u i / i l n s r / i / / n e n l n n a f / i n . r . t . . t i p . y s y s y y e g t y m / m / m m s . . m l l l l j s l 2 h # # # # # # # E E D R J S R n v e o i t o t e f l n a l r n a e j t e y t u a i - l v 2 c m p t t a e o r r t f t i i v i e i a n g a a m l d t g r b p e a e i l l s t r a e a a e b s t d l e a e ( s n t s h d a i s ( g d k l h e s o e p w r e e n s p d t r e e n p c c r e i e d e c e s e n d c e e n ) c e ) Only create directories you need. An empty handlers/ is noise. ...

February 19, 2026 Â· 8 min Â· 1576 words Â· Rob Washington

LLM Prompt Engineering for DevOps Automation

LLMs are becoming infrastructure. Not just chatbots — actual components in automation pipelines. But getting reliable, parseable output requires disciplined prompt engineering. Here’s what works for DevOps use cases. The Core Problem LLMs are probabilistic. Ask the same question twice, get different answers. That’s fine for chat. It’s terrible for automation that needs to parse structured output. The solution: constrain the output format and validate aggressively. Pattern 1: Structured Output with JSON Mode Most LLM APIs now support JSON mode. Use it. ...

February 19, 2026 Â· 6 min Â· 1138 words Â· Rob Washington

Secrets Rotation Automation: Stop Letting Credentials Rot

That database password hasn’t changed in three years. The API key in your config was committed by someone who left two jobs ago. The SSL certificate expires next Tuesday and nobody knows. Secrets rot. Rotation automation fixes this. Why Rotate? Static credentials are liability: Leaked credentials stay valid until someone notices Compliance requires it (PCI-DSS, SOC2, HIPAA) Blast radius grows the longer a secret lives Offboarded employees may still have access Automated rotation means: ...

February 19, 2026 Â· 8 min Â· 1636 words Â· Rob Washington

Ansible Playbook Patterns: Writing Maintainable Infrastructure Code

Ansible playbooks can quickly become unwieldy spaghetti. Here are battle-tested patterns for writing infrastructure code that scales with your team and your infrastructure. The Role Structure That Actually Works Forget the minimal examples. Real roles need this structure: r └ o ─ l ─ e s w ├ │ ├ │ ├ │ │ │ │ ├ │ ├ │ ├ │ └ / e ─ ─ ─ ─ ─ ─ ─ b ─ ─ ─ ─ ─ ─ ─ s e d └ v └ t ├ ├ ├ └ h └ t └ f └ m └ r e ─ a ─ a ─ ─ ─ ─ a ─ e ─ i ─ e ─ v f ─ r ─ s ─ ─ ─ ─ n ─ m ─ l ─ t ─ e a s k d p e a r u m / m s m i c s l m l n s s / m / l a a / a n o e e a a g / s a t i i i s n r r i t i l i s n n n t f v s n e n - n / . . . a i i / . s x p . y y y l g c y / . a y m m m l u e m c r m l l l . r . l o a l y e y n m m . m f s l y l . . m j c l 2 o n # # # # # # # f # D R E P C S R D e o n a o e e e f l t c n r s p a e r k f v t e u y a i i a n l v g g c r d t a p e u e t e r o r / n v i i i a m r c a a n n t a e i r b t s i n l e i l t o a o s a e - a n g a b s l e d l j l f m e ( u a i e h s h s t l n a i t i e t n ( g o s d l h i n l o e n e w r c r e l s s p u t r d e e p c s r e e d c e e n d c e e n ) c e ) The key insight: tasks/main.yml should only contain includes: ...

February 17, 2026 Â· 7 min Â· 1328 words Â· Rob Washington

Building Custom GitHub Actions for Infrastructure Automation

GitHub Actions has become the de facto CI/CD platform for many teams, but most only scratch the surface with pre-built actions from the marketplace. Building custom actions tailored to your infrastructure needs can dramatically reduce boilerplate and enforce consistency across repositories. Why Custom Actions? Every DevOps team has workflows that repeat across projects: Deploying to specific cloud environments Running security scans with custom policies Provisioning temporary environments for PR reviews Rotating secrets on a schedule Instead of copy-pasting YAML across repositories, custom actions encapsulate this logic once and reference it everywhere. ...

February 14, 2026 Â· 5 min Â· 984 words Â· Rob Washington

Documentation as Code: Keeping Docs in Sync with Your Systems

Documentation that lives outside your codebase gets stale. Documentation that isn’t tested breaks. Let’s treat docs like code—versioned, automated, and verified. Docs Live with Code Repository Structure p ├ ├ │ │ │ │ │ │ │ ├ └ r ─ ─ ─ ─ o ─ ─ ─ ─ j e s d ├ ├ ├ │ └ m R c r o ─ ─ ─ ─ k E t c c ─ ─ ─ ─ d A / / s o D / g a a └ r ├ └ c M e r p ─ u ─ ─ s E t c i ─ n ─ ─ . . t h / b y m i i o o d i m d n t p o e n l g e e k p c - c n s l i s t a / o d t u p y e a r i m n r e . e t t . y n - e m a t r d d m . e . l m s m d p d o n s e . m d MkDocs Configuration 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 # mkdocs.yml site_name: My Project site_url: https://docs.example.com repo_url: https://github.com/org/project theme: name: material features: - navigation.tabs - navigation.sections - search.highlight - content.code.copy plugins: - search - git-revision-date-localized - macros markdown_extensions: - admonition - codehilite - toc: permalink: true - pymdownx.superfences: custom_fences: - name: mermaid class: mermaid format: !!python/name:pymdownx.superfences.fence_code_format nav: - Home: index.md - Getting Started: getting-started.md - Architecture: architecture.md - API Reference: api/ - Runbooks: - Deployment: runbooks/deployment.md - Incidents: runbooks/incident-response.md Auto-Generated API Docs From OpenAPI Spec 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 # docs/api/openapi.yaml openapi: 3.0.3 info: title: My API version: 1.0.0 description: | API for managing resources. ## Authentication All endpoints require Bearer token authentication. paths: /users: get: summary: List users tags: [Users] parameters: - name: limit in: query schema: type: integer default: 20 responses: '200': description: List of users content: application/json: schema: type: array items: $ref: '#/components/schemas/User' components: schemas: User: type: object properties: id: type: string format: uuid email: type: string format: email created_at: type: string format: date-time Generate from Code 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 # generate_docs.py from fastapi import FastAPI from fastapi.openapi.utils import get_openapi import json import yaml app = FastAPI() # ... your routes ... def export_openapi(): """Export OpenAPI spec to file.""" openapi_schema = get_openapi( title=app.title, version=app.version, description=app.description, routes=app.routes, ) # Write JSON with open('docs/api/openapi.json', 'w') as f: json.dump(openapi_schema, f, indent=2) # Write YAML with open('docs/api/openapi.yaml', 'w') as f: yaml.dump(openapi_schema, f, default_flow_style=False) if __name__ == "__main__": export_openapi() Terraform Docs Generation 1 2 3 4 5 # Install terraform-docs brew install terraform-docs # Generate markdown from module terraform-docs markdown table ./modules/vpc > ./modules/vpc/README.md 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 # .terraform-docs.yml formatter: markdown table sections: show: - requirements - providers - inputs - outputs - resources output: file: README.md mode: inject template: |- <!-- BEGIN_TF_DOCS --> {{ .Content }} <!-- END_TF_DOCS --> Diagrams as Code Mermaid in Markdown 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 # Architecture ```mermaid graph TB subgraph "Public" LB[Load Balancer] end subgraph "Application Tier" API1[API Server 1] API2[API Server 2] end subgraph "Data Tier" DB[(PostgreSQL)] Cache[(Redis)] end LB --> API1 LB --> API2 API1 --> DB API2 --> DB API1 --> Cache API2 --> Cache # ` # f f f f w # ` r r r r i # ` d o o o o t p i m m m m h P y a d l w w d a a y t g d d d d D n b i i n p p t h r i i i i i s t t s i i h o a a a a a a = h h o n m g g g g g = a d c n s r r r r r A C p C b a / a a a a a R L l i l c D a m m m m m o B u u = h l d c i r s s s s ( u ( s = s e b b a a c . . . " t " t t R c g h i a a a P e L e [ e D = h r i m w w w r 5 o r E r S e a t p s s s o 3 a ( C ( ( E m e o . . . d ( d " S " " l a s c r c d n u " A ( D P a p t t o a e c D B p " a o s i u m t t t N a p A t s t r D p a w i S l l P a t i e i u b o o " a i I " g C . a t a r n ) n c ) r a p g e s k c a 1 : e c y r e A e t " S h a i i r r i ) Q e m m i m c " o , L ( , p m p h ) n " " o p o i " E ) R C r o r t ) C e l t r t e : S d u t c ( i s E A t " s t C R L u A " e S D B r P ) r S , e I , " R , 2 E o " l u s ) a t h , s e o t 5 w E i 3 = C C F S a a ( c l " h s A e e P , I f 3 i " l ) e ] n a m e = " d o c s / i m a g e s / a r c h i t e c t u r e " ) : CI/CD for Documentation GitHub Actions 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 # .github/workflows/docs.yml name: Documentation on: push: branches: [main] paths: - 'docs/**' - 'mkdocs.yml' - 'src/**/*.py' # Regenerate if code changes pull_request: paths: - 'docs/**' jobs: build: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 with: fetch-depth: 0 # For git-revision-date plugin - uses: actions/setup-python@v5 with: python-version: '3.11' - name: Install dependencies run: | pip install mkdocs-material mkdocs-macros-plugin pip install -e . # Install project for API doc generation - name: Generate API docs run: python generate_docs.py - name: Generate Terraform docs run: | terraform-docs markdown table ./terraform > ./docs/terraform.md - name: Build documentation run: mkdocs build --strict - name: Upload artifact uses: actions/upload-pages-artifact@v3 with: path: site/ deploy: if: github.ref == 'refs/heads/main' needs: build runs-on: ubuntu-latest permissions: pages: write id-token: write environment: name: github-pages steps: - uses: actions/deploy-pages@v4 Documentation Testing Link Checking 1 2 3 4 5 6 # In CI - name: Check links uses: lycheeverse/lychee-action@v1 with: args: --verbose --no-progress './docs/**/*.md' fail: true Code Example Testing 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 # test_docs.py import subprocess import re from pathlib import Path def extract_code_blocks(markdown_file: Path) -> list: """Extract fenced code blocks from markdown.""" content = markdown_file.read_text() pattern = r'```(\w+)\n(.*?)```' return re.findall(pattern, content, re.DOTALL) def test_python_examples(): """Test that Python code examples are valid.""" for doc in Path('docs').rglob('*.md'): for lang, code in extract_code_blocks(doc): if lang == 'python': # Check syntax try: compile(code, doc.name, 'exec') except SyntaxError as e: raise AssertionError(f"Syntax error in {doc}: {e}") def test_bash_examples(): """Test that bash commands are valid syntax.""" for doc in Path('docs').rglob('*.md'): for lang, code in extract_code_blocks(doc): if lang in ('bash', 'shell', 'sh'): result = subprocess.run( ['bash', '-n', '-c', code], capture_output=True ) if result.returncode != 0: raise AssertionError( f"Bash syntax error in {doc}: {result.stderr.decode()}" ) OpenAPI Validation 1 2 3 4 # In CI - name: Validate OpenAPI spec run: | npx @apidevtools/swagger-cli validate docs/api/openapi.yaml README Generation From Template 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 # generate_readme.py from jinja2 import Template import subprocess def get_version(): return subprocess.check_output( ['git', 'describe', '--tags', '--always'] ).decode().strip() def get_contributors(): output = subprocess.check_output( ['git', 'shortlog', '-sn', 'HEAD'] ).decode() return [line.split('\t')[1] for line in output.strip().split('\n')] template = Template(''' # {{ name }} {{ description }} ## Installation ```bash pip install {{ package_name }} Quick Start 1 2 3 4 from {{ package_name }} import Client client = Client(api_key="your-key") result = client.do_something() Documentation Full documentation at [{{ docs_url }}]({{ docs_url }}) ...

February 12, 2026 Â· 13 min Â· 2615 words Â· Rob Washington

DNS Management: Infrastructure as Code for Your Domain Records

DNS is the foundation everything else depends on. A misconfigured record can take down your entire infrastructure. Yet DNS is often managed through web consoles with no version control, no review process, and no automation. Let’s fix that. Terraform for DNS Route53 Basics 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 # dns.tf resource "aws_route53_zone" "main" { name = "example.com" tags = { Environment = "production" } } # A record resource "aws_route53_record" "www" { zone_id = aws_route53_zone.main.zone_id name = "www.example.com" type = "A" ttl = 300 records = ["203.0.113.10"] } # CNAME record resource "aws_route53_record" "app" { zone_id = aws_route53_zone.main.zone_id name = "app.example.com" type = "CNAME" ttl = 300 records = ["app-lb-123456.us-east-1.elb.amazonaws.com"] } # Alias to ALB (no TTL, resolved at edge) resource "aws_route53_record" "api" { zone_id = aws_route53_zone.main.zone_id name = "api.example.com" type = "A" alias { name = aws_lb.api.dns_name zone_id = aws_lb.api.zone_id evaluate_target_health = true } } # MX records resource "aws_route53_record" "mx" { zone_id = aws_route53_zone.main.zone_id name = "example.com" type = "MX" ttl = 3600 records = [ "10 mail1.example.com", "20 mail2.example.com" ] } # TXT for SPF resource "aws_route53_record" "spf" { zone_id = aws_route53_zone.main.zone_id name = "example.com" type = "TXT" ttl = 3600 records = ["v=spf1 include:_spf.google.com ~all"] } Dynamic Records from Infrastructure 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 # Generate records from other resources locals { services = { "api" = aws_lb.api.dns_name "admin" = aws_lb.admin.dns_name "docs" = aws_cloudfront_distribution.docs.domain_name } } resource "aws_route53_record" "services" { for_each = local.services zone_id = aws_route53_zone.main.zone_id name = "${each.key}.example.com" type = "CNAME" ttl = 300 records = [each.value] } # From Kubernetes ingresses data "kubernetes_ingress_v1" "all" { for_each = toset(["api", "web", "admin"]) metadata { name = each.key namespace = "production" } } resource "aws_route53_record" "k8s_services" { for_each = data.kubernetes_ingress_v1.all zone_id = aws_route53_zone.main.zone_id name = "${each.key}.example.com" type = "CNAME" ttl = 300 records = [each.value.status[0].load_balancer[0].ingress[0].hostname] } DNS Failover Health Check Based Routing 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 # Health check for primary resource "aws_route53_health_check" "primary" { fqdn = "primary-api.example.com" port = 443 type = "HTTPS" resource_path = "/health" failure_threshold = 3 request_interval = 30 tags = { Name = "primary-api-health" } } # Health check for secondary resource "aws_route53_health_check" "secondary" { fqdn = "secondary-api.example.com" port = 443 type = "HTTPS" resource_path = "/health" failure_threshold = 3 request_interval = 30 tags = { Name = "secondary-api-health" } } # Primary record with failover resource "aws_route53_record" "api_primary" { zone_id = aws_route53_zone.main.zone_id name = "api.example.com" type = "A" ttl = 60 records = ["203.0.113.10"] set_identifier = "primary" health_check_id = aws_route53_health_check.primary.id failover_routing_policy { type = "PRIMARY" } } # Secondary record (used when primary fails) resource "aws_route53_record" "api_secondary" { zone_id = aws_route53_zone.main.zone_id name = "api.example.com" type = "A" ttl = 60 records = ["203.0.113.20"] set_identifier = "secondary" health_check_id = aws_route53_health_check.secondary.id failover_routing_policy { type = "SECONDARY" } } Weighted Routing for Gradual Migration 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 # 90% to current, 10% to new resource "aws_route53_record" "api_current" { zone_id = aws_route53_zone.main.zone_id name = "api.example.com" type = "A" ttl = 60 records = ["203.0.113.10"] set_identifier = "current" weighted_routing_policy { weight = 90 } } resource "aws_route53_record" "api_new" { zone_id = aws_route53_zone.main.zone_id name = "api.example.com" type = "A" ttl = 60 records = ["203.0.113.20"] set_identifier = "new" weighted_routing_policy { weight = 10 } } External DNS for Kubernetes Automatically create DNS records from Kubernetes resources. ...

February 12, 2026 Â· 9 min Â· 1841 words Â· Rob Washington

SSL/TLS Automation: Never Manually Renew a Certificate Again

Manual certificate management is a reliability incident waiting to happen. A forgotten renewal, an expired cert at 3 AM, angry customers. Let’s automate this problem away. Certbot: The Foundation Basic Setup 1 2 3 4 5 6 7 8 9 # Install certbot sudo apt install certbot python3-certbot-nginx # Get certificate for nginx sudo certbot --nginx -d example.com -d www.example.com # Auto-renewal is configured automatically # Test it: sudo certbot renew --dry-run Standalone Mode (No Web Server) 1 2 3 4 5 # Stop web server, get cert, restart sudo certbot certonly --standalone -d example.com # Or use DNS challenge (no downtime) sudo certbot certonly --manual --preferred-challenges dns -d example.com Automated Renewal with Hooks 1 2 3 4 5 6 7 8 # /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh #!/bin/bash systemctl reload nginx # /etc/letsencrypt/renewal-hooks/post/notify.sh #!/bin/bash curl -X POST https://slack.com/webhook \ -d '{"text":"SSL certificate renewed for '$RENEWED_DOMAINS'"}' cert-manager for Kubernetes The standard for Kubernetes certificate automation. ...

February 12, 2026 Â· 7 min Â· 1454 words Â· Rob Washington

Home Automation for Developers: Beyond Smart Plugs

A developer’s guide to home automation — from simple scripts to full infrastructure, with patterns that actually work.

February 10, 2026 Â· 8 min Â· 1515 words Â· Rob Washington