Ansible playbooks can quickly become unwieldy spaghetti. Here are battle-tested patterns for writing infrastructure code that scales with your team and your infrastructure.

The Role Structure That Actually Works

Forget the minimal examples. Real roles need this structure:

roles/
└── webserver/
    ├── defaults/
    │   └── main.yml       # Default variables (lowest precedence)
    ├── vars/
    │   └── main.yml       # Role variables (higher precedence)
    ├── tasks/
    │   ├── main.yml       # Entry point - just includes
    │   ├── install.yml    # Package installation
    │   ├── configure.yml  # Configuration files
    │   └── service.yml    # Service management
    ├── handlers/
    │   └── main.yml       # Restart/reload handlers
    ├── templates/
    │   └── nginx.conf.j2
    ├── files/
    │   └── ssl-params.conf
    └── meta/
        └── main.yml       # Dependencies

The key insight: tasks/main.yml should only contain includes:

# tasks/main.yml
---
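- name: Import installation tasks
  ansible.builtin.import_tasks: install.yml
  tags: [install]

- name: Import configuration tasks
  ansible.builtin.import_tasks: configure.yml
  tags: [configure]

- name: Import service tasks
  ansible.builtin.import_tasks: service.yml
  tags: [service]

Because import_tasks is processed statically when the playbook is parsed, each import's tag is inherited by every task in the imported file, so --tags configure runs the configuration phase without touching installation. With include_tasks, tags on the include statement apply only to the include itself; the included tasks are skipped unless you also pass the tags down with apply (for example, apply: {tags: [configure]}).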
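To see why the import matters, here is a deliberately simplified Python model of tag selection. It ignores special tags such as always and never, and the task names inside the phase files are hypothetical. An import hands its tags down to every task in the file, while an include without apply keeps them on the include line only.

```python
# Simplified model of `ansible-playbook --tags ...` for tasks/main.yml.
# Hypothetical phase files and the (untagged) tasks inside them.
PHASE_FILES = {
    "install.yml": ["Install nginx package"],
    "configure.yml": ["Render nginx.conf", "Copy ssl-params.conf"],
    "service.yml": ["Enable and start nginx"],
}

# (file, tags on the import/include line in tasks/main.yml)
MAIN_YML = [
    ("install.yml", {"install"}),
    ("configure.yml", {"configure"}),
    ("service.yml", {"service"}),
]


def tasks_run(run_tags, static=True):
    """Return the task names that would run for the given --tags."""
    selected = []
    for filename, line_tags in MAIN_YML:
        if not line_tags & run_tags:
            continue  # the import/include line itself is filtered out
        # import_tasks: inner tasks inherit the line's tags.
        # include_tasks without apply: inner tasks keep only their own (none here).
        inner_tags = line_tags if static else set()
        selected.extend(t for t in PHASE_FILES[filename] if inner_tags & run_tags)
    return selected


print(tasks_run({"configure"}))                # ['Render nginx.conf', 'Copy ssl-params.conf']
print(tasks_run({"configure"}, static=False))  # []
```

The second call is the classic surprise: the include runs, but every task it pulls in is filtered out.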

Variable Precedence You Can Reason About

Ansible’s 22 levels of variable precedence are a trap. Use only these three:

# 1. defaults/main.yml - Sensible defaults, always overridable
webserver_port: 80
webserver_worker_processes: auto

# 2. group_vars/production.yml - Environment-specific
webserver_port: 443
webserver_ssl_enabled: true

# 3. host_vars/web01.yml - Host-specific exceptions
webserver_worker_processes: 8  # This host has more cores

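Never put anything users should override in vars/main.yml. Role vars outrank inventory group_vars and host_vars, so those files cannot override them; reserve vars/main.yml for internal role constants that shouldn't change.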
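A ChainMap is a compact way to reason about those three layers: look a name up in the most specific layer first and fall back to broader ones. This sketches the mental model only, assuming no other precedence levels are in play (no extra vars, set_fact, role vars and so on); web02 is a hypothetical host without a host_vars file.

```python
from collections import ChainMap

# The three layers from the example above, as plain dicts.
role_defaults = {"webserver_port": 80, "webserver_worker_processes": "auto"}
group_production = {"webserver_port": 443, "webserver_ssl_enabled": True}
host_web01 = {"webserver_worker_processes": 8}


def effective_vars(*layers_low_to_high):
    """Resolve variables so a key in a later (more specific) layer wins.

    Whole values are replaced, not merged, which matches Ansible's default
    hash_behaviour=replace for dictionaries.
    """
    # ChainMap searches its maps left to right, so put the most specific first.
    return dict(ChainMap(*reversed(layers_low_to_high)))


web01 = effective_vars(role_defaults, group_production, host_web01)
web02 = effective_vars(role_defaults, group_production)  # no host_vars file

print(web01["webserver_port"], web01["webserver_worker_processes"])  # 443 8
print(web02["webserver_worker_processes"])                           # auto
```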

Idempotent Tasks With Changed-When

The worst Ansible code always reports “changed.” Fix it:

# Bad - always reports changed
- name: Run database migrations
  ansible.builtin.command: python manage.py migrate

# Good - only reports changed when something happened
- name: Run database migrations
  ansible.builtin.command: python manage.py migrate
  register: migrate_result
  changed_when: "'No migrations to apply' not in migrate_result.stdout"

# Better - skip entirely if no migrations pending
- name: Check for pending migrations
  ansible.builtin.command: python manage.py showmigrations --plan
  register: migration_plan
  changed_when: false

- name: Run database migrations
  ansible.builtin.command: python manage.py migrate
  when: "'[ ]' in migration_plan.stdout"
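The when: condition above is a plain substring test on stdout. Spelling the same check out line by line in Python makes the showmigrations --plan format explicit: applied migrations are prefixed with [X], pending ones with [ ]. This is an illustrative sketch; the sample output and migration names are invented, and pending_migrations is a hypothetical helper.

```python
def pending_migrations(plan_output):
    """Return migrations that `manage.py showmigrations --plan` marks unapplied."""
    pending = []
    for line in plan_output.splitlines():
        entry = line.strip()
        if entry.startswith("[ ]"):  # "[X]" would mean already applied
            pending.append(entry[len("[ ]"):].strip())
    return pending


# Illustrative output; the app and migration names are made up.
sample_plan = """\
[X]  contenttypes.0001_initial
[X]  auth.0001_initial
[ ]  orders.0004_add_status
"""

print(pending_migrations(sample_plan))  # ['orders.0004_add_status']
```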

Block/Rescue for Error Handling

Don’t let a single failure leave your system in a broken state:

- name: Deploy application with rollback
  block:
    - name: Stop application
      ansible.builtin.systemd:
        name: myapp
        state: stopped

    - name: Back up current version
      ansible.builtin.copy:
        src: /opt/myapp/current/
        dest: /opt/myapp/previous/
        remote_src: true  # both paths live on the managed host

    - name: Deploy new version
      ansible.builtin.copy:
        src: "{{ release_artifact }}"
        dest: /opt/myapp/current
      register: deploy_result

    - name: Run database migrations
      ansible.builtin.command: /opt/myapp/current/migrate.sh

    - name: Start application
      ansible.builtin.systemd:
        name: myapp
        state: started

  rescue:
    # Restores files only; migrations that already ran are not reverted.
    - name: Restore previous version
      ansible.builtin.copy:
        src: /opt/myapp/previous/
        dest: /opt/myapp/current/
        remote_src: true  # without this, copy looks for src on the controller
      when: deploy_result is defined and deploy_result.changed

    - name: Start application (rollback)
      ansible.builtin.systemd:
        name: myapp
        state: started

    - name: Fail with message
      ansible.builtin.fail:
        msg: "Deployment failed, rolled back to previous version"

  always:
    - name: Clean up temp files
      ansible.builtin.file:
        path: /tmp/deploy-staging
        state: absent
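If block/rescue/always feels opaque, it maps closely onto Python's try/except/finally. The sketch below mirrors the playbook's file handling in a throwaway directory; the service stop/start is not modeled, and deploy, run_scenario and broken_migration are hypothetical names for illustration only.

```python
import shutil
import tempfile
from pathlib import Path


def deploy(app_dir, release_dir, migrate, staging_dir):
    """Mirror of the block/rescue/always playbook using try/except/finally."""
    current, previous = app_dir / "current", app_dir / "previous"
    touched_current = False
    try:  # block:
        shutil.rmtree(previous, ignore_errors=True)
        shutil.copytree(current, previous)  # back up the running version
        touched_current = True
        shutil.rmtree(current)
        shutil.copytree(release_dir, current)  # deploy new version
        migrate(current)  # may raise
    except Exception as exc:  # rescue:
        if touched_current:  # like `when: deploy_result.changed`
            shutil.rmtree(current, ignore_errors=True)
            shutil.copytree(previous, current)  # restore previous version
        raise RuntimeError("Deployment failed, rolled back to previous version") from exc
    finally:  # always:
        shutil.rmtree(staging_dir, ignore_errors=True)


def run_scenario(migrate):
    """Deploy version 2.0 over 1.0 in a temp dir; return (outcome, version, staging_left)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        app, release, staging = root / "app", root / "release", root / "staging"
        (app / "current").mkdir(parents=True)
        (app / "current" / "VERSION").write_text("1.0")
        release.mkdir()
        (release / "VERSION").write_text("2.0")
        staging.mkdir()
        try:
            deploy(app, release, migrate, staging)
            outcome = "deployed"
        except RuntimeError as err:
            outcome = str(err)
        return outcome, (app / "current" / "VERSION").read_text(), staging.exists()


def broken_migration(path):
    raise RuntimeError("migration failed")  # hypothetical failure


print(run_scenario(lambda path: None))  # ('deployed', '2.0', False)
print(run_scenario(broken_migration))
# ('Deployment failed, rolled back to previous version', '1.0', False)
```

Re-raising from the rescue path plays the same role as the fail task: the rollback happens, but the overall run still reports failure.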

Dynamic Inventory Patterns

Static inventory files don’t scale. Use dynamic inventory with caching:

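#!/usr/bin/env python3
# inventory/aws_inventory.py
# Ansible runs inventory scripts with --list (whole inventory) or --host NAME.
import argparse
import json
import os
import time

import boto3

# Persist results between runs: functools.lru_cache only lives as long as
# a single process, and Ansible starts a fresh one for every run.
CACHE_FILE = os.path.expanduser('~/.cache/ansible/aws_inventory.json')
CACHE_TTL = 300  # seconds


def build_inventory():
    ec2 = boto3.client('ec2')
    # describe_instances is paginated; the paginator walks every page.
    pages = ec2.get_paginator('describe_instances').paginate(
        Filters=[
            {'Name': 'tag:Environment', 'Values': ['production']},
            {'Name': 'instance-state-name', 'Values': ['running']},
        ]
    )

    inventory = {'_meta': {'hostvars': {}}}

    for page in pages:
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                # Group by Role tag; untagged instances go to 'ungrouped'
                role = next(
                    (t['Value'] for t in instance.get('Tags', [])
                     if t['Key'] == 'Role'),
                    'ungrouped'
                )

                host = instance['PrivateIpAddress']
                inventory.setdefault(role, {'hosts': []})['hosts'].append(host)
                inventory['_meta']['hostvars'][host] = {
                    'instance_id': instance['InstanceId'],
                    'instance_type': instance['InstanceType'],
                }

    return inventory


def get_inventory():
    # Serve a fresh-enough cached copy instead of calling the EC2 API again
    try:
        if time.time() - os.path.getmtime(CACHE_FILE) < CACHE_TTL:
            with open(CACHE_FILE) as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # no cache yet, or unreadable: rebuild it

    inventory = build_inventory()
    os.makedirs(os.path.dirname(CACHE_FILE), exist_ok=True)
    with open(CACHE_FILE, 'w') as f:
        json.dump(inventory, f)
    return inventory


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--list', action='store_true')
    parser.add_argument('--host')
    args = parser.parse_args()

    if args.host:
        # Host vars already ship in _meta, but answer --host correctly anyway
        hostvars = get_inventory()['_meta']['hostvars']
        print(json.dumps(hostvars.get(args.host, {})))
    else:
        print(json.dumps(get_inventory(), indent=2))

The cache path and five-minute TTL are arbitrary choices; tune them to how often your fleet changes. If you don't need custom logic, the amazon.aws.aws_ec2 inventory plugin does the same tag-based grouping (keyed_groups) and supports caching (cache: true) without a script.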
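The grouping logic is easiest to test without AWS at all: pull the loop body into a pure function and feed it dictionaries shaped like describe_instances results. A sketch, with made-up instance IDs and addresses; group_by_role is a hypothetical name, not part of boto3 or Ansible.

```python
def group_by_role(instances):
    """Build Ansible's JSON inventory from describe_instances-style dicts."""
    inventory = {"_meta": {"hostvars": {}}}
    for instance in instances:
        # First Role tag wins; untagged instances land in 'ungrouped'
        role = next(
            (t["Value"] for t in instance.get("Tags", []) if t["Key"] == "Role"),
            "ungrouped",
        )
        host = instance["PrivateIpAddress"]
        inventory.setdefault(role, {"hosts": []})["hosts"].append(host)
        inventory["_meta"]["hostvars"][host] = {
            "instance_id": instance["InstanceId"],
            "instance_type": instance["InstanceType"],
        }
    return inventory


# Hypothetical instances shaped like entries in reservation['Instances']
sample_instances = [
    {"InstanceId": "i-0aaa", "InstanceType": "t3.medium",
     "PrivateIpAddress": "10.0.1.10", "Tags": [{"Key": "Role", "Value": "web"}]},
    {"InstanceId": "i-0bbb", "InstanceType": "t3.medium",
     "PrivateIpAddress": "10.0.1.11", "Tags": [{"Key": "Role", "Value": "web"}]},
    {"InstanceId": "i-0ccc", "InstanceType": "r6g.large",
     "PrivateIpAddress": "10.0.2.20"},
]

inventory = group_by_role(sample_instances)
print(sorted(inventory))  # ['_meta', 'ungrouped', 'web']
```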

Secrets With ansible-vault

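Never commit plaintext secrets. Keep every variable name readable in a plain vars file and point it at a vault_-prefixed counterpart that lives in an encrypted file: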

# group_vars/production/vars.yml (committed)
db_host: prod-db.internal
db_name: myapp
db_user: "{{ vault_db_user }}"
db_password: "{{ vault_db_password }}"

# group_vars/production/vault.yml (encrypted; shown here before encryption)
vault_db_user: produser
vault_db_password: supersecret123

Encrypt only the vault file, labeling it with the environment's vault ID:

ansible-vault encrypt --vault-id production@~/.vault_passwords/production group_vars/production/vault.yml

Use separate vault passwords per environment:

ansible-playbook site.yml \
  --vault-id production@~/.vault_passwords/production \
  --vault-id staging@~/.vault_passwords/staging
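A cheap guard against committing a vault file before encrypting it is a check, run for example from a pre-commit hook or CI job, that every vault.yml starts with the $ANSIBLE_VAULT; header that ansible-vault writes. This sketch assumes the group_vars/<env>/vault.yml layout above; plaintext_vault_files is a hypothetical helper, and the "encrypted" demo file only imitates the header.

```python
import tempfile
from pathlib import Path

# Files encrypted by ansible-vault start with this header, e.g.
# "$ANSIBLE_VAULT;1.1;AES256" or "$ANSIBLE_VAULT;1.2;AES256;production".
VAULT_HEADER = "$ANSIBLE_VAULT;"


def plaintext_vault_files(repo_root):
    """Return group_vars/*/vault.yml files that are not vault-encrypted."""
    offenders = []
    for path in sorted(Path(repo_root).glob("group_vars/*/vault.yml")):
        with path.open(encoding="utf-8", errors="replace") as fh:
            first_line = fh.readline()
        if not first_line.startswith(VAULT_HEADER):
            offenders.append(path)
    return offenders


# Demo in a throwaway repo: one plaintext file, one encrypted-looking file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for env, content in {
        "production": "vault_db_password: supersecret123\n",
        "staging": "$ANSIBLE_VAULT;1.2;AES256;staging\n6162636465\n",
    }.items():
        (root / "group_vars" / env).mkdir(parents=True)
        (root / "group_vars" / env / "vault.yml").write_text(content)
    flagged = [p.parent.name for p in plaintext_vault_files(root)]

print(flagged)  # ['production']
```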

Testing With Molecule

Every role should have tests:

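# molecule/default/molecule.yml
dependency:
  name: galaxy

driver:
  name: docker

platforms:
  - name: instance
    image: geerlingguy/docker-ubuntu2204-ansible
    pre_build_image: true
    privileged: true
    # Run the image's own init (systemd) so services can be started and checked
    command: ""
    cgroupns_mode: host
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw

provisioner:
  name: ansible

verifier:
  name: ansible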

The verify playbook then checks the converged instance:

# molecule/default/verify.yml
- name: Verify
  hosts: all
  tasks:
    - name: Gather service facts
      ansible.builtin.service_facts:

    # On systemd hosts, service_facts keys include the unit suffix
    - name: Assert nginx is running
      ansible.builtin.assert:
        that:
          - "'nginx.service' in ansible_facts.services"
          - "ansible_facts.services['nginx.service'].state == 'running'"

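Run with:

molecule test

Among other steps, this creates the container, converges the role, converges it a second time to check idempotence, runs verify.yml, and destroys the instance.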
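The idempotence stage is what ties testing back to the changed_when discipline: the role is converged again, and the stage fails if any task still reports a change. A rough Python approximation of that check reads the PLAY RECAP of a second run. This is an illustration, not how Molecule implements it, and the recap text is a made-up sample.

```python
import re

# A PLAY RECAP line looks like:
# "instance : ok=6 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0"
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s+:\s+(?P<counts>(?:\w+=\d+\s*)+)$")


def changed_hosts(recap_text):
    """Map each host whose recap reports changed > 0 to its changed count."""
    changed = {}
    for line in recap_text.splitlines():
        match = RECAP_LINE.match(line.strip())
        if not match:
            continue  # headers, task output, blank lines
        counts = dict(pair.split("=") for pair in match.group("counts").split())
        if int(counts.get("changed", "0")) > 0:
            changed[match.group("host")] = int(counts["changed"])
    return changed


second_run = """
PLAY RECAP *********************************************************************
instance                   : ok=6    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
"""

print(changed_hosts(second_run))  # {'instance': 1}
```

An empty result on the second run is what an idempotent role should produce.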

Performance: Pipelining and Mitogen

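Pipelining cuts the number of SSH operations needed per module; if you become root through sudo, it requires requiretty to be disabled in /etc/sudoers on the targets. Enable it, along with more forks and fact caching, in ansible.cfg: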

[connection]
pipelining = True

[defaults]
forks = 20
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 3600
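With gathering = smart, a host whose cached facts are still fresh skips fact gathering, so fact_caching_timeout decides how often facts are really collected. The sketch below lists which hosts would be re-gathered, assuming the jsonfile plugin's layout of one file per host in fact_caching_connection (named after the host, with no fact_caching_prefix) and judging age by file modification time; expired_fact_caches is a hypothetical helper, not an Ansible command.

```python
import os
import tempfile
import time
from pathlib import Path


def expired_fact_caches(cache_dir, timeout=3600, now=None):
    """Hosts whose cache file is older than `timeout` seconds (by mtime)."""
    if timeout == 0:
        return []  # a timeout of 0 means cached facts never expire
    now = time.time() if now is None else now
    return sorted(
        entry.name
        for entry in Path(cache_dir).iterdir()
        if entry.is_file() and now - entry.stat().st_mtime > timeout
    )


# Demo: web01 cached 10 minutes ago, web02 two hours ago (hypothetical hosts).
with tempfile.TemporaryDirectory() as tmp:
    now = time.time()
    for host, age in {"web01": 600, "web02": 7200}.items():
        cache_file = Path(tmp) / host
        cache_file.write_text("{}")
        os.utime(cache_file, (now - age, now - age))
    expired = expired_fact_caches(tmp, timeout=3600, now=now)

print(expired)  # ['web02']
```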

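For larger speedups, try Mitogen for Ansible:

pip install mitogen

Then point ansible.cfg at its strategy plugins (adjust the path to your Python version and to wherever pip installed the package):

[defaults]
strategy_plugins = ~/.local/lib/python3.x/site-packages/ansible_mitogen/plugins/strategy
strategy = mitogen_linear

Mitogen's documentation reports speedups of up to about 7x. It still connects over SSH, but it keeps a persistent Python interpreter on each target and reuses it across tasks instead of copying and launching every module separately. Each Mitogen release supports only a specific range of ansible-core versions, so check its release notes before upgrading either one.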

Conclusion

Good Ansible code follows these principles:

  1. Roles are the unit of reuse — put everything in roles
  2. Tags for surgical runs — structure tasks for targeted execution
  3. Idempotence is mandatory — every task should be safe to run twice
  4. Secrets never in plaintext — ansible-vault, always
  5. Test your infrastructure — Molecule isn’t optional
  6. Performance matters — pipelining and caching from day one

Start with these patterns and your playbooks will scale from one server to thousands without becoming unmaintainable.