Ansible is easy to start and hard to master. A simple playbook works great for 5 servers. The same playbook becomes unmaintainable at 50. Here are the patterns that keep Ansible codebases sane as they grow.

Project Structure

Start with a structure that scales:

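ansible/
├── ansible.cfg
├── inventory/
│   ├── production/
│   │   ├── hosts.yml
│   │   └── group_vars/
│   │       ├── all.yml
│   │       └── webservers.yml
│   └── staging/
│       ├── hosts.yml
│       └── group_vars/
│           └── all.yml
├── playbooks/
│   ├── site.yml
│   ├── webservers.yml
│   └── databases.yml
├── roles/
│   ├── common/
│   ├── nginx/
│   └── postgres/
└── group_vars/
    └── all/
        ├── vars.yml
        └── vault.yml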

The key insight: separate inventory per environment. Never mix production and staging in the same inventory file.
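
As a rough illustration of why the split pays off, the same variable can carry a different value in each environment's group_vars. The variable names `app_env` and `app_debug` are hypothetical and do not come from any role in this article:

```yaml
# inventory/production/group_vars/all.yml
# Hypothetical variables, shown only to illustrate per-environment values
app_env: production
app_debug: false
---
# inventory/staging/group_vars/all.yml
app_env: staging
app_debug: true
```

Choosing an environment then comes down to which inventory directory is passed on the command line, for example `ansible-playbook -i inventory/staging playbooks/site.yml`.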

Inventory Patterns

Dynamic Groups

# inventory/production/hosts.yml
all:
  children:
    webservers:
      hosts:
        web1.example.com:
        web2.example.com:
    databases:
      hosts:
        db1.example.com:
          postgres_role: primary
        db2.example.com:
          postgres_role: replica
    
    # Composed groups
    atlanta:
      children:
        atlanta_web:
        atlanta_db:
    
    # Cross-cutting group (a host can belong to several groups)
    ubuntu:
      hosts:
        web1.example.com:
        db1.example.com:
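
Groups like these become useful once plays combine them with host patterns. A minimal sketch, assuming the inventory above and the nginx role described later; the play name is made up for illustration:

```yaml
# playbooks/webservers.yml (sketch)
- name: Configure Ubuntu web servers only
  # "&" takes the intersection: hosts that are in BOTH groups
  hosts: "webservers:&ubuntu"
  become: true
  roles:
    - role: nginx
```

With the inventory shown, this pattern would match only web1.example.com, since web2 is not in the ubuntu group.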

Host Variables

# inventory/production/host_vars/db1.example.com.yml
postgres_max_connections: 200
postgres_shared_buffers: 4GB
backup_schedule: "0 2 * * *"

Role Design

Minimal Role Structure

roles/nginx/
├── defaults/
│   └── main.yml        # Default variables (lowest precedence)
├── tasks/
│   └── main.yml        # Task entry point
├── handlers/
│   └── main.yml        # Service handlers
├── templates/
│   └── nginx.conf.j2   # Jinja2 templates
└── meta/
    └── main.yml        # Role dependencies

Defaults vs Vars

# roles/nginx/defaults/main.yml
# Users CAN override these
nginx_worker_processes: auto
nginx_worker_connections: 1024
nginx_client_max_body_size: 10m

# roles/nginx/vars/main.yml
# Users SHOULD NOT override these (internal implementation);
# role vars outrank inventory variables, so inventory overrides are ignored
nginx_config_path: /etc/nginx
nginx_service_name: nginx
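
To make "users CAN override these" concrete, here is one possible way to override a default at the point where the role is applied. The values are arbitrary examples, and the play itself is an assumption rather than part of the project above:

```yaml
# playbooks/webservers.yml (sketch)
- name: Apply nginx with tuned defaults
  hosts: webservers
  become: true
  roles:
    - role: nginx
      vars:
        # Example values only; they replace the entries in defaults/main.yml
        nginx_worker_connections: 4096
        nginx_client_max_body_size: 50m
```

Variables set this way rank high in precedence, so they suit one-off cases; lasting per-environment overrides usually sit better in the inventory's group_vars.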

Task Organization

Split large task files:

# roles/nginx/tasks/main.yml
- name: Include OS-specific variables
  include_vars: "{{ ansible_os_family }}.yml"

- import_tasks: install.yml
- import_tasks: configure.yml
- import_tasks: ssl.yml
  when: nginx_ssl_enabled | default(false)
- import_tasks: vhosts.yml

# roles/nginx/tasks/install.yml
- name: Install nginx (Debian)
  apt:
    name: nginx
    state: present
  when: ansible_os_family == "Debian"

- name: Install nginx (RedHat)
  yum:
    name: nginx
    state: present
  when: ansible_os_family == "RedHat"
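
When the package name is the same on every target distribution, the two conditional tasks can be collapsed into one. A minimal sketch using the generic package module, assuming nginx is available under that name in each distribution's configured repositories:

```yaml
# roles/nginx/tasks/install.yml (alternative sketch)
- name: Install nginx
  # Delegates to the package manager Ansible detects on the host (apt, dnf, yum, ...)
  ansible.builtin.package:
    name: nginx
    state: present
```

The per-OS tasks remain the better choice when package names or repository setup differ between families.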

Handler Patterns

Debounced Restarts

# roles/nginx/handlers/main.yml
- name: Reload nginx
  service:
    name: nginx
    state: reloaded
  listen: "reload nginx"

- name: Restart nginx
  service:
    name: nginx
    state: restarted
  listen: "restart nginx"

# In tasks - use listen name
- name: Update nginx config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: reload nginx

- name: Update SSL certificates
  copy:
    src: "{{ item }}"
    dest: /etc/nginx/ssl/
  loop: "{{ ssl_certificates }}"
  notify: reload nginx

Multiple tasks can notify the same handler; it still runs only once, after all tasks in the play have finished, and only if at least one notifying task reported a change.

Flush Handlers When Needed

- name: Update config
  template:
    src: app.conf.j2
    dest: /etc/app/config
  notify: restart app

# Force handler to run NOW (before continuing)
- meta: flush_handlers

- name: Run migrations (needs app running with new config)
  command: /opt/app/migrate

Variable Precedence

From lowest to highest (simplified):

  1. Role defaults (roles/x/defaults/main.yml)
  2. Inventory group_vars (inventory/production/group_vars/all.yml)
  3. Playbook group_vars
  4. Inventory host_vars
  5. Playbook host_vars
  6. Play vars
  7. Role vars (roles/x/vars/main.yml)
  8. Task vars
  9. Extra vars (-e "var=value")

Rule of thumb: Put defaults in roles, overrides in inventory.
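
A small sketch of that layering, reusing `nginx_worker_connections` from the nginx role. The host_vars file and the numbers are assumptions chosen for illustration:

```yaml
# roles/nginx/defaults/main.yml (lowest precedence)
nginx_worker_connections: 1024
---
# inventory/production/group_vars/webservers.yml (beats the role default)
nginx_worker_connections: 4096
---
# inventory/production/host_vars/web1.example.com.yml (beats group_vars, web1 only)
nginx_worker_connections: 8192
```

Under these assumptions web1 would end up with 8192 and web2 with 4096, and passing `-e nginx_worker_connections=2048` on the command line would win over all three files.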

Secret Management

Ansible Vault

# Create encrypted file
ansible-vault create group_vars/all/vault.yml

# Edit encrypted file
ansible-vault edit group_vars/all/vault.yml

# Encrypt existing file
ansible-vault encrypt secrets.yml

# Run playbook with vault
ansible-playbook site.yml --ask-vault-pass
# Or with password file
ansible-playbook site.yml --vault-password-file ~/.vault_pass

Vault Variable Pattern

# group_vars/all/vault.yml (encrypted)
vault_db_password: "supersecret123"
vault_api_key: "sk_live_abc123"

# group_vars/all/vars.yml (plain text)
db_password: "{{ vault_db_password }}"
api_key: "{{ vault_api_key }}"

This lets you grep for variable usage without decrypting vault files.
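
Encrypting a secret at rest does not stop it from appearing in task output once it is used. One common safeguard is `no_log`, sketched below; the destination path and file format are hypothetical:

```yaml
- name: Write database credentials file
  ansible.builtin.copy:
    content: "DB_PASSWORD={{ db_password }}\n"
    dest: /etc/app/db.env   # hypothetical path
    owner: root
    group: root
    mode: "0600"
  no_log: true   # keep the secret out of console output and logs
```

The trade-off is that `no_log` also hides error details for that task, so it is usually applied only to the tasks that actually handle secrets.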

Idempotency Patterns

Check Before Change

- name: Check if app is installed
  stat:
    path: /opt/app/bin/app
  register: app_binary

- name: Download app
  get_url:
    url: "https://releases.example.com/app-{{ app_version }}.tar.gz"
    dest: /tmp/app.tar.gz
  when: not app_binary.stat.exists or force_reinstall | default(false)
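
Some modules carry this check themselves through a `creates` argument, which can replace a separate stat task. A sketch of the unpack step that might follow the download, assuming the archive extracts to a layout containing bin/app:

```yaml
- name: Unpack app (skipped once the binary exists)
  ansible.builtin.unarchive:
    src: /tmp/app.tar.gz
    dest: /opt/app
    remote_src: true            # the archive is already on the managed host
    creates: /opt/app/bin/app   # if this path exists, the task does nothing
```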

Changed When

- name: Run database migrations
  command: /opt/app/migrate
  register: migrate_result
  changed_when: "'Applied' in migrate_result.stdout"

- name: Check service status
  command: systemctl is-active myapp
  register: service_status
  changed_when: false  # Never report as changed
  failed_when: false   # Don't fail if inactive

Performance Patterns

Pipelining

# ansible.cfg
[ssh_connection]
pipelining = True

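Pipelining sends modules over the already-open SSH connection instead of copying them to a temporary file first, which cuts the number of SSH operations per task. When using sudo, it requires that requiretty is disabled in /etc/sudoers on the managed hosts.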

Async for Long Tasks

- name: Run long backup
  command: /usr/local/bin/backup.sh
  async: 3600        # Max runtime: 1 hour
  poll: 0            # Don't wait
  register: backup_job

- name: Continue with other tasks
  debug:
    msg: "Doing other work while backup runs"

- name: Wait for backup to complete
  async_status:
    jid: "{{ backup_job.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 60
  delay: 60

Limit Execution

# Only run on specific hosts
ansible-playbook site.yml --limit web1.example.com

# Only run specific tags
ansible-playbook site.yml --tags "nginx,ssl"

# Skip specific tags
ansible-playbook site.yml --skip-tags "slow"

# Tag tasks for selective runs
- name: Install packages
  apt:
    name: "{{ packages }}"
  tags: [packages, slow]

- name: Update config
  template:
    src: app.conf.j2
    dest: /etc/app/config
  tags: [config]
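
Ansible also reserves two special tags, `always` and `never`, that pair well with selective runs. A minimal sketch; the cache path and the `wipe_cache` tag name are made up for illustration:

```yaml
- name: Show target host
  ansible.builtin.debug:
    msg: "Running against {{ inventory_hostname }}"
  tags: [always]   # runs whatever --tags selects, unless explicitly skipped

- name: Wipe application cache
  ansible.builtin.file:
    path: /var/cache/app   # hypothetical path
    state: absent
  tags: [never, wipe_cache]   # runs only when --tags wipe_cache is passed
```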

Testing Patterns

Check Mode

# Dry run - show what would change
ansible-playbook site.yml --check --diff
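
One caveat: command tasks are skipped in check mode, so later tasks that depend on their registered output can fail during a dry run. A possible workaround for read-only commands is `check_mode: false`, sketched here with a hypothetical `--version` flag on the app binary:

```yaml
- name: Read current app version
  ansible.builtin.command: /opt/app/bin/app --version   # hypothetical flag
  register: app_version_output
  changed_when: false   # reading a version never changes the host
  check_mode: false     # run even under --check; safe because it is read-only
```

This should be reserved for tasks that genuinely change nothing, since it bypasses the dry-run guarantee for that task.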

Molecule for Role Testing

# molecule/default/molecule.yml
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: ubuntu
    image: ubuntu:22.04
  - name: rocky
    image: rockylinux:9
verifier:
  name: ansible

# molecule/default/converge.yml
- name: Converge
  hosts: all
  roles:
    - role: nginx

# Run the full Molecule test sequence
molecule test
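
With the ansible verifier configured, Molecule runs a verify playbook after converge. A minimal sketch of what such a playbook might check for the nginx role; the specific assertions are assumptions, not taken from the role itself:

```yaml
# molecule/default/verify.yml
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Check that the nginx binary runs
      ansible.builtin.command: nginx -v
      changed_when: false   # a non-zero exit code fails the verify step

    - name: Stat the main config file
      ansible.builtin.stat:
        path: /etc/nginx/nginx.conf
      register: nginx_conf

    - name: Assert the config file exists
      ansible.builtin.assert:
        that:
          - nginx_conf.stat.exists
```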

Error Handling

Block/Rescue

- name: Deploy application
  block:
    - name: Stop service
      service:
        name: myapp
        state: stopped

    - name: Deploy new version
      copy:
        src: app.tar.gz
        dest: /opt/app/

    - name: Start service
      service:
        name: myapp
        state: started

  rescue:
    - name: Rollback on failure
      copy:
        src: /opt/app/backup/
        dest: /opt/app/
        remote_src: yes

    - name: Start old version
      service:
        name: myapp
        state: started

  always:
    - name: Clean up temp files
      file:
        path: /tmp/deploy
        state: absent
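
One detail worth knowing: when the rescue section completes successfully, Ansible treats the failure as handled and the run continues. If the deployment should still be reported as failed after the rollback, the rescue can log what went wrong and then fail on purpose. A shortened sketch, with illustrative messages:

```yaml
- name: Deploy with failure reporting
  block:
    - name: Deploy new version
      ansible.builtin.copy:
        src: app.tar.gz
        dest: /opt/app/

  rescue:
    - name: Report which task failed
      ansible.builtin.debug:
        msg: "Task '{{ ansible_failed_task.name }}' failed: {{ ansible_failed_result.msg | default('no message') }}"

    - name: Fail again so the host is still marked as failed
      ansible.builtin.fail:
        msg: "Deployment failed; see the rescue output above"
```

`ansible_failed_task` and `ansible_failed_result` are variables Ansible sets inside a rescue section.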

Quick Reference

# Syntax check
ansible-playbook site.yml --syntax-check

# List hosts that would be affected  
ansible-playbook site.yml --list-hosts

# List tasks that would run
ansible-playbook site.yml --list-tasks

# Step through tasks one at a time
ansible-playbook site.yml --step

# Start at specific task
ansible-playbook site.yml --start-at-task="Deploy application"

Ansible rewards good structure. The patterns above—clear inventory separation, well-designed roles, proper secret management, and idempotent tasks—make the difference between “works on my machine” and “works reliably in production.” Start simple, refactor as you grow, and always test with --check first.