Ansible is deceptively simple. Write some YAML, run it, things happen. Then your playbooks grow, your team grows, and suddenly everything is a mess. Here’s how to write Ansible that scales.

Project Structure

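ansible/
├── ansible.cfg
├── inventories/
│   ├── production/
│   │   ├── hosts.yml
│   │   └── group_vars/
│   │       ├── all.yml
│   │       ├── webservers.yml
│   │       └── databases.yml
│   └── staging/
│       ├── hosts.yml
│       └── group_vars/
├── playbooks/
│   ├── site.yml
│   ├── webservers.yml
│   └── databases.yml
├── roles/
│   ├── common/
│   ├── nginx/
│   └── postgresql/
└── collections/
    └── requirements.yml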

Separate inventories per environment. Group vars by function. Roles for reusable logic.
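The sketch below shows one way the pieces of this tree can connect. It assumes the role names from the tree; the play names and the become setting are illustrative. playbooks/site.yml imports the per-tier playbooks, and the inventory you pass with -i decides which environment they run against.

```yaml
# playbooks/site.yml: single entry point that chains the per-tier playbooks
- ansible.builtin.import_playbook: webservers.yml
- ansible.builtin.import_playbook: databases.yml

# playbooks/webservers.yml: one play per host group, roles do the work
- name: Configure web servers
  hosts: webservers          # group defined in inventories/<env>/hosts.yml
  become: true
  roles:
    - common
    - nginx
```

You would then run it with ansible-playbook -i inventories/staging/hosts.yml playbooks/site.yml, and swap in the production inventory when you are ready. Because the playbooks live in a subdirectory, this layout assumes ansible.cfg sets roles_path in its [defaults] section so that Ansible finds the top-level roles/ directory.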

Inventory Best Practices

Use YAML Inventory

# inventories/production/hosts.yml
all:
  children:
    webservers:
      hosts:
        web01.example.com:
        web02.example.com:
      vars:
        nginx_worker_processes: auto
    
    databases:
      hosts:
        db01.example.com:
          postgresql_role: primary
        db02.example.com:
          postgresql_role: replica
      vars:
        postgresql_version: 15
    
    monitoring:
      hosts:
        prometheus.example.com:
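The staging inventory can reuse the same group names with different hosts, so every playbook runs unchanged against either environment. The following is a sketch; all hostnames are placeholders.

```yaml
# inventories/staging/hosts.yml: same groups as production, smaller footprint
all:
  children:
    webservers:
      hosts:
        web01.staging.example.com:
    databases:
      hosts:
        db01.staging.example.com:
          postgresql_role: primary
      vars:
        postgresql_version: 15
    monitoring:
      hosts:
        prometheus.staging.example.com:
```

To confirm the group layout, run ansible-inventory -i inventories/staging/hosts.yml --graph.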

Dynamic Inventory for Cloud

# aws_ec2.yml
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  tag:Environment: production
keyed_groups:
  - key: tags.Role
    prefix: role
  - key: placement.availability_zone
    prefix: az
hostnames:
  - private-ip-address
compose:
  ansible_host: private_ip_address

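Now ansible-inventory -i aws_ec2.yml --list shows live AWS instances, grouped by their Role tag and availability zone through keyed_groups. The plugin needs the amazon.aws collection, plus the boto3 and botocore Python libraries on the control node.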
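Plays can target the generated groups by name. This sketch assumes instances are tagged Role=web (a hypothetical tag value). keyed_groups combines the prefix, an underscore and the tag value, so those instances land in a group called role_web.

```yaml
# Target EC2 instances discovered by the aws_ec2 inventory plugin
- name: Configure web tier discovered from EC2
  hosts: role_web            # prefix "role" + "_" + value of tags.Role
  become: true
  roles:
    - nginx
```

To run it, pass the plugin config as the inventory, for example ansible-playbook -i aws_ec2.yml playbooks/webservers.yml.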

Role Structure

roles/nginx/
├── defaults/
│   └── main.yml          # Default variables (low precedence)
├── vars/
│   └── main.yml          # Role variables (high precedence)
├── tasks/
│   └── main.yml          # Task entry point
├── handlers/
│   └── main.yml          # Handlers (restart, reload)
├── templates/
│   └── nginx.conf.j2     # Jinja2 templates
├── files/
│   └── ssl-params.conf   # Static files
├── meta/
│   └── main.yml          # Role dependencies
└── molecule/
    └── default/          # Testing
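The role's dependencies are declared in meta/main.yml. This minimal sketch assumes the nginx role should always run after the common role from the project tree. The galaxy_info section is only needed if you publish the role, so it is left out here.

```yaml
# roles/nginx/meta/main.yml
dependencies:
  - role: common   # runs before any of nginx's tasks
```

Dependencies run before the role that declares them. By default, each one runs only once per play even when several roles list it with the same parameters.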

Defaults vs Vars

# defaults/main.yml - Users can override these
nginx_worker_processes: auto
nginx_worker_connections: 1024
nginx_keepalive_timeout: 65

# vars/main.yml - Internal to the role, not meant to be overridden
nginx_conf_path: /etc/nginx/nginx.conf
nginx_service_name: nginx
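Consumers override defaults from inventory instead of editing the role. This sketch assumes production web servers need different connection settings; the values are illustrative.

```yaml
# inventories/production/group_vars/webservers.yml
# Overrides roles/nginx/defaults/main.yml for hosts in the webservers group only
nginx_worker_connections: 4096
nginx_keepalive_timeout: 30
```

Values in vars/main.yml would win over this file. That is why only internal values such as nginx_conf_path belong there.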

Task Patterns

Idempotency Always

# Good: Idempotent
- name: Ensure nginx is installed
  ansible.builtin.package:
    name: nginx
    state: present

# Bad: Not idempotent
- name: Install nginx
  ansible.builtin.shell: apt-get install -y nginx

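Shell commands are a last resort. Ansible cannot tell whether a shell command changed anything, so it reports "changed" on every run. Most tasks have an idempotent module equivalent.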
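When a command really is unavoidable, tell Ansible when to run it and how to judge the result. This sketch uses the command module's creates option and the changed_when keyword. The script path and marker file are hypothetical.

```yaml
# Skipped when the marker file already exists, so repeat runs report "ok"
- name: Initialize application database
  ansible.builtin.command:
    cmd: /opt/app/bin/init-db            # hypothetical script
    creates: /opt/app/.db-initialized    # hypothetical marker the script writes

# Read-only check: never report "changed"
- name: Validate nginx configuration
  ansible.builtin.command: nginx -t
  changed_when: false
```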

Use Block for Error Handling

- name: Deploy application
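  block:
    - name: Back up current release
      ansible.builtin.copy:
        src: /opt/app/
        dest: /opt/app.backup/
        remote_src: yes

    - name: Download artifact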
      ansible.builtin.get_url:
        url: "{{ artifact_url }}"
        dest: /tmp/app.tar.gz

    - name: Extract artifact
      ansible.builtin.unarchive:
        src: /tmp/app.tar.gz
        dest: /opt/app
        remote_src: yes

    - name: Restart service
      ansible.builtin.systemd:
        name: myapp
        state: restarted

  rescue:
    - name: Rollback to previous version
      ansible.builtin.copy:
        src: /opt/app.backup/
        dest: /opt/app/
        remote_src: yes

    - name: Restart service on previous version
      ansible.builtin.systemd:
        name: myapp
        state: restarted

    - name: Alert on failure
      ansible.builtin.uri:
        url: "{{ slack_webhook }}"
        method: POST
        body_format: json
        body:
          text: "Deployment failed on {{ inventory_hostname }}"

  always:
    - name: Clean up temp files
      ansible.builtin.file:
        path: /tmp/app.tar.gz
        state: absent
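If the rescue section succeeds, Ansible marks the host as rescued rather than failed. The play carries on and the run can still exit successfully. If a rolled-back deployment should still count as a failure, end the rescue section with a fail task. This is a trimmed sketch; ansible_failed_task is a variable Ansible sets inside rescue.

```yaml
- name: Deploy application
  block:
    - name: Restart service
      ansible.builtin.systemd:
        name: myapp
        state: restarted
  rescue:
    # Rollback and alert tasks go here, as in the full example
    - name: Fail the host after rolling back
      ansible.builtin.fail:
        msg: "Rolled back after '{{ ansible_failed_task.name }}' failed on {{ inventory_hostname }}"
```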

Conditionals Done Right

# Use 'when' with facts
- name: Install EPEL (RedHat family)
  ansible.builtin.dnf:
    name: epel-release
    state: present
  when: ansible_os_family == "RedHat"

- name: Install packages (Debian family)
  ansible.builtin.apt:
    name: "{{ packages }}"
    state: present
    update_cache: yes
  when: ansible_os_family == "Debian"

# Better: Use package manager abstraction
- name: Install packages
  ansible.builtin.package:
    name: "{{ common_packages }}"
    state: present
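The package abstraction only helps when package names match across distributions. nginx does, but Apache is apache2 on Debian and httpd on RedHat. One common way to keep a single install task is an OS-specific vars file. This sketch assumes a role with vars/Debian.yml and vars/RedHat.yml; the file contents are illustrative.

```yaml
# roles/common/vars/Debian.yml contains:  common_packages: [apache2]
# roles/common/vars/RedHat.yml contains:  common_packages: [httpd]

# roles/common/tasks/main.yml
- name: Load OS-family specific variables
  ansible.builtin.include_vars: "{{ ansible_os_family }}.yml"   # resolved from the role's vars/

- name: Install packages
  ansible.builtin.package:
    name: "{{ common_packages }}"
    state: present
```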

Loops Efficiently

# Good: Single task with loop
- name: Create users
  ansible.builtin.user:
    name: "{{ item.name }}"
    groups: "{{ item.groups }}"
    state: present
  loop: "{{ users }}"

# For large lists: loop_control keeps output readable and can throttle
- name: Create many users
  ansible.builtin.user:
    name: "{{ item.name }}"
    groups: "{{ item.groups }}"
  loop: "{{ users }}"
  loop_control:
    label: "{{ item.name }}"  # Print only the name, not the whole dict
    pause: 1                   # Seconds between items, for rate limiting
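Looping is not always the fastest option. Modules that accept a list, such as ansible.builtin.package, can install everything in one package-manager run instead of one run per item. This sketch uses a hypothetical tool list.

```yaml
# Slower: one package manager invocation per item
- name: Install tools one by one
  ansible.builtin.package:
    name: "{{ item }}"
    state: present
  loop:
    - git
    - curl
    - htop

# Faster: a single invocation with the whole list
- name: Install tools in one transaction
  ansible.builtin.package:
    name:
      - git
      - curl
      - htop
    state: present
```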

Variable Precedence

From lowest to highest (simplified):

  1. Role defaults (roles/x/defaults/main.yml)
  2. Inventory group_vars (inventories/prod/group_vars/all.yml)
  3. Playbook group_vars
  4. Inventory host_vars
  5. Playbook host_vars
  6. Play vars (vars: in a play)
  7. Role vars (roles/x/vars/main.yml)
  8. Task vars (set_fact, vars:)
  9. Extra vars (-e)

Rule of thumb: Put user-configurable values in defaults, put internal values in vars.
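When a value is not what you expect, ask Ansible which source won. This minimal sketch reuses nginx_worker_connections from the defaults example.

```yaml
# Prints the effective value for each host after all precedence rules apply
- name: Show effective nginx_worker_connections
  ansible.builtin.debug:
    var: nginx_worker_connections
```

Run it with -e nginx_worker_connections=4096 and every host reports 4096, because extra vars sit at the top of the list. Without the flag, a value set in group_vars beats the role default of 1024.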

Secrets Management

Ansible Vault

# Create encrypted file
ansible-vault create group_vars/all/vault.yml

# Edit encrypted file
ansible-vault edit group_vars/all/vault.yml

# Run playbook with vault
ansible-playbook site.yml --ask-vault-pass
# Or with password file
ansible-playbook site.yml --vault-password-file ~/.vault_pass

# group_vars/all/vault.yml (encrypted)
vault_db_password: supersecret
vault_api_key: sk-12345

# group_vars/all/main.yml (references vault)
db_password: "{{ vault_db_password }}"
api_key: "{{ vault_api_key }}"
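The vault_ prefix keeps variable names searchable in plain files while the values stay encrypted. Tasks that consume secrets should also set no_log; otherwise the value can show up in verbose output and logs. This sketch assumes the community.postgresql collection and a hypothetical app database user.

```yaml
- name: Set application database password
  community.postgresql.postgresql_user:
    name: app                       # hypothetical database user
    password: "{{ db_password }}"   # resolves to vault_db_password
  become: true
  become_user: postgres
  no_log: true                      # keep the password out of task output
```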

External Secrets (HashiCorp Vault)

- name: Get secret from Vault
  community.hashi_vault.vault_kv2_get:
    url: "{{ vault_url }}"
    path: myapp  # relative to the KV v2 mount (engine_mount_point, default "secret")
    auth_method: token
    token: "{{ vault_token }}"
  register: secret_data
  delegate_to: localhost
  no_log: true

- name: Use secret
  ansible.builtin.template:
    src: config.j2
    dest: /etc/myapp/config.yml
  vars:
    db_password: "{{ secret_data.secret.db_password }}"
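The collection also provides a lookup plugin with the same name. Lookups run on the control node, so no delegate_to is needed. This sketch reuses the vault_url and vault_token variables from above; check the community.hashi_vault documentation for the options your installed version supports.

```yaml
- name: Render config with a secret fetched via lookup
  ansible.builtin.template:
    src: config.j2
    dest: /etc/myapp/config.yml
  vars:
    # Evaluated on the controller when the variable is used
    myapp_secret: "{{ lookup('community.hashi_vault.vault_kv2_get', 'myapp', url=vault_url, auth_method='token', token=vault_token) }}"
    db_password: "{{ myapp_secret.secret.db_password }}"
  no_log: true
```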

Handler Patterns

# handlers/main.yml
- name: Restart nginx
  ansible.builtin.systemd:
    name: nginx
    state: restarted
  listen: "restart web server"

- name: Reload nginx
  ansible.builtin.systemd:
    name: nginx
    state: reloaded
  listen: "reload web server"

# tasks/main.yml
- name: Update nginx config
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: reload web server

- name: Update SSL certificates
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: /etc/nginx/ssl/
  loop:
    - cert.pem
    - key.pem
  notify: restart web server

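listen decouples notifications from handler names. Tasks notify a topic, and every handler listening on that topic runs once, however many tasks notified it.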
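Notified handlers normally run at the end of the current section of the play (pre_tasks, roles, tasks, post_tasks). If a later task depends on the reload having happened, flush them explicitly. In this sketch the health-check URL is a placeholder.

```yaml
- name: Update nginx config
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: reload web server

# Run any notified handlers now instead of at the end of the section
- name: Apply pending handlers
  ansible.builtin.meta: flush_handlers

- name: Check the site answers after the reload
  ansible.builtin.uri:
    url: http://localhost/    # placeholder health-check URL
    status_code: 200
```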

Performance Optimization

Pipelining

# ansible.cfg
[ssh_connection]
pipelining = True

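Pipelining runs modules over the existing SSH session instead of first copying them to a temporary file on the host, which cuts the number of SSH operations per task. When become uses sudo, requiretty must be disabled in sudoers.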
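Pipelining can also be set per inventory group with the ansible_pipelining variable. This is handy when a few older hosts cannot drop requiretty yet. The legacy group in this sketch is hypothetical.

```yaml
# inventories/production/group_vars/all.yml: enable everywhere by default
ansible_pipelining: true

# inventories/production/group_vars/legacy.yml: hypothetical group that still enforces requiretty
ansible_pipelining: false
```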

Async for Long Tasks

- name: Run long database migration
  ansible.builtin.shell: /opt/app/migrate.sh
  async: 3600  # Max runtime: 1 hour
  poll: 0      # Don't wait
  register: migration_job

- name: Wait for migration to complete
  ansible.builtin.async_status:
    jid: "{{ migration_job.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 60
  delay: 60
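The same pattern scales to several jobs at once: start them all with poll: 0, then wait on each job ID. This sketch uses hypothetical migration scripts.

```yaml
- name: Start migrations in parallel
  ansible.builtin.command: "/opt/app/{{ item }}"   # hypothetical scripts
  loop:
    - migrate_users.sh
    - migrate_orders.sh
  async: 3600
  poll: 0
  register: migration_jobs

- name: Wait for every migration to finish
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  loop: "{{ migration_jobs.results }}"
  loop_control:
    label: "{{ item.item }}"      # show the script name, not the whole result
  register: job_result
  until: job_result.finished      # retried per item
  retries: 60
  delay: 60
```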

Fact Caching

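# ansible.cfg
[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400

With gathering = smart, hosts whose cached facts are younger than fact_caching_timeout (86400 seconds, one day) skip fact gathering on subsequent runs. Without it, Ansible still gathers facts on every run and only writes them to the cache.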
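When a play needs only a few facts, gathering less is faster still. This sketch uses the setup module's gather_subset option; the network subset is just an example choice.

```yaml
- name: Configure web servers with minimal facts
  hosts: webservers
  gather_facts: false
  tasks:
    - name: Gather only minimal and network facts
      ansible.builtin.setup:
        gather_subset:
          - '!all'     # skip the default collection; the minimal subset is still gathered
          - network    # add interfaces and addresses
```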

Free Strategy

# playbook.yml
- hosts: webservers
  strategy: free  # Don't wait for all hosts per task
  tasks:
    - name: Update packages
      ansible.builtin.apt:
        upgrade: dist

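Each host moves through the task list at its own pace instead of waiting for every host to finish each task. Faster overall, but output from different hosts interleaves, which makes failures harder to follow.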

Testing with Molecule

# molecule/default/molecule.yml
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: ubuntu
    image: ubuntu:22.04
    pre_build_image: true
  - name: rocky
    image: rockylinux:9
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible

# molecule/default/verify.yml
- name: Verify
  hosts: all
  tasks:
    - name: Gather service facts
      ansible.builtin.service_facts:

    - name: Assert nginx is running
      ansible.builtin.assert:
        that:
          - ansible_facts.services['nginx.service'].state == 'running'
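The official ubuntu image ships without Python, which Ansible modules need, so converge can fail before the role even runs. Molecule runs an optional prepare.yml playbook before converge. This sketch bootstraps Python with the raw module, which needs no Python on the target. The service check in verify.yml also assumes the containers run systemd, which the stock images do not.

```yaml
# molecule/default/prepare.yml
- name: Prepare
  hosts: all
  gather_facts: false        # fact gathering itself needs Python
  tasks:
    - name: Install Python 3 where it is missing (Debian-family images)
      ansible.builtin.raw: >-
        test -e /usr/bin/python3 ||
        (apt-get update && apt-get install -y python3)
```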
Run the full test scenario:

molecule test
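molecule test runs the whole sequence, including an idempotence step. That step applies the role a second time and fails if any task reports changed, which enforces idempotency automatically. converge.yml is what applies the role; this minimal sketch assumes the role directory is named nginx.

```yaml
# molecule/default/converge.yml
- name: Converge
  hosts: all
  become: true
  tasks:
    - name: Apply the role under test
      ansible.builtin.include_role:
        name: nginx
```

While developing, molecule converge and molecule verify rerun just those steps against the existing containers. molecule destroy cleans up afterwards.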

The Checklist

  • Inventory per environment
  • Roles for reusable logic
  • Defaults for user config, vars for internal
  • Vault for secrets
  • Handlers for service restarts
  • Molecule tests for roles
  • Pipelining enabled
  • No shell tasks without good reason

Start Here

  1. Today: Enable pipelining in ansible.cfg
  2. This week: Convert a playbook to a role
  3. This month: Add Molecule tests to one role
  4. This quarter: Implement dynamic inventory

Good Ansible is boring Ansible. Predictable, testable, and maintainable.


The best automation is the one you can hand to a teammate and they understand it immediately.