Your Ansible playbook started simple. One file, fifty lines, deploys your app. Beautiful.

Six months later, it’s 2,000 lines of YAML spaghetti with thirty when conditionals, variables defined in five different places, and a tasks/main.yml that makes you wince every time you open it.

Here’s how to avoid that trajectory.

The Single Responsibility Role

Every role should do one thing. Not “configure the server” — that’s five things. One thing:

roles/
├── base-packages/    # Install common packages
├── users/            # Manage user accounts
├── ssh-hardening/    # SSH configuration
├── docker/           # Install and configure Docker
├── node-exporter/    # Prometheus node exporter
└── app-deploy/       # Deploy your application

Each role is independently testable. Each role has clear inputs and outputs. When something breaks, you know exactly where to look.
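One way to make a role's inputs explicit is an argument spec, which ansible-core (2.11 and later) validates before the role's tasks run. The following is a minimal sketch for a Docker role; the option names are illustrative assumptions about what such a role might accept, not a fixed interface.

```yaml
# roles/docker/meta/argument_specs.yml
# Sketch only: the option names below are assumptions for illustration.
argument_specs:
  main:
    short_description: Install and configure Docker
    options:
      docker_version:
        type: str
        default: "24.0"
        description: Docker release series to install.
      docker_storage_driver:
        type: str
        default: overlay2
        choices:
          - overlay2
          - devicemapper
        description: Storage driver written to the daemon configuration.
```

With a spec like this, an unsupported storage driver should fail at role entry rather than halfway through the play.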

The anti-pattern:

roles/
└── server-setup/        # Does EVERYTHING
    └── tasks/
        └── main.yml     # 800 lines, nobody understands it

Variables: The Three-Layer Rule

Variables should live in exactly three places:

1. Role Defaults (defaults/main.yml)

Safe defaults that work for most cases:

# roles/docker/defaults/main.yml
docker_version: "24.0"
docker_storage_driver: "overlay2"
docker_log_max_size: "100m"
docker_log_max_file: 3

2. Group Variables (group_vars/)

Environment-specific overrides:

# group_vars/production.yml
docker_log_max_size: "500m"
docker_log_max_file: 10

# group_vars/development.yml
docker_version: "latest"

3. Host Variables (host_vars/)

Host-specific exceptions (use sparingly):

# host_vars/legacy-server-01.yml
docker_storage_driver: "devicemapper"  # Old kernel, can't use overlay2

Never define variables in:

  • vars/main.yml (role vars outrank group_vars and host_vars, so inventory overrides silently lose)
  • Playbook files (scattered and forgotten)
  • Task files (just no)
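To check which layer won for a particular host, a low-noise debug task inside the role can print the effective values. This is a sketch that assumes the docker_* variables shown above; the docker:debug tag is a made-up name following the role:subtask pattern.

```yaml
# roles/docker/tasks/main.yml (excerpt)
# Prints the values after defaults, group_vars, and host_vars are resolved.
# The debug module's verbosity setting keeps this silent unless you run with -v.
- name: Show effective Docker settings for this host
  ansible.builtin.debug:
    msg: >-
      driver={{ docker_storage_driver }}
      log_max_size={{ docker_log_max_size }}
      log_max_file={{ docker_log_max_file }}
    verbosity: 1
  tags:
    - docker:debug   # hypothetical tag name
```

Keeping the task inside the role matters here: role defaults are only in scope while the role is running.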

Handlers: Idempotent and Specific

Handlers should be narrowly scoped and idempotent:

# roles/nginx/handlers/main.yml
# Handlers run in the order they are defined in this file, not in the
# order a task lists them under notify, so validation comes first.
- name: Validate nginx config
  ansible.builtin.command: nginx -t
  changed_when: false
  listen: "validate nginx"

- name: Reload nginx
  ansible.builtin.systemd:
    name: nginx
    state: reloaded
  listen: "reload nginx"

- name: Restart nginx
  ansible.builtin.systemd:
    name: nginx
    state: restarted
  listen: "restart nginx"

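Notify listen topics rather than handler names. Tasks then depend on a stable topic such as “reload nginx”, and every handler listening on that topic runs, so handlers can be renamed or split without touching the tasks: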

- name: Update nginx.conf
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify: 
    - validate nginx
    - reload nginx
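A complementary safeguard is the template module's validate parameter, which checks the rendered file before it replaces the live one. The sketch below assumes that nginx -t -c is an adequate check for your main config; setups that rely on relative include paths may need a different command.

```yaml
- name: Update nginx.conf
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    # %s is replaced with the path of the temporary rendered file.
    # If the command fails, the existing nginx.conf is left untouched.
    validate: nginx -t -c %s
  notify:
    - reload nginx
```

With validation done at write time, a broken template fails the task itself and no reload is notified.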

Tags: Your Escape Hatch

Every logical group of tasks should have tags:

# roles/users/tasks/main.yml
- name: Create user accounts
  ansible.builtin.user:
    name: "{{ item.name }}"
    groups: "{{ item.groups }}"
  loop: "{{ users }}"
  tags:
    - users
    - users:create

- name: Deploy SSH keys
  ansible.posix.authorized_key:
    user: "{{ item.name }}"
    key: "{{ item.ssh_key }}"
  loop: "{{ users }}"
  tags:
    - users
    - users:ssh-keys

Now you can run:

# Full user setup
ansible-playbook site.yml --tags users

# Just update SSH keys (fast, targeted)
ansible-playbook site.yml --tags users:ssh-keys

The naming convention role:subtask keeps tags organized as your playbook grows.
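The same convention pairs well with Ansible's special never tag for tasks that should run only on explicit request. This is a sketch; users_removed is a hypothetical variable and users:remove a hypothetical tag, neither taken from the role above.

```yaml
# roles/users/tasks/main.yml (excerpt)
- name: Remove accounts listed for deletion
  ansible.builtin.user:
    name: "{{ item }}"
    state: absent
  # users_removed is a hypothetical list of account names.
  loop: "{{ users_removed | default([]) }}"
  tags:
    - never          # skipped unless one of this task's tags is requested
    - users:remove
```

A plain run, or one with --tags users, skips this task; it runs only when --tags users:remove is passed.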

Templates: Logic Belongs Elsewhere

Jinja2 templates should be mostly static. Complex logic in templates is a code smell:

{# BAD: Logic explosion in template #}
{% if environment == 'production' and region == 'us-east-1' and enable_ssl %}
  {% if ssl_cert_type == 'letsencrypt' %}
    ssl_certificate /etc/letsencrypt/live/{{ domain }}/fullchain.pem;
  {% elif ssl_cert_type == 'custom' %}
    ssl_certificate {{ custom_cert_path }};
  {% endif %}
{% endif %}

Instead, compute values in tasks or defaults:

# defaults/main.yml
ssl_certificate_path: "/etc/ssl/certs/{{ domain }}.crt"

# Override in group_vars/production.yml if needed
ssl_certificate_path: "/etc/letsencrypt/live/{{ domain }}/fullchain.pem"

The template then needs only one conditional:

{# GOOD: Template is simple #}
{% if ssl_enabled %}
ssl_certificate {{ ssl_certificate_path }};
{% endif %}
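If the certificate type really does need to be selectable, the branching can still live in variables as a lookup table instead of nested template conditionals. This is one possible sketch; ssl_certificate_paths and the placeholder values are assumptions for illustration.

```yaml
# roles/nginx/defaults/main.yml
ssl_enabled: true
ssl_cert_type: letsencrypt     # or "custom"
custom_cert_path: ""           # set this when ssl_cert_type is "custom"
domain: example.com            # placeholder value

# Hypothetical lookup table: one entry per supported certificate type.
ssl_certificate_paths:
  letsencrypt: "/etc/letsencrypt/live/{{ domain }}/fullchain.pem"
  custom: "{{ custom_cert_path }}"

# The template only ever reads this single, already-resolved value.
ssl_certificate_path: "{{ ssl_certificate_paths[ssl_cert_type] }}"
```

Adding a certificate type then means adding one dictionary entry, with no template change.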

Testing: Molecule or Regret

If you’re not testing roles, you’re testing in production:

# molecule/default/molecule.yml
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: ubuntu-22
    image: ubuntu:22.04
  - name: debian-12
    image: debian:12
provisioner:
  name: ansible
verifier:
  name: ansible

The verify playbook then asserts the resulting state:

# molecule/default/verify.yml
- name: Verify
  hosts: all
  tasks:
    - name: Check Docker is running
      ansible.builtin.systemd:
        name: docker
      register: docker_service
      failed_when: docker_service.status.ActiveState != "active"

Run tests before every merge:

molecule test

This catches “works on my machine” before it becomes “broke production.”
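A Molecule scenario also needs a converge playbook that applies the role under test. A minimal sketch, assuming the role is named docker and resolvable on the role path Molecule uses:

```yaml
# molecule/default/converge.yml
- name: Converge
  hosts: all
  tasks:
    - name: Apply the role under test
      ansible.builtin.include_role:
        name: docker   # assumed role name
```

Molecule's default test sequence runs converge, then an idempotence pass that fails if a second run reports changes, then verify, so a role that is not idempotent is caught as well.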

The Import/Include Decision

Use import_tasks for static includes (parsed at playbook load):

- ansible.builtin.import_tasks: install.yml
- ansible.builtin.import_tasks: configure.yml

Use include_tasks for dynamic includes (parsed at runtime):

- ansible.builtin.include_tasks: "{{ ansible_os_family | lower }}.yml"

Rule of thumb: if the filename is a variable, use include_tasks. Otherwise, use import_tasks for better error messages and --list-tasks support.
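One practical difference shows up with tags: a tag on import_tasks is inherited by every imported task, while a tag on include_tasks applies only to the include statement itself. The sketch below uses the apply keyword to pass tags through a dynamic include; the packages tag is an assumed name.

```yaml
- name: Load OS-specific tasks
  ansible.builtin.include_tasks:
    file: "{{ ansible_os_family | lower }}.yml"
    # apply pushes these tags down to the tasks inside the included file.
    apply:
      tags:
        - packages
  # This tag selects the include itself, so --tags packages reaches it at all.
  tags:
    - packages
```

Both tag declarations are needed: without the outer one the include is skipped under --tags packages, and without apply the included tasks are skipped.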

Directory Structure That Scales

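ansible/
├── inventories/
│   ├── production/
│   │   ├── hosts.yml
│   │   └── group_vars/
│   └── staging/
│       ├── hosts.yml
│       └── group_vars/
├── roles/
│   └── (your roles)
├── playbooks/
│   ├── site.yml              # Full convergence
│   ├── deploy.yml            # App deployment only
│   └── security-patch.yml    # Emergency patching
├── collections/
│   └── requirements.yml
└── ansible.cfg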

Separate inventories per environment. Playbooks are entry points, not logic containers. Roles hold all the intelligence.
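As an illustration of a playbook that is only an entry point, a site playbook can simply map host groups to roles. This is a sketch; the group names docker_hosts and webservers are hypothetical, and it assumes roles_path in ansible.cfg points at the roles directory.

```yaml
# playbooks/site.yml
- name: Baseline for every host
  hosts: all
  become: true
  roles:
    - users

- name: Container hosts
  hosts: docker_hosts      # hypothetical inventory group
  become: true
  roles:
    - docker

- name: Web servers
  hosts: webservers        # hypothetical inventory group
  become: true
  roles:
    - nginx
```

There are no tasks, conditionals, or variables here; anything more complicated belongs in a role.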

The Maintenance Payoff

Six months from now, when you need to:

  • Add a new server: ansible-playbook site.yml -l new-server
  • Update SSH keys: ansible-playbook site.yml --tags users:ssh-keys
  • Debug Docker issues: Look in roles/docker/, nowhere else
  • Onboard a new team member: “Roles are single-purpose, variables are in three places, run molecule to test”

That’s the goal: infrastructure that explains itself.