Your Ansible playbook started simple. One file, fifty lines, deploys your app. Beautiful.
Six months later, it’s 2,000 lines of YAML spaghetti with thirty when conditionals, variables defined in five different places, and a tasks/main.yml that makes you wince every time you open it.
Here’s how to avoid that trajectory.
The Single Responsibility Role
Every role should do one thing. Not “configure the server”; that’s five things. One thing:
roles/
├── base-packages/   # Install common packages
├── users/           # Manage user accounts
├── ssh-hardening/   # SSH configuration
├── docker/          # Install and configure Docker
├── node-exporter/   # Prometheus node exporter
└── app-deploy/      # Deploy your application
Each role is independently testable. Each role has clear inputs and outputs. When something breaks, you know exactly where to look.
The anti-pattern:
roles/
└── server-setup/        # Does EVERYTHING
    └── tasks/main.yml   # 800 lines, nobody understands it
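By contrast, with the single-purpose roles from the first listing, the playbook itself shrinks to a list of roles per host group. A minimal sketch, assuming those six roles exist; the appservers group name is a placeholder and does not come from the original:

```yaml
# site.yml (sketch; the appservers group name is an assumption)
- name: Baseline for every host
  hosts: all
  become: true
  roles:
    - base-packages
    - users
    - ssh-hardening
    - node-exporter

- name: Application hosts
  hosts: appservers
  become: true
  roles:
    - docker
    - app-deploy
```

Each entry can be removed, reordered, or tested without touching the others, which is the point of keeping roles small.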
Variables: The Three-Layer Rule
Variables should live in exactly three places:
1. Role Defaults (defaults/main.yml)
Safe defaults that work for most cases:
# roles/docker/defaults/main.yml
docker_version: "24.0"
docker_storage_driver: "overlay2"
docker_log_max_size: "100m"
docker_log_max_file: 3
2. Group Variables (group_vars/)
Environment-specific overrides:
# group_vars/production.yml
docker_log_max_size: "500m"
docker_log_max_file: 10

# group_vars/development.yml
docker_version: "latest"
3. Host Variables (host_vars/)
Host-specific exceptions (use sparingly):
# host_vars/legacy-server-01.yml
docker_storage_driver: "devicemapper"  # Old kernel, can't use overlay2
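Because the same variable can be set at any of the three layers, it helps to see which value a host actually resolves. A small sketch, assuming the task lives in the docker role's task file; the docker:vars tag is a hypothetical name chosen for illustration:

```yaml
# roles/docker/tasks/main.yml (sketch; the tag name is an assumption)
- name: Show the Docker settings resolved for this host
  ansible.builtin.debug:
    msg: >-
      version={{ docker_version }}
      storage_driver={{ docker_storage_driver }}
      log_max_size={{ docker_log_max_size }}
      log_max_file={{ docker_log_max_file }}
  tags:
    - docker:vars
```

With the files above, legacy-server-01 should report devicemapper and hosts in the production group should report a log size of 500m, since host_vars take precedence over group_vars, which take precedence over role defaults.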
Never define variables in:
- vars/main.yml (hard to override)
- Playbook files (scattered and forgotten)
- Task files (just no)
Handlers: Idempotent and Specific
Handlers should be narrowly scoped and idempotent:
# roles/nginx/handlers/main.yml
# Handlers run in the order they are defined here, not the order listed
# under notify, so validation is defined first.
- name: Validate nginx config
  ansible.builtin.command: nginx -t
  changed_when: false
  listen: "validate nginx"

- name: Reload nginx
  ansible.builtin.systemd:
    name: nginx
    state: reloaded
  listen: "reload nginx"

- name: Restart nginx
  ansible.builtin.systemd:
    name: nginx
    state: restarted
  listen: "restart nginx"
Use listen topics rather than handler names in notify. A topic is a stable, semantic name that any number of handlers can subscribe to, so tasks keep working even if a handler is renamed:
- name: Update nginx.conf
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
  notify:
    - validate nginx
    - reload nginx
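A validation handler only runs once the new file is already in place. As a complementary sketch, the template module's validate parameter checks the rendered file before it replaces the old one. This assumes nginx -t -c can parse the temporary copy on its own, which generally requires include paths in nginx.conf to be absolute:

```yaml
- name: Update nginx.conf
  ansible.builtin.template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    # %s is replaced with the path of the temporary rendered file
    validate: nginx -t -c %s
  notify:
    - reload nginx
```

If validation fails, the task fails and the existing configuration stays untouched, so the reload handler is never notified.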
Every logical group of tasks should have tags:
# roles/users/tasks/main.yml
- name: Create user accounts
  ansible.builtin.user:
    name: "{{ item.name }}"
    groups: "{{ item.groups }}"
  loop: "{{ users }}"
  tags:
    - users
    - users:create

- name: Deploy SSH keys
  ansible.posix.authorized_key:
    user: "{{ item.name }}"
    key: "{{ item.ssh_key }}"
  loop: "{{ users }}"
  tags:
    - users
    - users:ssh-keys
Now you can run:
# Full user setup
ansible-playbook site.yml --tags users
# Just update SSH keys (fast, targeted)
ansible-playbook site.yml --tags users:ssh-keys
The naming convention role:subtask keeps tags organized as your playbook grows.
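The role-level half of that convention can also be set once in the playbook instead of on every task. A sketch, assuming a play that applies the users role to all hosts:

```yaml
# site.yml (sketch)
- name: Manage users
  hosts: all
  roles:
    - role: users
      tags:
        - users
```

Tags set on a role entry are inherited by every task in that role, so the task files only need the subtask tags such as users:ssh-keys.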
Templates: Logic Belongs Elsewhere
Jinja2 templates should be mostly static. Complex logic in templates is a code smell:
{# BAD: Logic explosion in template #}
{% if environment == 'production' and region == 'us-east-1' and enable_ssl %}
  {% if ssl_cert_type == 'letsencrypt' %}
    ssl_certificate /etc/letsencrypt/live/{{ domain }}/fullchain.pem;
  {% elif ssl_cert_type == 'custom' %}
    ssl_certificate {{ custom_cert_path }};
  {% endif %}
{% endif %}
Instead, compute values in tasks or defaults:
# defaults/main.yml
ssl_certificate_path: "/etc/ssl/certs/{{ domain }}.crt"

# Override in group_vars/production.yml if needed
ssl_certificate_path: "/etc/letsencrypt/live/{{ domain }}/fullchain.pem"
{# GOOD: Template is simple #}
{% if ssl_enabled %}
  ssl_certificate {{ ssl_certificate_path }};
{% endif %}
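If the certificate path really does depend on a type switch, that choice can also live in variables rather than in the template. A sketch that reuses ssl_cert_type and custom_cert_path from the earlier example; ssl_certificate_paths is a hypothetical helper variable, not something from the original role:

```yaml
# defaults/main.yml (sketch)
ssl_enabled: false
ssl_cert_type: letsencrypt

# Hypothetical lookup table: one entry per supported certificate type
ssl_certificate_paths:
  letsencrypt: "/etc/letsencrypt/live/{{ domain }}/fullchain.pem"
  custom: "{{ custom_cert_path | default('') }}"

# The template only ever sees this single, already-resolved value
ssl_certificate_path: "{{ ssl_certificate_paths[ssl_cert_type] }}"
```

Switching certificate types then means overriding ssl_cert_type in group_vars, and the template stays unchanged.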
Testing: Molecule or Regret
If you’re not testing roles, you’re testing in production:
# molecule/default/molecule.yml
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: ubuntu-22
    image: ubuntu:22.04
  - name: debian-12
    image: debian:12
provisioner:
  name: ansible
verifier:
  name: ansible
# molecule/default/verify.yml
- name: Verify
  hosts: all
  tasks:
    - name: Check Docker is running
      ansible.builtin.systemd:
        name: docker
      register: docker_service
      failed_when: docker_service.status.ActiveState != "active"
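The scenario also needs a converge playbook that applies the role before verify.yml runs. A minimal sketch, assuming the role under test is the docker role and that Molecule can resolve it by that name:

```yaml
# molecule/default/converge.yml (sketch)
- name: Converge
  hosts: all
  tasks:
    - name: Apply the role under test
      ansible.builtin.include_role:
        name: docker
```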
Run tests before every merge:
molecule test
This catches “works on my machine” before it becomes “broke production.”
The Import/Include Decision
Use import_tasks for static includes (parsed at playbook load):
- import_tasks: install.yml
- import_tasks: configure.yml
Use include_tasks for dynamic includes (parsed at runtime):
- include_tasks: "{{ ansible_os_family | lower }}.yml"
Rule of thumb: if the filename is a variable, use include_tasks. Otherwise, use import_tasks for better error messages and --list-tasks support.
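The static/dynamic split also changes how tags behave. Tags on import_tasks are inherited by the imported tasks, while tags on include_tasks apply only to the include statement itself unless apply is used. A sketch with hypothetical tag names:

```yaml
- name: Install (imported tasks inherit the tag)
  ansible.builtin.import_tasks: install.yml
  tags:
    - docker:install

- name: OS-specific setup
  ansible.builtin.include_tasks:
    file: "{{ ansible_os_family | lower }}.yml"
    # apply passes the tag down to the tasks inside the included file
    apply:
      tags:
        - docker:os
  # this tag lets the include statement itself be selected with --tags
  tags:
    - docker:os
```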
Directory Structure That Scales
ansible/
├── inventories/
│   ├── production/
│   │   ├── hosts.yml
│   │   └── group_vars/
│   └── staging/
│       ├── hosts.yml
│       └── group_vars/
├── roles/
│   └── (your roles)
├── playbooks/
│   ├── site.yml             # Full convergence
│   ├── deploy.yml           # App deployment only
│   └── security-patch.yml   # Emergency patching
├── collections/
│   └── requirements.yml
└── ansible.cfg
Separate inventories per environment. Playbooks are entry points, not logic containers. Roles hold all the intelligence.
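A per-environment inventory can stay very small. A sketch of the production hosts file in YAML inventory format; every host and group name here is a placeholder:

```yaml
# inventories/production/hosts.yml (sketch; host and group names are placeholders)
all:
  children:
    webservers:
      hosts:
        web-01.example.com:
        web-02.example.com:
    databases:
      hosts:
        db-01.example.com:
```

Selecting an environment is then a matter of passing the matching inventory, for example ansible-playbook -i inventories/production/hosts.yml playbooks/site.yml, and the group_vars directory next to that hosts file is picked up with it.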
The Maintenance Payoff
Six months from now, when you need to:
- Add a new server: ansible-playbook site.yml -l new-server
- Update SSH keys: ansible-playbook site.yml --tags users:ssh-keys
- Debug Docker issues: Look in roles/docker/, nowhere else
- Onboard a new team member: “Roles are single-purpose, variables are in three places, run molecule to test”
That’s the goal: infrastructure that explains itself.