Ansible is deceptively simple. Write some YAML, run it, things happen. Then your playbooks grow, your team grows, and suddenly everything is a mess. Here’s how to write Ansible that scales.
Project Structure#

    ansible/
    ├── ansible.cfg
    ├── inventories/
    │   ├── production/
    │   │   ├── hosts.yml
    │   │   └── group_vars/
    │   │       ├── all.yml
    │   │       ├── webservers.yml
    │   │       └── databases.yml
    │   └── staging/
    │       ├── hosts.yml
    │       └── group_vars/
    ├── playbooks/
    │   ├── site.yml
    │   ├── webservers.yml
    │   └── databases.yml
    ├── roles/
    │   ├── common/
    │   ├── nginx/
    │   └── postgresql/
    └── collections/
        └── requirements.yml

Separate inventories per environment. Group vars by function. Roles for reusable logic.
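
As a rough sketch of how the playbooks/ directory in this layout could fit together, site.yml might do nothing more than import the other two playbooks. Only the file names come from the layout; the contents below are an assumption.

```yaml
# playbooks/site.yml (illustrative; contents are assumed, only the file names come from the layout)
- name: Configure web servers
  ansible.builtin.import_playbook: webservers.yml

- name: Configure database servers
  ansible.builtin.import_playbook: databases.yml
```

With that in place, a full run against one environment would look like `ansible-playbook -i inventories/production/hosts.yml playbooks/site.yml`.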
Inventory Best Practices#
Use YAML Inventory#

    # inventories/production/hosts.yml
    all:
      children:
        webservers:
          hosts:
            web01.example.com:
            web02.example.com:
          vars:
            nginx_worker_processes: auto
        databases:
          hosts:
            db01.example.com:
              postgresql_role: primary
            db02.example.com:
              postgresql_role: replica
          vars:
            postgresql_version: 15
        monitoring:
          hosts:
            prometheus.example.com:
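
To show how host and group variables from this inventory reach a play, here is a minimal sketch. The playbook contents are hypothetical; the group name and the postgresql_role and postgresql_version variables are the ones defined in the inventory.

```yaml
# playbooks/databases.yml (hypothetical contents)
- name: Configure database servers
  hosts: databases
  gather_facts: false
  tasks:
    - name: Show the values that came from the inventory
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }} is a {{ postgresql_role }} on PostgreSQL {{ postgresql_version }}"

    - name: Run a step only on the primary
      ansible.builtin.debug:
        msg: "Primary-only work would go here"
      when: postgresql_role == "primary"
```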
Dynamic Inventory for Cloud#

    # aws_ec2.yml
    plugin: amazon.aws.aws_ec2
    regions:
      - us-east-1
    filters:
      tag:Environment: production
    keyed_groups:
      - key: tags.Role
        prefix: role
      - key: placement.availability_zone
        prefix: az
    hostnames:
      - private-ip-address
    compose:
      ansible_host: private_ip_address
Now ansible-inventory -i aws_ec2.yml --list shows live AWS instances.
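
The keyed_groups entries build group names from the prefix, a separator (an underscore by default), and the value of the key. Assuming some instances carry the tag Role=web (a hypothetical tag value), they would land in a group called role_web that a play can target directly:

```yaml
# Assumes at least one instance is tagged Role=web; the tag value is hypothetical
- name: Check instances tagged Role=web
  hosts: role_web
  gather_facts: false
  tasks:
    - name: Confirm connectivity
      ansible.builtin.ping:
```

Running `ansible-inventory -i aws_ec2.yml --graph` is a quick way to see which group names were actually generated before relying on them.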
Role Structure#

    roles/nginx/
    ├── defaults/
    │   └── main.yml           # Default variables (low precedence)
    ├── vars/
    │   └── main.yml           # Role variables (high precedence)
    ├── tasks/
    │   └── main.yml           # Task entry point
    ├── handlers/
    │   └── main.yml           # Handlers (restart, reload)
    ├── templates/
    │   └── nginx.conf.j2      # Jinja2 templates
    ├── files/
    │   └── ssl-params.conf    # Static files
    ├── meta/
    │   └── main.yml           # Role dependencies
    └── molecule/
        └── default/           # Testing
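
Of the files in a role, meta/main.yml is the one people most often leave empty. A minimal sketch could look like this; every value is a placeholder, and the dependency on a common role is an assumption for illustration.

```yaml
# roles/nginx/meta/main.yml (illustrative values only)
galaxy_info:
  role_name: nginx
  author: your_name              # placeholder
  description: Install and configure nginx
  license: MIT                   # assumption
  min_ansible_version: "2.14"    # assumption
  platforms:
    - name: Ubuntu
      versions:
        - jammy

dependencies:
  - role: common                 # assumed dependency; runs before this role
```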
Defaults vs Vars#

    # defaults/main.yml - Users can override these
    nginx_worker_processes: auto
    nginx_worker_connections: 1024
    nginx_keepalive_timeout: 65

    # vars/main.yml - Internal to the role, not meant to be overridden
    nginx_conf_path: /etc/nginx/nginx.conf
    nginx_service_name: nginx
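
To make the distinction concrete, here is a sketch of a play that overrides one of the defaults when applying the role. The play itself is hypothetical; the variable name comes from defaults/main.yml.

```yaml
- name: Configure web servers
  hosts: webservers
  become: true
  roles:
    - role: nginx
      vars:
        nginx_worker_connections: 4096   # overrides the role default of 1024
```

The same override could live in group_vars/webservers.yml instead, which keeps the playbook free of environment-specific numbers.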
Task Patterns#
Idempotency Always#

    # Good: Idempotent
    - name: Ensure nginx is installed
      ansible.builtin.package:
        name: nginx
        state: present

    # Bad: Not idempotent
    - name: Install nginx
      ansible.builtin.shell: apt-get install -y nginx
Shell commands are a last resort. Most have idempotent module equivalents.
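
When no module fits and a command is unavoidable, it can still be made to behave idempotently. The sketch below uses two standard options for that, creates and changed_when. The script path and marker file are hypothetical, and the first task assumes the script writes the marker file itself.

```yaml
- name: Initialize the application once
  ansible.builtin.command:
    cmd: /opt/app/bin/init-db          # hypothetical script
    creates: /opt/app/.initialized     # task is skipped when this file already exists

- name: Read the nginx version without reporting a change
  ansible.builtin.command:
    cmd: nginx -v
  register: nginx_version
  changed_when: false                  # read-only command, never counts as "changed"
```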
Use Block for Error Handling#

    - name: Deploy application
      block:
        - name: Download artifact
          ansible.builtin.get_url:
            url: "{{ artifact_url }}"
            dest: /tmp/app.tar.gz

        - name: Extract artifact
          ansible.builtin.unarchive:
            src: /tmp/app.tar.gz
            dest: /opt/app
            remote_src: true

        - name: Restart service
          ansible.builtin.systemd:
            name: myapp
            state: restarted

      rescue:
        - name: Rollback to previous version
          ansible.builtin.copy:
            src: /opt/app.backup/
            dest: /opt/app/
            remote_src: true

        - name: Alert on failure
          ansible.builtin.uri:
            url: "{{ slack_webhook }}"
            method: POST
            body_format: json
            body:
              text: "Deployment failed on {{ inventory_hostname }}"

      always:
        - name: Clean up temp files
          ansible.builtin.file:
            path: /tmp/app.tar.gz
            state: absent
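
The rescue section only helps if /opt/app.backup actually exists. One way to guarantee that, sketched here as an assumption rather than part of the original deployment, is to take the backup right before the block runs:

```yaml
- name: Check whether a current release exists
  ansible.builtin.stat:
    path: /opt/app
  register: current_release

- name: Back up the current release before deploying
  ansible.builtin.copy:
    src: /opt/app/            # trailing slash copies the directory contents
    dest: /opt/app.backup/
    remote_src: true          # both paths are on the managed host
  when: current_release.stat.exists
```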
Conditionals Done Right#

    # Use 'when' with facts
    - name: Install EPEL (RedHat family)
      ansible.builtin.yum:
        name: epel-release
        state: present
      when: ansible_os_family == "RedHat"

    - name: Install packages (Debian family)
      ansible.builtin.apt:
        name: "{{ packages }}"
        state: present
        update_cache: true
      when: ansible_os_family == "Debian"

    # Better: Use package manager abstraction
    - name: Install packages
      ansible.builtin.package:
        name: "{{ common_packages }}"
        state: present
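
The package abstraction does not hide the fact that package names differ between distributions. A common pattern, sketched here with hypothetical vars files, is to load a per-family vars file and keep the install task generic:

```yaml
# Assumed files next to the tasks (names follow the os_family fact):
#   vars/Debian.yml  ->  common_packages: [apache2]
#   vars/RedHat.yml  ->  common_packages: [httpd]
- name: Load OS-specific variables
  ansible.builtin.include_vars: "{{ ansible_os_family }}.yml"

- name: Install packages
  ansible.builtin.package:
    name: "{{ common_packages }}"
    state: present
```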
Loops Efficiently#

    # Good: Single task with loop
    - name: Create users
      ansible.builtin.user:
        name: "{{ item.name }}"
        groups: "{{ item.groups }}"
        state: present
      loop: "{{ users }}"

    # Better for large lists: Use loop_control
    - name: Create many users
      ansible.builtin.user:
        name: "{{ item.name }}"
        groups: "{{ item.groups }}"
      loop: "{{ users }}"
      loop_control:
        label: "{{ item.name }}"  # Cleaner output
        pause: 1                  # Rate limiting if needed
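
Both loops expect users to be a list of dictionaries with name and groups keys. The data below is purely illustrative, and the groups it names would need to exist on the target hosts.

```yaml
# group_vars/all.yml (illustrative data)
users:
  - name: alice
    groups:
      - sudo
      - docker
  - name: bob
    groups:
      - sudo
```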
Variable Precedence#
From lowest to highest (simplified):

1. Role defaults (roles/x/defaults/main.yml)
2. Inventory group_vars (inventories/production/group_vars/all.yml)
3. Playbook group_vars
4. Inventory host_vars
5. Playbook host_vars
6. Role vars (roles/x/vars/main.yml)
7. Task vars (set_fact, vars:)
8. Extra vars (-e)

Rule of thumb: Put user-configurable values in defaults, put internal values in vars.
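
A small throwaway play is often the quickest way to check which value wins. In this sketch the variable name greeting is made up for the demonstration.

```yaml
# precedence_demo.yml (hypothetical file name)
- name: Demonstrate variable precedence
  hosts: localhost
  gather_facts: false
  vars:
    greeting: "from play vars"
  tasks:
    - name: Task vars override play vars
      ansible.builtin.debug:
        var: greeting
      vars:
        greeting: "from task vars"
```

Run as is, the task-level value is used. Adding `-e greeting="from extra vars"` on the command line overrides both, since extra vars sit at the top of the list.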
Secrets Management#
Ansible Vault#

    # Create encrypted file
    ansible-vault create group_vars/all/vault.yml

    # Edit encrypted file
    ansible-vault edit group_vars/all/vault.yml

    # Run playbook with vault
    ansible-playbook site.yml --ask-vault-pass

    # Or with password file
    ansible-playbook site.yml --vault-password-file ~/.vault_pass

Keep the encrypted values in the vault file and reference them from a plain vars file, so variable names stay searchable:

    # group_vars/all/vault.yml (encrypted)
    vault_db_password: supersecret
    vault_api_key: sk-12345

    # group_vars/all/main.yml (references vault)
    db_password: "{{ vault_db_password }}"
    api_key: "{{ vault_api_key }}"
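
Encrypting the value at rest is only half the job; it can still leak through task output. A minimal sketch of a task that consumes db_password safely follows. The destination path and file format are hypothetical.

```yaml
- name: Write database credentials file
  ansible.builtin.copy:
    content: "DB_PASSWORD={{ db_password }}\n"
    dest: /etc/myapp/db.env      # hypothetical path
    owner: root
    group: root
    mode: "0600"
  no_log: true                   # keep the secret out of task output and logs
```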
External Secrets (HashiCorp Vault)#

    - name: Get secret from Vault
      community.hashi_vault.vault_kv2_get:
        url: "{{ vault_url }}"
        engine_mount_point: secret
        path: myapp  # Relative to the KV v2 mount; the module adds the data/ segment itself
        auth_method: token
        token: "{{ vault_token }}"
      register: secret_data
      delegate_to: localhost
      no_log: true

    - name: Use secret
      ansible.builtin.template:
        src: config.j2
        dest: /etc/myapp/config.yml
      vars:
        db_password: "{{ secret_data.secret.db_password }}"
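
The task above needs vault_url and vault_token from somewhere. One option, sketched here as an assumption, is to read the token from the environment of the control node so it never lands in the repository. The address is a placeholder.

```yaml
# group_vars/all/main.yml (sketch)
vault_url: "https://vault.example.com:8200"                         # placeholder address
vault_token: "{{ lookup('ansible.builtin.env', 'VAULT_TOKEN') }}"   # read on the control node
```

Lookups are evaluated on the control node, which matches the delegate_to: localhost in the retrieval task.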
Handler Patterns#

    # handlers/main.yml
    - name: Restart nginx
      ansible.builtin.systemd:
        name: nginx
        state: restarted
      listen: "restart web server"

    - name: Reload nginx
      ansible.builtin.systemd:
        name: nginx
        state: reloaded
      listen: "reload web server"

    # tasks/main.yml
    - name: Update nginx config
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: reload web server

    - name: Update SSL certificates
      ansible.builtin.copy:
        src: "{{ item }}"
        dest: /etc/nginx/ssl/
      loop:
        - cert.pem
        - key.pem
      notify: restart web server
Use listen for multiple handlers responding to one event.
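
To illustrate several handlers responding to one event, here is a sketch in which two handlers listen on the same topic. The php-fpm service, the template name, and the destination path are hypothetical.

```yaml
# handlers/main.yml
- name: Restart nginx
  ansible.builtin.systemd:
    name: nginx
    state: restarted
  listen: "restart web stack"

- name: Restart php-fpm
  ansible.builtin.systemd:
    name: php-fpm            # hypothetical second service
    state: restarted
  listen: "restart web stack"

# tasks/main.yml
- name: Update shared TLS settings
  ansible.builtin.template:
    src: tls.conf.j2         # hypothetical template
    dest: /etc/ssl/tls.conf  # hypothetical path
  notify: restart web stack
```

A single notify triggers both handlers, and the task does not need to know how many services are involved.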
Pipelining#

    # ansible.cfg
    [ssh_connection]
    pipelining = True
Pipelining reduces the number of SSH operations needed per task. It requires that requiretty is disabled in sudoers on the managed hosts.
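
If only some hosts still have requiretty set, pipelining can also be controlled per group or per host through the ansible_ssh_pipelining connection variable. The file paths and the legacy host below are assumptions for illustration.

```yaml
# inventories/production/group_vars/all.yml
ansible_ssh_pipelining: true
---
# inventories/production/host_vars/legacy01.example.com.yml (hypothetical host)
ansible_ssh_pipelining: false
```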
Async for Long Tasks#

    - name: Run long database migration
      ansible.builtin.shell: /opt/app/migrate.sh
      async: 3600  # Max runtime: 1 hour
      poll: 0      # Don't wait
      register: migration_job

    - name: Wait for migration to complete
      ansible.builtin.async_status:
        jid: "{{ migration_job.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 60
      delay: 60
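
When nothing else needs to happen while the migration runs, a simpler variant is to let Ansible poll the job itself. This sketch reuses the same script path and only changes the poll value:

```yaml
- name: Run long database migration and wait for it
  ansible.builtin.command:
    cmd: /opt/app/migrate.sh
  async: 3600   # Max runtime: 1 hour
  poll: 30      # Check every 30 seconds; the play blocks here until the job ends
```

The poll: 0 form is the better fit when other tasks should run in between and the result is collected later with async_status.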
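Fact Caching#

    # ansible.cfg
    [defaults]
    gathering = smart
    fact_caching = jsonfile
    fact_caching_connection = /tmp/ansible_facts
    fact_caching_timeout = 86400

With gathering = smart, Ansible reuses cached facts on subsequent runs instead of gathering them again, until the cache entry times out (here after 24 hours).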
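
Gathering fewer facts in the first place helps as well. The sketch below turns off automatic gathering and collects only the network subset; Ansible still includes its minimal fact set alongside it. The play is hypothetical.

```yaml
- name: Gather only what the play needs
  hosts: webservers
  gather_facts: false
  tasks:
    - name: Collect the network fact subset
      ansible.builtin.setup:
        gather_subset:
          - network

    - name: Use a gathered fact
      ansible.builtin.debug:
        var: ansible_facts.default_ipv4
```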
Free Strategy#

    # playbook.yml
    - hosts: webservers
      strategy: free  # Don't wait for all hosts per task
      tasks:
        - name: Update packages
          ansible.builtin.apt:
            upgrade: dist
Hosts proceed independently. Faster but harder to debug.
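
Where predictability matters more than speed, the opposite trade-off is a rolling update with serial. This is a sketch of that alternative, not a replacement for the free strategy; the batch size is an arbitrary choice.

```yaml
- name: Rolling package update
  hosts: webservers
  serial: 2                 # two hosts per batch
  max_fail_percentage: 0    # stop the run if any host in a batch fails
  tasks:
    - name: Update packages
      ansible.builtin.apt:
        upgrade: dist
        update_cache: true
```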
Testing with Molecule#

    # molecule/default/molecule.yml
    dependency:
      name: galaxy
    driver:
      name: docker
    platforms:
      - name: ubuntu
        image: ubuntu:22.04
        pre_build_image: true
      - name: rocky
        image: rockylinux:9
        pre_build_image: true
    provisioner:
      name: ansible
    verifier:
      name: ansible
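
Molecule also needs a converge playbook that applies the role to the test platforms. A minimal sketch, assuming the role under test is named nginx:

```yaml
# molecule/default/converge.yml
- name: Converge
  hosts: all
  become: true
  tasks:
    - name: Apply the role under test
      ansible.builtin.include_role:
        name: nginx   # assumed role name
```

Running `molecule test` from the role directory then creates the containers, converges, verifies, and destroys them.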

    # molecule/default/verify.yml
    - name: Verify
      hosts: all
      tasks:
        - name: Gather service facts
          ansible.builtin.service_facts:

        - name: Assert nginx is running
          ansible.builtin.assert:
            that:
              - ansible_facts.services['nginx.service'].state == 'running'
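
A running service does not prove it is serving traffic. As an optional extra check, a task like the following could be appended to the tasks list in verify.yml. It assumes nginx listens on port 80 inside the test container and returns 200 for the root path.

```yaml
- name: Check that nginx answers on port 80
  ansible.builtin.uri:
    url: http://localhost:80/
    status_code: 200
  register: http_check
  retries: 5
  delay: 2
  until: http_check.status == 200
```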
The Checklist#
Start Here#

1. Today: Enable pipelining in ansible.cfg
2. This week: Convert a playbook to a role
3. This month: Add Molecule tests to one role
4. This quarter: Implement dynamic inventory

Good Ansible is boring Ansible. Predictable, testable, and maintainable.
The best automation is the one you can hand to a teammate and they understand it immediately.