Terraform state is both the source of its power and the cause of most Terraform disasters. Get it wrong and you’re recreating production resources at 2 AM. Get it right and infrastructure changes become boring (the good kind).
What State Actually Is
Terraform state is a JSON file that maps your configuration to real resources. When you write aws_instance.web, Terraform needs to know which actual EC2 instance that refers to. State is that mapping.
Without state, Terraform would create a new instance every apply. With state, it knows to update or leave alone the existing one.
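As a minimal sketch, the address aws_instance.web comes from a resource block like the one below. The AMI ID and instance type are placeholders, not values from a real account. After the first apply, state records the ID of the instance that was actually created under that address, which is how later runs know what they are looking at.

```hcl
# Hypothetical values, for illustration only.
resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "t3.micro"
}
```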
Remote State: Not Optional
Local state files work for learning. For anything else, use remote state.
With the S3 backend, a DynamoDB table provides locking, which is critical when multiple people run Terraform. Without it, two simultaneous applies can corrupt your state.
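A backend configuration along these lines is one way to set that up. Treat it as a sketch: the bucket name, key, region, and table name are all assumptions, and both the bucket and the table have to exist before terraform init can use them.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state" # hypothetical bucket name
    key            = "prod/terraform.tfstate"  # path of the state object in the bucket
    region         = "us-east-1"               # assumed region
    dynamodb_table = "terraform-locks"         # hypothetical lock table name
    encrypt        = true                      # server-side encryption of the state object
  }
}
```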
State Isolation Patterns
Per-Environment
Separate state per environment means a bad terraform destroy in dev can’t touch prod.
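One possible layout, assuming a directory per environment and a shared state bucket (both hypothetical), gives each environment its own backend key:

```hcl
# environments/dev/backend.tf (hypothetical path)
terraform {
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical bucket name
    key    = "dev/terraform.tfstate"   # dev gets its own state object
    region = "us-east-1"
  }
}

# environments/prod/backend.tf would differ only in the key:
#   key = "prod/terraform.tfstate"
```

Because each directory is initialized against a different state object, a command run in the dev directory has no record of prod resources at all.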
Per-Component
Separate state per component reduces the blast radius: database changes don’t risk compute resources.
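Components that live in separate states usually still need to share a few values. A sketch of one way to do that, assuming a network component that publishes an output named private_subnet_id (a made-up name), is the terraform_remote_state data source:

```hcl
# compute/main.tf (hypothetical layout)
# Reads outputs from the separately managed network state.
# The data source is read-only, so this configuration cannot change
# anything the network state tracks.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"        # hypothetical bucket name
    key    = "prod/network/terraform.tfstate" # state of the network component
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "t3.micro"
  subnet_id     = data.terraform_remote_state.network.outputs.private_subnet_id
}
```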
Workspaces: Use With Caution
Workspaces share code but separate state. Sounds elegant, but:
- All environments use identical code (no per-env tweaks)
- Easy to forget which workspace you’re in
- terraform destroy in the wrong workspace is catastrophic
I’ve seen teams succeed with workspaces. I’ve seen more teams abandon them after incidents.
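If you do use workspaces, one small mitigation is to surface the workspace name in the resources themselves through terraform.workspace. This is only a sketch with placeholder values, and it doesn’t remove the risks above; it just makes the active workspace visible in plan output and in the AWS console.

```hcl
locals {
  # terraform.workspace holds the name of the currently selected workspace.
  environment = terraform.workspace
}

resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "t3.micro"

  tags = {
    Name        = "web-${local.environment}"
    Environment = local.environment
  }
}
```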
State Operations You’ll Need
Import Existing Resources
Run terraform import with the resource address and the real resource’s ID, then write the matching configuration. Iterate with terraform plan until it reports no changes.
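On Terraform 1.5 or later, the same import can also be declared in configuration with an import block. The sketch below is that declarative form, not necessarily the command this section originally showed, and the instance ID and arguments are placeholders.

```hcl
# Requires Terraform 1.5 or later.
import {
  to = aws_instance.web
  id = "i-0123456789abcdef0" # hypothetical instance ID
}

resource "aws_instance" "web" {
  # Adjust these arguments to match the real instance, then run
  # terraform plan until it reports no changes.
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "t3.micro"
}
```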
Move Resources Between Modules
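Use terraform state mv to move a resource to its new address in state. This is essential during refactoring. Without it, Terraform destroys the resource at the old address and recreates it at the new one.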
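On Terraform 1.1 or later, a moved block records the same rename in configuration, so everyone who applies picks it up. A minimal sketch, assuming the resource is being moved into a module named compute (a hypothetical name):

```hcl
# Requires Terraform 1.1 or later.
moved {
  from = aws_instance.web
  to   = module.compute.aws_instance.web
}
```

The next plan should show the resource as moved rather than as a destroy and a create.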
Remove Without Destroying
Use terraform state rm to remove a resource from state while leaving the actual resource in place. This is useful when handing off resources to another Terraform config or to manual management.
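On Terraform 1.7 or later, a removed block expresses the same intent in configuration. This is a sketch with a hypothetical resource address; the matching resource block has to be deleted from the configuration at the same time.

```hcl
# Requires Terraform 1.7 or later.
removed {
  from = aws_instance.legacy # hypothetical resource address

  lifecycle {
    # Forget the resource in state but do not destroy the real instance.
    destroy = false
  }
}
```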
Replace Tainted Resources
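Marking a resource as tainted with terraform taint forces recreation on the next apply. Recent Terraform versions recommend terraform apply with the -replace option instead.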
Recovering from State Disasters
State Got Corrupted
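Restore the last good version of the state file from the bucket’s version history. That only works if S3 versioning is enabled on your state bucket, so enable it now. It’s saved me twice.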
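Versioning can itself be managed with Terraform. The sketch below assumes AWS provider 4.x or later and a made-up bucket name; in practice the state bucket usually lives in a small bootstrap configuration of its own, since it has to exist before any backend can point at it.

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "example-terraform-state" # hypothetical bucket name
}

# Keeps every previous version of the state object so that an earlier
# one can be restored after corruption.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}
```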
Lost State Entirely
Import everything, one resource at a time. It’s painful but possible.
For complex infrastructure, consider terraformer to generate config from existing resources.
Sensitive Data in State
State contains everything, including secrets such as database passwords, stored in plain text.
Protect it:
- S3 encryption: encrypt = true in backend config
- Bucket policies: Restrict who can read
- No git: Never commit state files (add *.tfstate to .gitignore)
- Audit logs: Enable S3 access logging
For databases, generate the password with random_password and store it in Secrets Manager, so nobody needs to read it out of state. The password is still in state, but retrieval goes through Secrets Manager.
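A sketch of that pattern, assuming the hashicorp/random and AWS providers. The secret name, database identifier, engine, and sizes are placeholders rather than recommendations.

```hcl
resource "random_password" "db" {
  length  = 32
  special = false # avoids characters that RDS rejects in passwords
}

resource "aws_secretsmanager_secret" "db_password" {
  name = "prod/db/password" # hypothetical secret name
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db.result
}

resource "aws_db_instance" "main" {
  identifier          = "example-db" # hypothetical identifier
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "app"
  password            = random_password.db.result
  skip_final_snapshot = true # keeps the sketch short; reconsider for production
}
```

Applications then read the password from Secrets Manager at runtime, so access to the state bucket can stay limited to whoever runs Terraform.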
State Locking Deep Dive
For the S3 backend, locking goes through a DynamoDB table that you create alongside the state bucket.
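The lock table needs a string partition key named LockID, which is what the S3 backend expects. A minimal sketch, with the table name and billing mode as assumptions:

```hcl
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks" # hypothetical; must match dynamodb_table in the backend config
  billing_mode = "PAY_PER_REQUEST" # assumed; lock traffic is tiny
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```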
If a lock gets stuck (Terraform crashed mid-apply), release it with terraform force-unlock and the lock ID from the error message. Use this sparingly, and only when you’re certain no other process is running.
Best Practices Summary
- Remote state from day one: local state doesn’t scale
- Enable versioning: your safety net for corruption
- Isolate state: per environment at minimum
- Lock state: DynamoDB for S3, built-in for Terraform Cloud
- Encrypt state: it contains secrets
- Don’t edit state manually: use terraform state commands
- Backup before risky operations: terraform state pull > backup.tfstate
State management isn’t exciting, but getting it right means infrastructure changes stay boring. And in operations, boring is beautiful.