Terraform state is where infrastructure-as-code meets reality. It’s also where most Terraform disasters originate. Here’s how to manage state without losing sleep.
The Problem
Terraform tracks what it’s created in a state file. This file maps your HCL resources to real infrastructure. Without it, Terraform can’t update or destroy anything — it doesn’t know what exists.
The default is a local file called terraform.tfstate. This works fine until:
- Someone else needs to run Terraform
- Your laptop dies
- Two people run apply simultaneously
- You accidentally commit secrets to Git
Rule 1: Remote State from Day One
Never use local state for anything beyond experiments. Configure a remote backend, such as S3 with DynamoDB locking, before the first real apply.
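A minimal sketch of an S3 backend with DynamoDB locking, assuming the bucket and lock table already exist. The bucket name, key, region, and table name are placeholders, not values from a real setup.

```hcl
terraform {
  backend "s3" {
    # Placeholder names: replace with your own bucket, key, and table.
    bucket         = "example-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true                      # server-side encryption for the state object
    dynamodb_table = "example-terraform-locks" # table used for state locking
  }
}
```

If a local state file already exists, running terraform init -migrate-state after adding this block should offer to copy it into the new backend.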
Why S3 + DynamoDB?
- S3 stores the state (versioned, encrypted)
- DynamoDB provides locking (prevents concurrent applies)
- Both are cheap and managed
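Create the backend resources first. Yes, this is a chicken-and-egg problem: the bucket and lock table must exist before Terraform can store state in them, so bootstrap them from a small configuration that starts out with local state.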
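One way to bootstrap those resources, sketched for AWS provider v4 or later, where versioning and encryption are configured as separate resources. All resource names and the bucket and table names are placeholders.

```hcl
# Bucket that will hold the state files (placeholder name; S3 names are global).
resource "aws_s3_bucket" "state" {
  bucket = "example-terraform-state"

  lifecycle {
    prevent_destroy = true # refuse plans that would delete the state bucket
  }
}

# Keep old versions of the state object so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest.
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = aws_s3_bucket.state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# State should never be publicly readable.
resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table: the S3 backend expects a string hash key named exactly "LockID".
resource "aws_dynamodb_table" "locks" {
  name         = "example-terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Once these exist, the bootstrap configuration itself can be pointed at the new bucket so that even its state is stored remotely.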
Rule 2: One State Per Environment
Don’t share state between environments. A bad apply in dev shouldn’t affect prod.
Each environment has its own state file, its own lock, its own blast radius.
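As an illustration, a per-environment layout might give each directory its own backend file that differs only in the state key. The path and names below are hypothetical.

```hcl
# environments/dev/backend.tf (hypothetical path)
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"
    key            = "dev/terraform.tfstate" # prod would use "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "example-terraform-locks"
  }
}
```

Because lock entries are keyed by the state path, dev and prod lock independently even when they share one lock table. Putting each environment in its own bucket or AWS account isolates them further.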
Rule 3: Use Workspaces Sparingly
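Terraform workspaces let you reuse one configuration with a separate state per workspace. You create and switch between them with terraform workspace new and terraform workspace select.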
The problem: Workspaces share everything except state. You can’t have different backend configs, different providers, or different module versions per workspace.
Workspaces work for simple multi-tenant scenarios. For environments with different requirements, use separate directories.
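To make the limitation concrete, here is a sketch of what can vary per workspace: values derived from terraform.workspace inside a single shared configuration. The workspace names, instance sizes, and variable are assumptions for illustration.

```hcl
variable "ami_id" {
  type        = string
  description = "AMI to launch (hypothetical input)"
}

locals {
  # Hypothetical per-workspace sizing; unknown workspaces fall back to the small size.
  instance_type_by_workspace = {
    dev  = "t3.micro"
    prod = "m5.large"
  }

  instance_type = lookup(local.instance_type_by_workspace, terraform.workspace, "t3.micro")
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = local.instance_type

  tags = {
    Name        = "app-${terraform.workspace}"
    Environment = terraform.workspace
  }
}
```

Everything else, including the backend block and provider configuration, is identical in every workspace, which is why environments with materially different requirements fit better in separate directories.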
Rule 4: Never Edit State Manually
The state file is JSON. You can read it. You shouldn’t edit it.
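When you need to modify state, use the CLI instead: terraform state mv to rename or move a resource address, terraform state rm to stop tracking a resource without destroying it, and terraform import to bring an existing resource under management.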
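Newer Terraform releases also let some of these operations be declared in configuration, so they show up in a plan and can be reviewed before they touch state. A sketch, assuming Terraform 1.5 or later (moved blocks arrived in 1.1, import blocks in 1.5); the resource addresses, AMI ID, and bucket name are hypothetical.

```hcl
# Rename a resource address without destroying and recreating the instance.
moved {
  from = aws_instance.web
  to   = aws_instance.frontend
}

resource "aws_instance" "frontend" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI ID
  instance_type = "t3.micro"
}

# Adopt a bucket that was created outside Terraform.
import {
  to = aws_s3_bucket.logs
  id = "example-logs-bucket" # placeholder: the existing bucket's name
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket"
}
```

Both blocks can be deleted once the change has been applied everywhere the configuration is used.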
Rule 5: Handle State Drift
Infrastructure changes outside Terraform. Someone clicks in the console, an auto-scaling event occurs, a security patch updates something. Now state doesn’t match reality.
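Detect drift regularly by running terraform plan -detailed-exitcode, which exits with 0 when there are no changes, 1 on error, and 2 when the plan contains changes.
In CI/CD, run that plan on a schedule and alert whenever the exit code is 2.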
When drift happens:
- If Terraform should own it: terraform apply to fix
- If it was intentional: terraform apply -refresh-only to update state (terraform refresh is deprecated), then update the configuration to match
- If it’s out of scope: consider lifecycle { ignore_changes = [...] }
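A small sketch of the last option, using the auto-scaling case: the scaling policy, not Terraform, decides the current desired_capacity, so Terraform is told to ignore it. The group name, sizes, and variables are assumptions.

```hcl
variable "subnet_ids" {
  type        = list(string)
  description = "Subnets for the group (hypothetical input)"
}

variable "launch_template_id" {
  type        = string
  description = "Launch template to use (hypothetical input)"
}

resource "aws_autoscaling_group" "app" {
  name                = "example-app"
  min_size            = 2
  max_size            = 10
  desired_capacity    = 2 # initial value only; scaling events change it later
  vpc_zone_identifier = var.subnet_ids

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }

  lifecycle {
    # Do not report or revert capacity changes made by auto-scaling.
    ignore_changes = [desired_capacity]
  }
}
```

Keep the ignore list as narrow as possible: every ignored attribute is drift that Terraform will no longer show you.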
Rule 6: Protect Sensitive Data
State contains secrets. Database passwords, API keys, anything you pass to a resource ends up in state.
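Mitigations: encrypt the state bucket, restrict who can read it, and keep secret values in a secrets manager so that the Terraform configuration only looks them up by name.
The secret value still ends up in state, but its source of truth is now a secrets manager, so you can rotate it there without changing Terraform code.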
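A sketch of that lookup pattern, assuming the password is already stored in AWS Secrets Manager. The secret name, database identifier, and sizing are placeholders.

```hcl
# Read the current secret value at plan/apply time instead of hard-coding it.
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/db/password" # hypothetical secret name
}

resource "aws_db_instance" "main" {
  identifier        = "example-db"
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app"

  # The value is still written to state, so the state bucket must stay
  # encrypted and access-controlled.
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
```

With this shape, no password appears in the repository or in variable files; only the secret’s name does.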
Rule 7: Lock State in CI/CD
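Concurrent Terraform runs can corrupt state. Always lock: keep backend locking enabled, and configure your CI system so that only one Terraform job per environment runs at a time (for example, with a concurrency group in GitHub Actions).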
The DynamoDB lock handles Terraform-level concurrency. The CI concurrency group handles pipeline-level concurrency.
Rule 8: Plan for Recovery
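State files get corrupted. Backends have outages. Plan for it: with S3 versioning enabled you can restore an earlier version of the state object, and terraform state push can upload a known-good copy.
Back up state before risky operations by saving the output of terraform state pull to a file.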
Rule 9: Split Large State Files
Monolithic state files are slow and risky. A single bad apply affects everything.
Split by:
- Lifecycle: Networking rarely changes, app infra changes often
- Team ownership: Platform team owns VPC, app teams own their services
- Blast radius: Prod database separate from prod compute
Reference across state files with the terraform_remote_state data source, which reads the root-module outputs of another state.
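For example, an application stack might read a subnet ID published by a separately managed networking stack. This is a sketch: the bucket, key, variable, and the private_subnet_id output name are assumptions, and the networking configuration must actually declare that output.

```hcl
variable "ami_id" {
  type        = string
  description = "AMI to launch (hypothetical input)"
}

# Read-only view of the networking stack's outputs.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  # Hypothetical output exported by the networking configuration.
  subnet_id = data.terraform_remote_state.network.outputs.private_subnet_id
}
```

Only root-module outputs are exposed this way, so the producing stack has to export every value that consumers need. Reading remote state also requires read access to the whole state object, which matters if that state contains secrets.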
The Minimum Checklist
Starting a new Terraform project? Do these first:
- ✅ Create S3 bucket with versioning
- ✅ Create DynamoDB lock table
- ✅ Configure remote backend
- ✅ Set up CI/CD with concurrency controls
- ✅ Add drift detection to scheduled pipeline
- ✅ Document recovery procedures
State management isn’t exciting, but it’s the foundation. Get it right early, and you’ll avoid the 3 AM “who deleted production” incidents.