Terraform state is both the source of its power and the cause of most Terraform disasters. Get it wrong and you’re recreating production resources at 2 AM. Get it right and infrastructure changes become boring (the good kind).

What State Actually Is

Terraform state is a JSON file that maps your configuration to real resources. When you write aws_instance.web, Terraform needs to know which actual EC2 instance that refers to. State is that mapping.

{
  "resources": [{
    "type": "aws_instance",
    "name": "web",
    "instances": [{
      "attributes": {
        "id": "i-0abc123def456789",
        "ami": "ami-0123456789abcdef0"
      }
    }]
  }]
}

Without state, Terraform would create a new instance on every apply. With state, it knows whether to update the existing instance or leave it alone.

Remote State: Not Optional

Local state files work for learning. For anything else, use remote state.

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/infrastructure.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

The DynamoDB table provides locking — critical when multiple people run Terraform. Without it, two simultaneous applies can corrupt your state.

State Isolation Patterns

Per-Environment

environments/
  dev/main.tf      # backend key: "dev/terraform.tfstate"
  staging/main.tf  # backend key: "staging/terraform.tfstate"
  prod/main.tf     # backend key: "prod/terraform.tfstate"

Separate state per environment means a bad terraform destroy in dev can’t touch prod.

Per-Component

networking/main.tf   # VPC, subnets, routes
database/main.tf     # RDS, ElastiCache
compute/main.tf      # EC2, ASG, ECS

Blast radius reduction. Database changes don’t risk compute resources.

Workspaces: Use With Caution

terraform workspace new staging
terraform workspace select prod

Workspaces share code but separate state. Sounds elegant, but:

  • All environments use identical code (no per-env tweaks)
  • Easy to forget which workspace you’re in
  • terraform destroy in wrong workspace is catastrophic

I’ve seen teams succeed with workspaces. I’ve seen more teams abandon them after incidents.
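The usual workaround for the identical-code limitation is branching on terraform.workspace. That interpolation is real; the sizing values below are placeholders:

```hcl
# Vary sizing by workspace; instance types here are illustrative
locals {
  instance_type = terraform.workspace == "prod" ? "m5.large" : "t3.micro"
}
```

Every such conditional is another place a workspace mix-up can bite, which is part of why many teams prefer a directory per environment instead.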

State Operations You’ll Need

Import Existing Resources

terraform import aws_instance.web i-0abc123def456789

Then write the matching configuration and iterate with terraform plan until it reports no changes.
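A minimal matching block might look like this (the AMI echoes the state example above; the instance type is a placeholder to reconcile against the real instance):

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0123456789abcdef0" # from the imported instance
  instance_type = "t3.micro"              # placeholder: match the real instance
}
```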

Move Resources Between Modules

terraform state mv 'aws_instance.web' 'module.compute.aws_instance.web'

Essential during refactoring. Without it, Terraform would destroy the resource at the old address and recreate it at the new one.
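Since Terraform 1.1, the same move can also be declared in configuration with a moved block, which ships with the code instead of being a one-off CLI step:

```hcl
# Declarative equivalent of the state mv above
moved {
  from = aws_instance.web
  to   = module.compute.aws_instance.web
}
```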

Remove Without Destroying

terraform state rm aws_instance.web

Removes from state but leaves the actual resource. Useful when handing off resources to another Terraform config or manual management.

Replace Tainted Resources

terraform taint aws_instance.web
# or, since Terraform 0.15.2 (taint is now deprecated):
terraform apply -replace="aws_instance.web"

Forces recreation on next apply.

Recovering from State Disasters

State Got Corrupted

# Pull last good state from S3 versioning
aws s3api list-object-versions \
  --bucket my-terraform-state \
  --prefix prod/infrastructure.tfstate

aws s3api get-object \
  --bucket my-terraform-state \
  --key prod/infrastructure.tfstate \
  --version-id "ABC123" \
  restored.tfstate

terraform state push restored.tfstate

Enable S3 versioning on your state bucket. It’s saved me twice.
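Versioning itself can be codified. A sketch using the AWS provider v4+ resource, with the bucket name borrowed from the backend example:

```hcl
# Turn on versioning for the state bucket so old state can be restored
resource "aws_s3_bucket_versioning" "state" {
  bucket = "my-terraform-state"

  versioning_configuration {
    status = "Enabled"
  }
}
```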

Lost State Entirely

Import everything. It’s painful but possible:

# List what exists
aws ec2 describe-instances --query 'Reservations[].Instances[].InstanceId'

# Import each one
terraform import aws_instance.web i-0abc123
terraform import aws_instance.api i-0abc456

For complex infrastructure, consider terraformer to generate config from existing resources.

Sensitive Data in State

State contains everything — including secrets:

{
  "attributes": {
    "password": "hunter2"
  }
}

Protect it:

  • S3 encryption: encrypt = true in backend config
  • Bucket policies: Restrict who can read
  • No git: Never commit state files (add *.tfstate to .gitignore)
  • Audit logs: Enable S3 access logging
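Note that encrypt = true only encrypts the state objects Terraform writes; default bucket encryption covers everything in the bucket. A sketch, assuming the bucket name from the backend example:

```hcl
# Default server-side encryption for all objects in the state bucket
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = "my-terraform-state"

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}
```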

For database credentials, generate them with random_password and publish them to Secrets Manager, so other systems retrieve them without ever reading state:

resource "random_password" "db" {
  length  = 32
  special = true
}

resource "aws_secretsmanager_secret" "db" {
  name = "prod/db-password" # example name
}

resource "aws_secretsmanager_secret_version" "db" {
  secret_id     = aws_secretsmanager_secret.db.id
  secret_string = random_password.db.result
}

resource "aws_db_instance" "main" {
  # engine, instance_class, and other required arguments omitted
  password = random_password.db.result
}

Password still in state, but retrieval goes through Secrets Manager.
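Downstream consumers then read through Secrets Manager. One caveat: a data source like this stores the secret in the reading configuration's state as well (the secret name is hypothetical):

```hcl
# Read the password in another configuration without touching this state
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db-password" # hypothetical secret name
}

# reference it as: data.aws_secretsmanager_secret_version.db.secret_string
```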

State Locking Deep Dive

DynamoDB locking for S3 backend:

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

If a lock gets stuck (Terraform crashed mid-apply):

terraform force-unlock LOCK_ID

Use sparingly. Only when you’re certain no other process is running.

Best Practices Summary

  1. Remote state from day one — local state doesn’t scale
  2. Enable versioning — your safety net for corruption
  3. Isolate state — per environment at minimum
  4. Lock state — DynamoDB for S3, built-in for Terraform Cloud
  5. Encrypt state — it contains secrets
  6. Don’t edit state manually — use terraform state commands
  7. Backup before risky operations — terraform state pull > backup.tfstate

State management isn’t exciting, but getting it right means infrastructure changes stay boring. And in operations, boring is beautiful.