Terraform state is both essential and dangerous. It’s how Terraform knows what exists, what changed, and what to do. Mismanage it, and you’ll either destroy production or spend hours untangling drift.

What State Actually Is

State is Terraform’s record of the resources it manages. It maps each resource in your configuration to a real-world object (simplified excerpt):

{
  "resources": [
    {
      "type": "aws_instance",
      "name": "web",
      "instances": [
        {
          "attributes": {
            "id": "i-0abc123def456",
            "ami": "ami-12345678",
            "instance_type": "t3.medium"
          }
        }
      ]
    }
  ]
}

Without state, Terraform would:

  • Not know what resources it manages
  • Create duplicates on every apply
  • Have no way to track changes

Remote State: Stop Using Local Files

Local state (terraform.tfstate) is fine for learning. For anything real, use remote state:

S3 Backend (AWS)

terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/infra.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

Create the backend resources first:

# backend-setup/main.tf (apply this separately)
resource "aws_s3_bucket" "state" {
  bucket = "my-terraform-state"
}

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  
  attribute {
    name = "LockID"
    type = "S"
  }
}

Other Backends

# Terraform Cloud
terraform {
  cloud {
    organization = "my-org"
    workspaces {
      name = "production"
    }
  }
}

# GCS (Google Cloud)
terraform {
  backend "gcs" {
    bucket = "my-terraform-state"
    prefix = "prod"
  }
}

# Azure
terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-state"
    storage_account_name = "tfstate12345"
    container_name       = "state"
    key                  = "prod.tfstate"
  }
}

State Locking: Prevent Concurrent Modifications

Without locking, two people running terraform apply simultaneously can corrupt state.

The S3 backend locks automatically once dynamodb_table is set; Terraform 1.10 and later can also lock natively in S3 with use_lockfile = true. For other backends, check their documentation.

# If lock is stuck (someone's apply crashed)
terraform force-unlock LOCK_ID

# Use with caution - make sure no one is actually running
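
When a lock is merely busy rather than stuck, waiting for it is safer than force-unlocking. A minimal sketch, assuming a hypothetical wrapper named plan_with_lock_wait; -lock-timeout is a standard Terraform option that keeps retrying the lock for the given duration:

```shell
# Hypothetical helper: wait for a busy lock instead of failing immediately.
# Usage: plan_with_lock_wait [timeout]   (default: 5m)
plan_with_lock_wait() {
  local timeout="${1:-5m}"
  # Retry acquiring the state lock for up to $timeout, then save the plan
  terraform plan -lock-timeout="$timeout" -out=tfplan
}
```

Reach for force-unlock only after the timeout expires and you have confirmed that no run is in progress.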

State Organization

One State Per Environment

environments/
  dev/
    main.tf      # backend key: dev/infra.tfstate
  staging/
    main.tf      # backend key: staging/infra.tfstate
  prod/
    main.tf      # backend key: prod/infra.tfstate

Benefits:

  • Changes to dev can’t affect prod
  • Smaller blast radius
  • Independent applies
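
With one directory per environment, a thin wrapper can keep commands from running in the wrong place. This is a sketch under assumptions: the function name tf_env is hypothetical, and it assumes the environments live under environments/dev, environments/staging, and environments/prod. The -chdir option is a standard Terraform global option.

```shell
# Hypothetical helper: run any terraform subcommand inside one environment.
# Usage: tf_env <dev|staging|prod> <terraform args...>
tf_env() {
  local env="$1"
  shift
  case "$env" in
    dev|staging|prod) ;;
    *)
      echo "unknown environment: $env" >&2
      return 2
      ;;
  esac
  # -chdir switches to the environment directory before running the subcommand
  terraform -chdir="environments/$env" "$@"
}
```

For example, tf_env staging plan runs the plan against the staging state only.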

State Per Component

For large infrastructures, split by component:

networking/      # networking.tfstate
database/        # database.tfstate
kubernetes/      # kubernetes.tfstate
applications/    # applications.tfstate

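Use terraform_remote_state to read another state’s outputs. Only root-module outputs are exposed, so the networking configuration must declare private_subnet_id as an output: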

# In applications/main.tf
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key    = "networking.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  # ami, instance_type, and other required arguments omitted for brevity
  subnet_id = data.terraform_remote_state.networking.outputs.private_subnet_id
}

State Operations

Viewing State

# List all resources
terraform state list

# Show specific resource
terraform state show aws_instance.web

# Pull remote state to local file
terraform state pull > state.json

Moving Resources

Rename without destroy/recreate:

# Moved resource in config from "old_name" to "new_name"
terraform state mv aws_instance.old_name aws_instance.new_name

# Move to a module
terraform state mv aws_instance.web module.compute.aws_instance.web
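
Because state mv rewrites state in place, it can help to take a snapshot and preview the move first. A minimal sketch, assuming a hypothetical helper named state_mv_with_backup and a timestamped backup filename of my own choosing; state pull and the -dry-run flag of state mv are standard Terraform CLI features:

```shell
# Hypothetical helper: snapshot state, preview the move, then perform it.
# Usage: state_mv_with_backup <source address> <destination address>
state_mv_with_backup() {
  local src="$1" dst="$2"
  local backup="state-backup-$(date +%Y%m%d%H%M%S).json"
  # Save the current state so the move can be undone with `terraform state push`
  terraform state pull > "$backup" || return 1
  # Preview: reports what would move and fails if the source matches nothing
  terraform state mv -dry-run "$src" "$dst" || return 1
  # Perform the actual move
  terraform state mv "$src" "$dst"
}
```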

Removing from State

Remove resource from Terraform management (resource still exists):

# Stop managing this resource
terraform state rm aws_instance.legacy

# Useful when:
# - Importing to different state
# - Handing off to another team
# - Resource should no longer be managed by Terraform

Importing Existing Resources

Bring existing infrastructure under Terraform management:

# Add resource block first
resource "aws_instance" "imported" {
  # Will be filled in after import
}

# Import the resource
terraform import aws_instance.imported i-0abc123def456

# Run plan to see what attributes to add
terraform plan

# Fill in the resource block to match reality

Terraform 1.5+ has import blocks:

import {
  to = aws_instance.imported
  id = "i-0abc123def456"
}
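
With import blocks, Terraform can also draft the matching resource configuration rather than leaving you to write it by hand. A sketch, assuming a hypothetical wrapper named generate_import_config and an output file called generated.tf; the -generate-config-out option arrived with import blocks and has been documented as experimental, so check it against your Terraform version:

```shell
# Hypothetical helper: draft HCL for resources named in import blocks.
# Usage: generate_import_config [output file]   (default: generated.tf)
generate_import_config() {
  local out="${1:-generated.tf}"
  # Never clobber a file that may already contain reviewed configuration
  if [ -e "$out" ]; then
    echo "refusing to overwrite existing file: $out" >&2
    return 1
  fi
  terraform plan -generate-config-out="$out"
}
```

Treat the generated file as a starting point: review it, tidy it, and move the resources into your regular configuration before applying.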

Handling State Drift

Drift = reality doesn’t match state (someone made manual changes).

Detecting Drift

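# Plan refreshes state in memory and reports drift alongside config changes
terraform plan

# Show only changes made outside Terraform (replaces the deprecated `terraform refresh`)
terraform plan -refresh-only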
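
For scheduled drift checks, an exit code is easier to act on than plan output. A minimal sketch, assuming a hypothetical function named check_drift and messages of my own wording; with -detailed-exitcode, terraform plan exits 0 when nothing differs, 1 on error, and 2 when changes are present:

```shell
# Hypothetical helper for CI: exit 0 when clean, 2 when the plan is non-empty,
# 1 when the plan itself failed.
check_drift() {
  local rc=0
  terraform plan -detailed-exitcode -input=false >/dev/null || rc=$?
  case "$rc" in
    0)
      echo "no drift"
      ;;
    2)
      # A non-empty plan means drift, unapplied config changes, or both
      echo "drift or pending changes detected"
      return 2
      ;;
    *)
      echo "plan failed" >&2
      return 1
      ;;
  esac
}
```

Note that exit code 2 cannot distinguish manual changes from configuration changes that were never applied; the plan output itself tells you which.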

Fixing Drift

Option 1: Let Terraform fix it

terraform apply  # Brings reality back to match config

Option 2: Accept the manual change

# Update config to match reality
# Run plan to verify no changes
terraform plan

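Option 3: Record the change in state only

# Writes real-world values to state only; update config too, or the next apply reverts them
terraform apply -refresh-only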

State Recovery

Recovering from Corruption

If state is corrupted, restore from backup:

# S3 versioning - list versions
aws s3api list-object-versions --bucket my-terraform-state --prefix prod/

# Restore specific version
aws s3api get-object --bucket my-terraform-state --key prod/infra.tfstate --version-id VERSION_ID recovered.tfstate

# Push recovered state (rejected if the remote has a different lineage or a higher serial; -force overrides)
terraform state push recovered.tfstate

Complete State Loss

If you lose state entirely:

  1. Don’t panic - resources still exist
  2. Re-import everything:
    terraform import aws_vpc.main vpc-12345
    terraform import aws_subnet.public subnet-12345
    # ... for every resource
    
  3. Consider tools like terraformer for bulk import
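
The re-import step can be scripted when you have (or can rebuild) a list of addresses and IDs. A sketch under assumptions: the function name bulk_import and the mapping-file format (one "address id" pair per line, with # comments allowed) are hypothetical conventions, not something Terraform defines.

```shell
# Hypothetical helper: import every "address id" pair listed in a mapping file.
# Usage: bulk_import <mapping file>
bulk_import() {
  local mapping="$1" addr id
  # The `|| [ -n "$addr" ]` clause handles a final line without a trailing newline
  while read -r addr id || [ -n "$addr" ]; do
    # Skip blank lines and comments
    case "$addr" in
      ''|'#'*) continue ;;
    esac
    # Redirect stdin so terraform cannot consume the rest of the mapping file
    terraform import "$addr" "$id" < /dev/null || return 1
  done < "$mapping"
}
```

The loop stops at the first failed import, so you can fix that entry and re-run; addresses that are already in state will be reported by Terraform as already managed.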

Best Practices

Always Enable Versioning

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id
  versioning_configuration {
    status = "Enabled"
  }
}

You will need to roll back eventually.

Encrypt State

State contains sensitive data (passwords, keys, IPs):

terraform {
  backend "s3" {
    # bucket, key, and region omitted for brevity
    encrypt    = true
    # Optional: use a customer-managed KMS key
    kms_key_id = "arn:aws:kms:..."
  }
}

Don’t Edit State Manually

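Use terraform state commands, not text editors. Hand edits can leave the serial number, lineage, or (with S3 + DynamoDB) the stored digest inconsistent, and a single malformed field can make the state unreadable.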

Use Workspaces Carefully

terraform workspace new staging
terraform workspace select production

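All workspaces share one backend configuration, with each workspace’s state stored separately within it. They’re useful for lightweight environment separation, but it is easy to run a command against the wrong workspace. Many teams prefer separate directories.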
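
One way to reduce wrong-workspace mistakes is to check the active workspace before any apply. A minimal sketch, assuming a hypothetical guard named require_workspace; terraform workspace show is a standard command that prints the current workspace name:

```shell
# Hypothetical guard: fail unless the active workspace matches the expected one.
# Usage: require_workspace <expected name> && terraform apply tfplan
require_workspace() {
  local expected="$1" current
  current="$(terraform workspace show)" || return 1
  if [ "$current" != "$expected" ]; then
    echo "refusing to continue: workspace is '$current', expected '$expected'" >&2
    return 1
  fi
}
```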

Plan Before Apply, Always

# Save plan
terraform plan -out=tfplan

# Apply saved plan
terraform apply tfplan

The plan file ensures you apply exactly what you reviewed.

Common Mistakes

  1. Committing state to git: State contains secrets. Use remote backends.

  2. No locking: Concurrent applies corrupt state. Always use locking.

  3. Shared state for unrelated resources: Split state by environment and component.

  4. Manual changes without updating state: Use -refresh-only or re-apply.

  5. Force-unlocking carelessly: Make sure no one is actually running first.

  6. No versioning on state bucket: You’ll regret this during recovery.


State management isn’t glamorous, but it’s where Terraform goes wrong. Set up remote state with locking from day one, enable versioning, and treat state operations with care. Your future self will thank you.