Terraform state is where reality meets code. Get it wrong, and you’ll destroy production infrastructure or spend hours untangling drift. Here’s how to manage state like a pro.

What Is State?

Terraform state (terraform.tfstate) maps your configuration to real-world resources:

{
  "resources": [
    {
      "type": "aws_instance",
      "name": "web",
      "instances": [
        {
          "attributes": {
            "id": "i-0abc123def456",
            "ami": "ami-12345678",
            "instance_type": "t3.medium"
          }
        }
      ]
    }
  ]
}

Without state, Terraform doesn’t know what exists. It would try to create everything fresh every time.

Local State Problems

# Default: state lives locally
terraform init
terraform apply
# Creates terraform.tfstate in current directory

Problems:

  • Can’t collaborate (who has the latest state?)
  • No shared locking (the local lock only protects one machine, so two people running apply from separate copies = disaster)
  • State contains secrets (database passwords, etc.)
  • Easy to lose (laptop dies, state gone)

Remote State: The Solution

S3 + DynamoDB (AWS Standard)

# backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

# Create the backend resources first (bootstrap)
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  
  attribute {
    name = "LockID"
    type = "S"
  }
}
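
The bootstrap above leaves the bucket's public-access settings at their defaults. As a hedged addition rather than part of the original setup, a public access block is a common way to make sure the state can never be exposed through a bucket policy or ACL. The resource label terraform_state simply follows the naming used above.

```hcl
# Sketch: block every form of public access to the state bucket.
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```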

GCS (Google Cloud)

terraform {
  backend "gcs" {
    bucket = "mycompany-terraform-state"
    prefix = "prod/network"
  }
}

Azure Blob Storage

terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-state-rg"
    storage_account_name = "tfstate12345"
    container_name       = "tfstate"
    key                  = "prod.terraform.tfstate"
  }
}

Terraform Cloud

terraform {
  cloud {
    organization = "mycompany"
    workspaces {
      name = "prod-network"
    }
  }
}

State Locking

Locking prevents two runs from modifying the same state at the same time:

# When someone runs apply, lock is acquired
$ terraform apply
Acquiring state lock. This may take a few moments...

# Another person tries simultaneously
$ terraform apply
Error: Error locking state: Error acquiring the state lock
Lock Info:
  ID:        abc-123
  Path:      s3://bucket/terraform.tfstate
  Operation: OperationTypeApply
  Who:       alice@laptop
  Created:   2024-03-12 17:30:00

Force unlock (use carefully, and only after confirming that the run holding the lock is no longer active):

terraform force-unlock abc-123
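
Locking does not have to go through DynamoDB. Terraform 1.10 and later can keep the lock as a file next to the state object in S3, enabled with the use_lockfile argument. A minimal sketch, reusing the bucket and key from the S3 backend example; check your Terraform version before relying on it:

```hcl
terraform {
  backend "s3" {
    bucket  = "mycompany-terraform-state"
    key     = "prod/network/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true

    # S3-native locking (Terraform 1.10+); replaces dynamodb_table.
    use_lockfile = true
  }
}
```

With this setting the lock lives at the state key plus a .tflock suffix, so the role running Terraform needs permission to create and delete that object as well.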

State Organization

By Environment

environments/
  prod/
    main.tf
    backend.tf   # key = "prod/terraform.tfstate"
  staging/
    main.tf
    backend.tf   # key = "staging/terraform.tfstate"
  dev/
    main.tf
    backend.tf   # key = "dev/terraform.tfstate"

By Component

network/
  backend.tf   # key = "network/terraform.tfstate"
compute/
  backend.tf   # key = "compute/terraform.tfstate"
database/
  backend.tf   # key = "database/terraform.tfstate"

Workspaces

# Create workspace
terraform workspace new staging
terraform workspace new prod

# Switch workspace
terraform workspace select prod

# List workspaces
terraform workspace list

# Use workspace in config
resource "aws_instance" "web" {
  instance_type = terraform.workspace == "prod" ? "t3.large" : "t3.micro"
  
  tags = {
    Environment = terraform.workspace
  }
}
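
The conditional above covers two cases. With more workspaces, a lookup table tends to read better. A sketch, assuming the prod and staging workspaces created earlier; the local names and the staging size are illustrative, not from the original configuration:

```hcl
locals {
  # Instance size per workspace; keys must match the workspace names.
  instance_types = {
    prod    = "t3.large"
    staging = "t3.medium"
  }

  # Any workspace not listed above (for example "default") gets the small size.
  instance_type = lookup(local.instance_types, terraform.workspace, "t3.micro")
}
```

The aws_instance.web resource would then set instance_type = local.instance_type instead of the inline conditional.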

Reading Remote State

Share data between configurations:

# network/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

# compute/main.tf
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "mycompany-terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "web" {
  subnet_id = data.terraform_remote_state.network.outputs.private_subnet_ids[0]
  # Assumes the network configuration also declares a web_sg_id output
  vpc_security_group_ids = [
    data.terraform_remote_state.network.outputs.web_sg_id
  ]
}

State Commands

Inspect State

# List all resources
terraform state list

# Show specific resource
terraform state show aws_instance.web

# Pull remote state locally
terraform state pull > state.json

Move Resources

# Rename a resource
terraform state mv aws_instance.web aws_instance.application

# Move to module
terraform state mv aws_instance.web module.compute.aws_instance.web

# Move between states (-state-out is a legacy option that only works with local state files)
terraform state mv -state-out=other.tfstate aws_instance.web aws_instance.web
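
Since Terraform 1.1, a rename can also be declared in configuration with a moved block, so it goes through plan and code review instead of a one-off CLI command. A sketch of the first rename above, assuming the resource block itself has already been renamed to aws_instance.application:

```hcl
# Declarative equivalent of:
#   terraform state mv aws_instance.web aws_instance.application
moved {
  from = aws_instance.web
  to   = aws_instance.application
}
```

The next terraform plan should report the move rather than a destroy and create.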

Remove from State

# Remove from state (doesn't destroy the resource)
terraform state rm aws_instance.legacy

# Resource still exists in AWS, just not managed by Terraform
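
Terraform 1.7 and later offer a configuration-driven alternative, the removed block. A minimal sketch, assuming the resource "aws_instance" "legacy" block has been deleted from the configuration:

```hcl
removed {
  from = aws_instance.legacy

  lifecycle {
    # Forget the resource in state; leave the real instance running.
    destroy = false
  }
}
```

Like state rm, this only drops the resource from state, but it shows up in plan first.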

Import Existing Resources

# Import existing resource into state
terraform import aws_instance.web i-0abc123def456

# Import into module
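terraform import module.compute.aws_instance.web i-0abc123def456

# A matching resource block must exist before you run terraform import
resource "aws_instance" "web" {
  # Import fills in state, not configuration. Write the arguments
  # (ami, instance_type, ...) to match the real instance, then run
  # terraform plan and adjust until it shows no changes.
}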
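
From Terraform 1.5 onward, the same import can be written as an import block, which makes it visible in terraform plan before anything touches state. A sketch using the instance ID from the command above:

```hcl
# Declarative equivalent of:
#   terraform import aws_instance.web i-0abc123def456
import {
  to = aws_instance.web
  id = "i-0abc123def456"
}
```

If the resource block does not exist yet, terraform plan -generate-config-out=generated.tf can draft one as a starting point. Treat the generated file as something to review and edit, not as finished configuration.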

State Manipulation (Dangerous)

Replace Provider

# Change the provider source address (a full address is hostname/namespace/type)
terraform state replace-provider hashicorp/aws registry.example.com/mycompany/aws

Edit State Directly

# Pull, edit, push (dangerous!)
terraform state pull > state.json
# Edit state.json carefully and increment its "serial" value so the push is accepted
terraform state push state.json

When to edit state directly:

  • Never, if you can avoid it
  • Fixing corrupted state
  • Mass resource renames
  • Provider migrations

Handling Drift

When reality differs from state:

# Detect drift
terraform plan
# Shows: "Note: Objects have changed outside of Terraform"

# Option 1: Accept the drift (update state to match reality)
terraform apply -refresh-only

# Option 2: Revert to config (change reality to match config)
terraform apply

Prevent Drift

# Lifecycle rules
resource "aws_instance" "web" {
  lifecycle {
    ignore_changes = [
      tags["LastModified"],  # Ignore specific attribute changes
    ]
  }
}

# Prevent destruction
resource "aws_db_instance" "main" {
  lifecycle {
    prevent_destroy = true
  }
}

Disaster Recovery

State Versioning

S3 versioning lets you recover previous state:

# List versions
aws s3api list-object-versions \
  --bucket mycompany-terraform-state \
  --prefix prod/terraform.tfstate

# Restore previous version
aws s3api get-object \
  --bucket mycompany-terraform-state \
  --key prod/terraform.tfstate \
  --version-id abc123 \
  recovered.tfstate

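# The recovered file has a lower serial than the current state, so a plain
# push is rejected. -force overrides that check; inspect the file first.
terraform state push -force recovered.tfstate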

Backup Strategy

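#!/bin/bash
# backup-state.sh
set -euo pipefail

DATE=$(date +%Y%m%d-%H%M%S)

# Backups hold the same plain-text secrets as the state itself;
# keep this directory out of git and on encrypted storage.
mkdir -p backups

terraform state pull > "backups/state-${DATE}.json"

# Keep last 30 backups
ls -t backups/state-*.json | tail -n +31 | xargs rm -f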

Security

State Contains Secrets

{
  "type": "aws_db_instance",
  "attributes": {
    "password": "supersecret123"  // In plain text!
  }
}

Mitigations:

  • Encrypt state at rest (S3 SSE, KMS)
  • Restrict state bucket access (IAM policies)
  • Use secrets managers instead of Terraform for passwords
  • Enable access logging
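
One way to act on the secrets-manager mitigation for RDS is to let AWS generate and store the master password itself. This is a sketch, assuming a recent AWS provider that supports manage_master_user_password. The identifier, engine, sizing and username are hypothetical placeholders, not values from this article:

```hcl
resource "aws_db_instance" "main" {
  identifier        = "main-db" # hypothetical name
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app_admin"

  # RDS generates the password and keeps it in Secrets Manager, so no
  # password value is written in the configuration. Cannot be combined
  # with the password argument.
  manage_master_user_password = true
}
```

State then records a reference to the secret rather than the password itself. It is still worth pulling the state and checking what your provider version actually stores.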

Least Privilege

# IAM policy for state access
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::mycompany-terraform-state"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::mycompany-terraform-state/prod/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:*:*:table/terraform-locks"
    }
  ]
}

The Checklist

  • Remote state configured
  • State locking enabled
  • Encryption at rest
  • Versioning enabled
  • Access logging enabled
  • Least privilege IAM
  • State organized by environment/component
  • Backup strategy defined

Common Mistakes

  1. Committing state to git — Never do this
  2. Running apply without locking — Use remote state
  3. Editing state manually — Use state commands
  4. One big state file — Split by concern
  5. No versioning — Enable it on day one

Start Here

  1. Today: Enable remote state with locking
  2. This week: Enable versioning and encryption
  3. This month: Split state by environment
  4. This quarter: Implement proper IAM and auditing

State is the source of truth. Treat it like the crown jewels.


Your Terraform is only as good as your state management. Everything else is just text files.