Infrastructure as Code (IaC) means your servers, networks, and services are defined in version-controlled files rather than clicked into existence through consoles. The benefits are obvious: reproducibility, auditability, collaboration.

But IaC done poorly creates its own problems: state drift, copy-paste sprawl, untestable configurations. The principles matter more than the tools.

Declarative Over Imperative

Describe what you want, not how to get there:

1
2
3
4
5
6
7
8
9
# Declarative (Terraform) - what
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
  
  tags = {
    Name = "web-server"
  }
}
1
2
3
4
5
# Imperative (script) - how
aws ec2 run-instances \
  --image-id ami-0c55b159cbfafe1f0 \
  --instance-type t3.micro \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=web-server}]'

Declarative code is idempotent — run it ten times, get the same result. Imperative scripts need guards against re-running.

Idempotency: Safe to Re-Run

Every apply should be safe to run repeatedly:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# Idempotent - creates if missing, updates if different, no-op if same
resource "aws_s3_bucket" "data" {
  bucket = "my-data-bucket"
}

# Not idempotent - would fail on second run
resource "null_resource" "setup" {
  provisioner "local-exec" {
    command = "aws s3 mb s3://my-data-bucket"  # Fails if exists
  }
}

If you must use provisioners or scripts, add guards:

1
2
3
provisioner "local-exec" {
  command = "aws s3 ls s3://my-bucket || aws s3 mb s3://my-bucket"
}

Immutable Infrastructure

Don’t modify running instances. Replace them:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
# Mutable - SSH in and update
resource "aws_instance" "web" {
  # ... 
  provisioner "remote-exec" {
    inline = ["apt-get update && apt-get upgrade -y"]  # Drift waiting to happen
  }
}

# Immutable - new AMI, new instance
resource "aws_instance" "web" {
  ami = var.latest_ami  # Change AMI, replace instance
}

Immutable infrastructure means:

  • No configuration drift
  • Easy rollback (deploy previous AMI)
  • Reproducible environments

State Management

Terraform state is your source of truth for what exists:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# Remote state - team collaboration
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

State hygiene:

  • Never edit state manually (use terraform state commands)
  • Always use remote state for teams
  • Enable state locking to prevent concurrent modifications
  • Encrypt state at rest (contains secrets)

Modularity: DRY Infrastructure

Extract repeated patterns into modules:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
# modules/vpc/main.tf
variable "cidr_block" {}
variable "environment" {}

resource "aws_vpc" "main" {
  cidr_block = var.cidr_block
  tags = {
    Environment = var.environment
  }
}

output "vpc_id" {
  value = aws_vpc.main.id
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
# environments/prod/main.tf
module "vpc" {
  source      = "../../modules/vpc"
  cidr_block  = "10.0.0.0/16"
  environment = "production"
}

module "vpc_staging" {
  source      = "../../modules/vpc"
  cidr_block  = "10.1.0.0/16"
  environment = "staging"
}

Modules enforce consistency and reduce duplication.

Environment Parity

Same code, different variables:

inframesontdvruiulverrdspcepkdoetrtscssnvaou////m/gdremtimt/mtenaenaeae/tirgirirsnr/nrnr/.a.a.atftftffofoforrrmmm...tttfffvvvaaarrrsss
1
2
3
4
5
6
7
8
9
# environments/dev/terraform.tfvars
instance_type = "t3.micro"
min_instances = 1
max_instances = 2

# environments/prod/terraform.tfvars
instance_type = "t3.large"
min_instances = 3
max_instances = 10

Dev and prod use identical module code with different parameters.

Version Pinning

Pin everything — providers, modules, base images:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
terraform {
  required_version = "~> 1.5.0"
  
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.1.0"  # Explicit version
}

Unpinned versions mean different results on different days. That’s not reproducible infrastructure.

Secrets Management

Never commit secrets to IaC repos:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
# Bad - secret in code
resource "aws_db_instance" "main" {
  password = "supersecretpassword123"  # In git history forever
}

# Good - from secret manager
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/database/password"
}

resource "aws_db_instance" "main" {
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}

# Good - from variable (injected at runtime)
variable "db_password" {
  sensitive = true
}

resource "aws_db_instance" "main" {
  password = var.db_password
}

Use sensitive = true to prevent values from appearing in logs.

Testing Infrastructure

Yes, you can test IaC:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# Terratest example
def test_vpc_created():
    terraform.init()
    terraform.apply()
    
    vpc_id = terraform.output("vpc_id")
    assert vpc_id.startswith("vpc-")
    
    # Verify VPC exists in AWS
    vpc = ec2.describe_vpcs(VpcIds=[vpc_id])
    assert len(vpc['Vpcs']) == 1
    
    terraform.destroy()

At minimum:

  • terraform validate — syntax check
  • terraform plan — preview changes
  • tflint — linting
  • Actual deployment to ephemeral environment — integration test

Plan Before Apply

Always review plans:

1
2
3
4
5
6
7
8
# Generate plan
terraform plan -out=tfplan

# Review it (automated or manual)
terraform show tfplan

# Apply only the reviewed plan
terraform apply tfplan

In CI/CD:

1
2
3
4
5
6
7
8
- name: Plan
  run: terraform plan -out=tfplan
  
- name: Manual Approval
  uses: trstringer/manual-approval@v1
  
- name: Apply
  run: terraform apply tfplan

Never terraform apply without reviewing what will change.

Drift Detection

Infrastructure changes outside of IaC (console clicks, scripts) create drift:

1
2
3
4
5
6
7
# Detect drift
terraform plan

# If plan shows changes you didn't make:
# 1. Import the manual change into state
# 2. Or revert the manual change
# 3. Never ignore it

Regular drift detection (daily or per-deploy) catches unauthorized changes before they cause problems.

Documentation as Code

Document inline:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
# This VPC hosts all production workloads.
# CIDR chosen to not overlap with office network (192.168.0.0/16)
# for VPN connectivity.
resource "aws_vpc" "production" {
  cidr_block = "10.0.0.0/16"
  
  tags = {
    Name        = "production"
    CostCenter  = "engineering"
    ManagedBy   = "terraform"
  }
}

Tags and comments explain why, not just what.


Infrastructure as Code is a practice, not a tool. Terraform, Pulumi, CloudFormation — they all work. What matters is the discipline: declarative definitions, immutable deployments, versioned state, tested changes, no secrets in repos.

Write infrastructure like you write application code: reviewed, tested, versioned, and understood by the team. The server you can recreate in minutes is worth more than the server you’ve been nursing for years.