Terraform State Management: Don't Learn This the Hard Way

Terraform state is both the source of its power and the cause of most Terraform disasters. Get it wrong and you’re recreating production resources at 2 AM. Get it right and infrastructure changes become boring (the good kind). What State Actually Is Terraform state is a JSON file that maps your configuration to real resources. When you write aws_instance.web, Terraform needs to know which actual EC2 instance that refers to. State is that mapping. ...

March 13, 2026 Â· 5 min Â· 1060 words Â· Rob Washington

Terraform State: The File That Controls Everything

Terraform state is where reality meets code. Get it wrong, and you’ll destroy production infrastructure or spend hours untangling drift. Here’s how to manage state like a pro. What Is State? Terraform state (terraform.tfstate) maps your configuration to real-world resources: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 { "resources": [ { "type": "aws_instance", "name": "web", "instances": [ { "attributes": { "id": "i-0abc123def456", "ami": "ami-12345678", "instance_type": "t3.medium" } } ] } ] } Without state, Terraform doesn’t know what exists. It would try to create everything fresh every time. ...

March 12, 2026 Â· 8 min Â· 1635 words Â· Rob Washington

Infrastructure Testing: Confidence Before Production

You test your application code. Why not your infrastructure? Here’s how to build confidence that your Terraform, Ansible, and Kubernetes configs actually work. Why Test Infrastructure? Infrastructure code has the same problems as application code: Typos break things Logic errors cause outages Refactoring introduces regressions “It works on my machine” applies to terraform too The difference: infrastructure mistakes often cost more. A bad deployment can take down production, corrupt data, or rack up cloud bills. ...

March 12, 2026 Â· 7 min Â· 1303 words Â· Rob Washington

Terraform Patterns That Scale

Terraform’s getting-started guide shows a single main.tf with everything in it. That works for demos. It doesn’t work when you have 50 resources, 5 environments, and a team making changes simultaneously. These patterns emerge from scaling Terraform across teams and environments—where state conflicts happen, where modules get copied instead of shared, and where “just run terraform apply” becomes terrifying. Project Structure The flat-file approach breaks down fast. Structure by environment and component: ...

March 11, 2026 Â· 10 min Â· 1951 words Â· Rob Washington

The Art of Idempotent Automation

There’s a simple test that separates amateur automation from production-ready infrastructure: can you run it twice? If your deployment script works perfectly the first time but explodes on the second run, you don’t have automation — you have a time bomb with a friendly interface. What Idempotency Actually Means An operation is idempotent if performing it multiple times produces the same result as performing it once. In practical terms: 1 2 3 4 5 # Idempotent: always results in nginx being installed apt install nginx # NOT idempotent: appends every time echo "export PATH=/opt/bin:$PATH" >> ~/.bashrc The first command checks state before acting. The second blindly mutates. ...

March 7, 2026 Â· 4 min Â· 817 words Â· Rob Washington

Infrastructure as Code for AI Workloads: Scaling Smart

As AI workloads become central to business operations, managing the infrastructure that powers them requires the same rigor we apply to traditional applications. Infrastructure as Code (IaC) isn’t just nice-to-have for AI—it’s essential for cost control, reproducibility, and scaling. The AI Infrastructure Challenge AI workloads have unique requirements that traditional IaC patterns don’t always address: GPU instances that cost $3-10/hour and need careful lifecycle management Model artifacts that can be gigabytes in size and need versioning Auto-scaling that must consider both compute load and model warming time Spot instance strategies to reduce costs by 60-90% Let’s build a Terraform + Ansible solution that handles these challenges. ...

March 6, 2026 Â· 5 min Â· 1014 words Â· Rob Washington

Terraform Basics: Infrastructure as Code

Clicking through cloud consoles doesn’t scale. Terraform lets you define infrastructure in code, track changes in git, and deploy the same environment repeatedly. Core Concepts Provider: Plugin for a platform (AWS, GCP, Azure, etc.) Resource: A thing to create (server, database, DNS record) State: Terraform’s record of what exists Plan: Preview of changes before applying Apply: Make the changes happen Basic Workflow 1 2 3 4 terraform init # Download providers terraform plan # Preview changes terraform apply # Create/update resources terraform destroy # Tear everything down First Configuration 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 # main.tf # Configure the AWS provider terraform { required_providers { aws = { source = "hashicorp/aws" version = "~> 5.0" } } } provider "aws" { region = "us-east-1" } # Create an EC2 instance resource "aws_instance" "web" { ami = "ami-0c55b159cbfafe1f0" instance_type = "t3.micro" tags = { Name = "WebServer" } } 1 2 3 terraform init # Downloads AWS provider terraform plan # Shows: 1 to add terraform apply # Creates the instance Variables 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 # variables.tf variable "environment" { description = "Deployment environment" type = string default = "dev" } variable "instance_type" { description = "EC2 instance type" type = string default = "t3.micro" } variable "allowed_ips" { description = "IPs allowed to SSH" type = list(string) default = ["0.0.0.0/0"] } # main.tf - use variables resource "aws_instance" "web" { ami = "ami-0c55b159cbfafe1f0" instance_type = var.instance_type tags = { Name = "web-${var.environment}" Environment = var.environment } } Setting Variables 1 2 3 4 5 6 7 8 9 # Command line terraform apply -var="environment=prod" # File (terraform.tfvars) environment = "prod" instance_type = "t3.small" # Environment variables export TF_VAR_environment="prod" Outputs 1 2 3 4 5 6 7 8 9 10 # outputs.tf output "instance_ip" { description = "Public IP of the instance" value = aws_instance.web.public_ip } output "instance_id" { description = "Instance ID" value = aws_instance.web.id } After apply: ...

March 5, 2026 Â· 7 min Â· 1424 words Â· Rob Washington

Infrastructure as Code with Terraform: A Practical Guide

Clicking through cloud consoles doesn’t scale. Infrastructure as Code (IaC) lets you version, review, and automate your infrastructure just like application code. Terraform has become the de facto standard. Here’s how to use it effectively. The Basics Terraform uses HCL (HashiCorp Configuration Language) to declare resources: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 # main.tf terraform { required_providers { aws = { source = "hashicorp/aws" version = "~> 5.0" } } } provider "aws" { region = "us-east-1" } resource "aws_instance" "web" { ami = "ami-0c55b159cbfafe1f0" instance_type = "t3.micro" tags = { Name = "web-server" } } 1 2 3 4 terraform init # Download providers terraform plan # Preview changes terraform apply # Create resources terraform destroy # Tear down everything State Management Terraform tracks what it created in a state file. Never lose this file. ...

March 4, 2026 Â· 8 min Â· 1505 words Â· Rob Washington

Terraform State Management: Keep Your Infrastructure Sane

Terraform state is the source of truth for your infrastructure. Mess it up and you’ll be manually reconciling resources at 2 AM. Here’s how to manage state properly from day one. What Is State? Terraform state maps your configuration to real resources: 1 2 3 4 5 # main.tf resource "aws_instance" "web" { ami = "ami-12345" instance_type = "t3.micro" } 1 2 3 4 5 6 7 8 9 10 11 12 13 14 // terraform.tfstate (simplified) { "resources": [{ "type": "aws_instance", "name": "web", "instances": [{ "attributes": { "id": "i-0abc123def456", "ami": "ami-12345", "instance_type": "t3.micro" } }] }] } Without state, Terraform doesn’t know aws_instance.web corresponds to i-0abc123def456. It would try to create a new instance every time. ...

March 1, 2026 Â· 7 min Â· 1336 words Â· Rob Washington

Terraform Module Patterns for Reusable Infrastructure

Terraform modules turn infrastructure code from scripts into libraries. Here’s how to design them well. Module Structure m └ o ─ d ─ u l v ├ ├ ├ ├ └ e p ─ ─ ─ ─ ─ s c ─ ─ ─ ─ ─ / / m v o v R a a u e E i r t r A n i p s D . a u i M t b t o E f l s n . e . s m s t . d . f t t f f # # # # # R I O P D e n u r o s p t o c o u p v u u t u i m r t d e c v e n e a v r t s r a a i l r t a u e i b e q o l s u n e i s r e m e n t s Basic Module variables.tf 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 variable "name" { description = "Name prefix for resources" type = string } variable "cidr_block" { description = "CIDR block for VPC" type = string default = "10.0.0.0/16" } variable "azs" { description = "Availability zones" type = list(string) } variable "private_subnets" { description = "Private subnet CIDR blocks" type = list(string) default = [] } variable "public_subnets" { description = "Public subnet CIDR blocks" type = list(string) default = [] } variable "tags" { description = "Tags to apply to resources" type = map(string) default = {} } main.tf 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 resource "aws_vpc" "this" { cidr_block = var.cidr_block enable_dns_hostnames = true enable_dns_support = true tags = merge(var.tags, { Name = var.name }) } resource "aws_subnet" "private" { count = length(var.private_subnets) vpc_id = aws_vpc.this.id cidr_block = var.private_subnets[count.index] availability_zone = var.azs[count.index % length(var.azs)] tags = merge(var.tags, { Name = "${var.name}-private-${count.index + 1}" Type = "private" }) } resource "aws_subnet" "public" { count = length(var.public_subnets) vpc_id = aws_vpc.this.id cidr_block = var.public_subnets[count.index] availability_zone = var.azs[count.index % length(var.azs)] map_public_ip_on_launch = true tags = merge(var.tags, { Name = "${var.name}-public-${count.index + 1}" Type = "public" }) } outputs.tf 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 output "vpc_id" { description = "VPC ID" value = aws_vpc.this.id } output "private_subnet_ids" { description = "Private subnet IDs" value = aws_subnet.private[*].id } output "public_subnet_ids" { description = "Public subnet IDs" value = aws_subnet.public[*].id } output "vpc_cidr_block" { description = "VPC CIDR block" value = aws_vpc.this.cidr_block } versions.tf 1 2 3 4 5 6 7 8 9 10 terraform { required_version = ">= 1.0" required_providers { aws = { source = "hashicorp/aws" version = ">= 4.0" } } } Using Modules Local Module 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 module "vpc" { source = "./modules/vpc" name = "production" cidr_block = "10.0.0.0/16" azs = ["us-east-1a", "us-east-1b", "us-east-1c"] private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"] public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"] tags = { Environment = "production" Project = "myapp" } } # Use outputs resource "aws_instance" "app" { subnet_id = module.vpc.private_subnet_ids[0] # ... } Registry Module 1 2 3 4 5 6 7 8 9 10 11 12 13 14 module "vpc" { source = "terraform-aws-modules/vpc/aws" version = "5.0.0" name = "my-vpc" cidr = "10.0.0.0/16" azs = ["us-east-1a", "us-east-1b"] private_subnets = ["10.0.1.0/24", "10.0.2.0/24"] public_subnets = ["10.0.101.0/24", "10.0.102.0/24"] enable_nat_gateway = true single_nat_gateway = true } Git Module 1 2 3 4 module "vpc" { source = "git::https://github.com/org/terraform-modules.git//vpc?ref=v1.2.0" # ... } Variable Validation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 variable "environment" { description = "Environment name" type = string validation { condition = contains(["dev", "staging", "production"], var.environment) error_message = "Environment must be dev, staging, or production." } } variable "instance_type" { description = "EC2 instance type" type = string default = "t3.micro" validation { condition = can(regex("^t3\\.", var.instance_type)) error_message = "Instance type must be t3 family." } } variable "cidr_block" { description = "CIDR block" type = string validation { condition = can(cidrhost(var.cidr_block, 0)) error_message = "Must be a valid CIDR block." } } Complex Types 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 variable "services" { description = "Service configurations" type = list(object({ name = string port = number health_path = optional(string, "/health") replicas = optional(number, 1) environment = optional(map(string), {}) })) } # Usage services = [ { name = "api" port = 8080 replicas = 3 environment = { LOG_LEVEL = "info" } }, { name = "worker" port = 9090 } ] Conditional Resources 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 variable "create_nat_gateway" { description = "Create NAT gateway" type = bool default = true } resource "aws_nat_gateway" "this" { count = var.create_nat_gateway ? 1 : 0 allocation_id = aws_eip.nat[0].id subnet_id = aws_subnet.public[0].id } resource "aws_eip" "nat" { count = var.create_nat_gateway ? 1 : 0 domain = "vpc" } Dynamic Blocks 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 variable "ingress_rules" { description = "Ingress rules" type = list(object({ port = number protocol = string cidr_blocks = list(string) })) default = [] } resource "aws_security_group" "this" { name = var.name description = var.description vpc_id = var.vpc_id dynamic "ingress" { for_each = var.ingress_rules content { from_port = ingress.value.port to_port = ingress.value.port protocol = ingress.value.protocol cidr_blocks = ingress.value.cidr_blocks } } egress { from_port = 0 to_port = 0 protocol = "-1" cidr_blocks = ["0.0.0.0/0"] } } Module Composition Root Module 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 # main.tf module "vpc" { source = "./modules/vpc" # ... } module "security_groups" { source = "./modules/security-groups" vpc_id = module.vpc.vpc_id # ... } module "ecs_cluster" { source = "./modules/ecs-cluster" vpc_id = module.vpc.vpc_id private_subnet_ids = module.vpc.private_subnet_ids security_group_ids = [module.security_groups.app_sg_id] # ... } Shared Data 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # data.tf - Shared data sources data "aws_caller_identity" "current" {} data "aws_region" "current" {} locals { account_id = data.aws_caller_identity.current.account_id region = data.aws_region.current.name common_tags = { Environment = var.environment Project = var.project ManagedBy = "terraform" } } For_each vs Count Count (Index-based) 1 2 3 4 5 # Problematic: Adding/removing items shifts indices resource "aws_subnet" "private" { count = length(var.private_subnets) cidr_block = var.private_subnets[count.index] } For_each (Key-based) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 # Better: Keys are stable variable "subnets" { type = map(object({ cidr_block = string az = string })) } resource "aws_subnet" "private" { for_each = var.subnets cidr_block = each.value.cidr_block availability_zone = each.value.az tags = { Name = each.key } } # Usage subnets = { "private-1a" = { cidr_block = "10.0.1.0/24", az = "us-east-1a" } "private-1b" = { cidr_block = "10.0.2.0/24", az = "us-east-1b" } } Sensitive Values 1 2 3 4 5 6 7 8 9 10 11 variable "database_password" { description = "Database password" type = string sensitive = true } output "connection_string" { description = "Database connection string" value = "postgres://user:${var.database_password}@${aws_db_instance.this.endpoint}/db" sensitive = true } Module Testing Terraform Test (1.6+) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 # tests/vpc_test.tftest.hcl run "vpc_creates_successfully" { command = plan variables { name = "test-vpc" cidr_block = "10.0.0.0/16" azs = ["us-east-1a"] } assert { condition = aws_vpc.this.cidr_block == "10.0.0.0/16" error_message = "VPC CIDR block incorrect" } } Run tests: ...

February 28, 2026 Â· 8 min Â· 1602 words Â· Rob Washington