Your cloud bill is too high. It always is. Here’s how to actually reduce it without breaking things.
## Quick Wins

### 1. Find Unused Resources
```bash
# AWS: Find unattached EBS volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].[VolumeId,Size,CreateTime]' \
  --output table

# Find old snapshots (>90 days)
aws ec2 describe-snapshots --owner-ids self \
  --query 'Snapshots[?StartTime<=`2026-01-01`].[SnapshotId,VolumeSize,StartTime]' \
  --output table

# Find unattached Elastic IPs
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]' \
  --output table
```
Delete them. Unattached EBS volumes keep billing at the full per-GB rate whether or not anything reads them, and each unused Elastic IP costs $3.65/month.
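To put a number on the waste, here's a back-of-envelope helper. The $0.08/GB-month rate is an assumed us-east-1 gp3 price; substitute your region's rate:

```python
# Estimate the monthly cost of unattached EBS volumes.
# price_per_gb is an assumed gp3 rate (~$0.08/GB-month in us-east-1).
def unattached_monthly_cost(volumes, price_per_gb=0.08):
    """volumes: (volume_id, size_gb) pairs, e.g. parsed from the CLI output above."""
    return sum(size_gb * price_per_gb for _, size_gb in volumes)

# Two hypothetical orphaned volumes totalling 600 GB:
waste = unattached_monthly_cost([("vol-aaa", 100), ("vol-bbb", 500)])
print(f"${waste:.2f}/month wasted")
```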
### 2. Right-Size Instances
```bash
# Check CPU utilization
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2026-03-01T00:00:00Z \
  --end-time 2026-03-11T00:00:00Z \
  --period 86400 \
  --statistics Average
```
If average CPU < 20%, downsize. A t3.large at 10% CPU should be a t3.small.
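That rule of thumb can be mechanized. A small heuristic sketch, assuming each t3 size step roughly doubles capacity (so average CPU roughly doubles per step down):

```python
# Rough right-sizing heuristic for the t3 family. Each size step roughly
# doubles vCPU/memory, so stepping down roughly doubles average CPU.
T3_SIZES = ["nano", "micro", "small", "medium", "large", "xlarge", "2xlarge"]

def suggest_t3_size(current, avg_cpu_percent):
    """Step down while the projected average CPU stays at or under ~40%."""
    i = T3_SIZES.index(current)
    cpu = avg_cpu_percent
    while i > 0 and cpu * 2 <= 40:
        i -= 1
        cpu *= 2
    return f"t3.{T3_SIZES[i]}", cpu

print(suggest_t3_size("large", 10))  # ('t3.small', 40): two sizes down
```

This reproduces the claim above: a t3.large idling at 10% average CPU projects to ~40% on a t3.small.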
### 3. Use Spot Instances
For fault-tolerant workloads:
```hcl
# Terraform
resource "aws_spot_instance_request" "worker" {
  ami           = "ami-12345678"
  instance_type = "c5.xlarge"
  spot_price    = "0.10" # Max you'll pay

  # Spot instances can be interrupted
  instance_interruption_behavior = "terminate"
}
```
Savings: 60-90% vs on-demand.
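To see what that range means in dollars, a quick comparison. The $0.17/hour on-demand figure for c5.xlarge is an assumption (roughly us-east-1); check current on-demand and spot prices for your region:

```python
# Compare monthly on-demand vs spot cost for one instance.
def spot_savings(on_demand_hourly, spot_hourly, hours_per_month=730):
    saved = (on_demand_hourly - spot_hourly) * hours_per_month
    pct = (1 - spot_hourly / on_demand_hourly) * 100
    return saved, pct

# Assumed prices: c5.xlarge on-demand ~$0.17/hr, spot often around $0.06/hr.
saved, pct = spot_savings(0.17, 0.06)
print(f"${saved:.2f}/month saved ({pct:.0f}%)")
```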
### 4. Reserved Instances / Savings Plans
For predictable workloads:
| Commitment | Discount |
|---|---|
| No commitment | 0% |
| 1 year, no upfront | ~30% |
| 1 year, all upfront | ~40% |
| 3 year, all upfront | ~60% |
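Translating the table into annual dollars for a hypothetical $1,000/month of on-demand spend (discount figures are the approximate ones above):

```python
# Annual cost at each commitment level, given an approximate discount.
def annual_cost(on_demand_monthly, discount_pct):
    return on_demand_monthly * 12 * (1 - discount_pct / 100)

monthly = 1000  # hypothetical $1,000/month on-demand spend
for label, disc in [("No commitment", 0), ("1yr no upfront", 30),
                    ("1yr all upfront", 40), ("3yr all upfront", 60)]:
    print(f"{label}: ${annual_cost(monthly, disc):,.0f}/yr")
```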
```bash
# Check reservation coverage
aws ce get-reservation-coverage \
  --time-period Start=2026-03-01,End=2026-03-11 \
  --group-by Type=DIMENSION,Key=SERVICE
```
## Storage Optimization

### S3 Lifecycle Policies
```json
{
  "Rules": [
    {
      "ID": "MoveToIA",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
      ],
      "Expiration": {"Days": 730}
    }
  ]
}
```
| Storage Class | Cost (per GB/month) | Use Case |
|---|---|---|
| Standard | $0.023 | Frequent access |
| Standard-IA | $0.0125 | Infrequent access |
| Glacier | $0.004 | Archive (minutes retrieval) |
| Deep Archive | $0.00099 | Long-term archive (hours) |
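To make the table concrete, here's what a terabyte costs per month in each class at the prices listed (storage only; retrieval and request fees are extra):

```python
# Monthly storage cost per class, using the per-GB prices from the table.
PRICES = {"Standard": 0.023, "Standard-IA": 0.0125,
          "Glacier": 0.004, "Deep Archive": 0.00099}

def monthly_cost(gb, storage_class):
    return gb * PRICES[storage_class]

for cls in PRICES:
    print(f"1 TB in {cls}: ${monthly_cost(1024, cls):.2f}/month")
```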
### S3 Intelligent Tiering
Let AWS optimize for you:
```bash
aws s3 cp myfile.txt s3://mybucket/ \
  --storage-class INTELLIGENT_TIERING
```
Automatically moves objects between tiers based on access patterns.
### EBS Optimization
```bash
# Find over-provisioned volumes
aws ec2 describe-volumes \
  --query 'Volumes[?Size>`100`].[VolumeId,Size,VolumeType,Iops]'

# gp3 is usually cheaper than gp2
# gp2: $0.10/GB, IOPS scale with size (3 IOPS per GB)
# gp3: $0.08/GB, 3,000 IOPS and 125 MB/s included; extra IOPS at $0.005/IOPS-month
```
Migrate gp2 to gp3:
```bash
aws ec2 modify-volume \
  --volume-id vol-1234567890abcdef0 \
  --volume-type gp3 \
  --iops 3000 \
  --throughput 125
```
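Using the per-GB prices quoted earlier ($0.10 gp2 vs $0.08 gp3), you can estimate the monthly win before running the migration. This ignores any charges for IOPS or throughput beyond the included baseline:

```python
# Estimated monthly savings from migrating gp2 volumes to gp3,
# at the assumed per-GB prices ($0.10 gp2 vs $0.08 gp3; region-dependent).
def gp3_savings(gp2_volume_sizes_gb):
    return sum(size * (0.10 - 0.08) for size in gp2_volume_sizes_gb)

# Three hypothetical gp2 volumes totalling 2,600 GB:
print(f"${gp3_savings([100, 500, 2000]):.2f}/month")
```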
## Compute Optimization

### Auto Scaling
Scale down when you don’t need capacity:
```hcl
# Terraform
resource "aws_autoscaling_schedule" "scale_down_night" {
  scheduled_action_name  = "scale-down-night"
  autoscaling_group_name = aws_autoscaling_group.app.name
  min_size               = 1
  max_size               = 2
  desired_capacity       = 1
  recurrence             = "0 22 * * *" # 10 PM
}

resource "aws_autoscaling_schedule" "scale_up_morning" {
  scheduled_action_name  = "scale-up-morning"
  autoscaling_group_name = aws_autoscaling_group.app.name
  min_size               = 2
  max_size               = 10
  desired_capacity       = 4
  recurrence             = "0 6 * * 1-5" # 6 AM weekdays
}
```
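Back-of-envelope for what a schedule like this saves, assuming a hypothetical $0.10/hour instance price and the fleet dropping from 4 to 1 instances overnight and over weekends:

```python
# Rough monthly savings from scheduled scale-down.
# Assumes 8-hour weeknight windows plus a 48-hour weekend window;
# the $0.10/hr price and fleet sizes are hypothetical.
def scheduled_scaling_savings(hourly, day_count, night_count,
                              night_hours=8, weekend_hours=48):
    weekly_saved_hours = (day_count - night_count) * (night_hours * 5 + weekend_hours)
    return weekly_saved_hours * hourly * 4.33  # average weeks per month

print(f"${scheduled_scaling_savings(0.10, 4, 1):.0f}/month")
```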
### Lambda Right-Sizing
```python
# Check if you're over-provisioned
import boto3

client = boto3.client('cloudwatch')

response = client.get_metric_statistics(
    Namespace='AWS/Lambda',
    MetricName='Duration',
    Dimensions=[{'Name': 'FunctionName', 'Value': 'my-function'}],
    StartTime='2026-03-01',
    EndTime='2026-03-11',
    Period=86400,
    Statistics=['Average', 'Maximum']
)

# If max duration << timeout, reduce memory
# Lambda CPU scales with memory
```
Use [AWS Lambda Power Tuning](https://github.com/alexcasalboni/aws-lambda-power-tuning) to find the optimal memory setting.
### Container Optimization
```yaml
# Set resource limits to avoid over-provisioning
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```
Use Karpenter for Kubernetes node optimization:
```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"] # Prefer spot
    - key: node.kubernetes.io/instance-type
      operator: In
      values: ["t3.medium", "t3.large", "t3.xlarge"]
```
## Database Optimization

### RDS Right-Sizing
```sql
-- Check if you need that big instance
SHOW STATUS LIKE 'Max_used_connections';
-- If max << max_connections, downsize

-- Check buffer pool usage (MySQL)
SHOW STATUS LIKE 'Innodb_buffer_pool%';
-- If pages_free is high, reduce instance size
```
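Among the `Innodb_buffer_pool%` counters, `Innodb_buffer_pool_read_requests` counts logical reads and `Innodb_buffer_pool_reads` counts the ones that had to hit disk; their ratio tells you whether memory is actually the constraint:

```python
# Buffer pool hit ratio: fraction of logical reads served from memory.
def buffer_pool_hit_ratio(read_requests, disk_reads):
    return 1 - disk_reads / read_requests

# Hypothetical counter values from SHOW STATUS:
ratio = buffer_pool_hit_ratio(read_requests=10_000_000, disk_reads=5_000)
print(f"{ratio:.2%}")  # 99.95% -- memory is not the bottleneck here
```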
### Aurora Serverless v2
Pay for what you use:
```hcl
resource "aws_rds_cluster" "aurora" {
  engine      = "aurora-postgresql"
  engine_mode = "provisioned" # Serverless v2 uses "provisioned" mode

  serverlessv2_scaling_configuration {
    min_capacity = 0.5 # Scale to near-zero
    max_capacity = 16
  }
}
```
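Rough monthly cost at the configured bounds, assuming roughly $0.12 per ACU-hour (an assumed us-east-1 figure; check current Aurora Serverless v2 pricing):

```python
# Monthly Aurora Serverless v2 compute cost at a given average capacity.
# price_per_acu_hour is an assumption; verify against current pricing.
def aurora_v2_monthly(avg_acu, price_per_acu_hour=0.12, hours=730):
    return avg_acu * price_per_acu_hour * hours

print(f"Idle at 0.5 ACU: ${aurora_v2_monthly(0.5):.2f}/month")
print(f"Busy at 8 ACU:   ${aurora_v2_monthly(8):.2f}/month")
```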
### Read Replicas
Offload reads to cheaper replicas:
```python
# Pseudocode: connect() stands in for your driver's connection call.
# Write to primary
write_db = connect(primary_endpoint)
write_db.execute("INSERT INTO users ...")

# Read from replica
read_db = connect(replica_endpoint)
users = read_db.execute("SELECT * FROM users")
```
## Monitoring Costs

### AWS Cost Explorer API
```python
import boto3

client = boto3.client('ce')

response = client.get_cost_and_usage(
    TimePeriod={
        'Start': '2026-03-01',
        'End': '2026-03-11'
    },
    Granularity='DAILY',
    Metrics=['UnblendedCost'],
    GroupBy=[
        {'Type': 'DIMENSION', 'Key': 'SERVICE'}
    ]
)

for day in response['ResultsByTime']:
    print(f"{day['TimePeriod']['Start']}:")
    for group in day['Groups']:
        service = group['Keys'][0]
        cost = group['Metrics']['UnblendedCost']['Amount']
        print(f"  {service}: ${float(cost):.2f}")
```
### Budget Alerts
```hcl
# Terraform
resource "aws_budgets_budget" "monthly" {
  name         = "monthly-budget"
  budget_type  = "COST"
  limit_amount = "1000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = ["alerts@example.com"]
  }
}
```
Tag everything:
```hcl
resource "aws_instance" "web" {
  # ...
  tags = {
    Environment = "production"
    Team        = "platform"
    Project     = "api"
    CostCenter  = "engineering"
  }
}
```
Then filter costs by tag:
```bash
aws ce get-cost-and-usage \
  --time-period Start=2026-03-01,End=2026-03-11 \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --group-by Type=TAG,Key=Project
```
## The Cost Optimization Checklist

Weekly:

- Scan for unattached volumes, old snapshots, and unused Elastic IPs
- Glance at Cost Explorer for anomalies

Monthly:

- Review CPU and memory utilization; right-size what's idle
- Compare actual spend against the budget and its alerts

Quarterly:

- Review Reserved Instance / Savings Plan coverage
- Revisit storage lifecycle policies and scaling schedules
## Quick Reference: Savings by Action

| Action | Typical Savings |
|---|---|
| Delete unused resources | 5-15% |
| Right-size instances | 10-30% |
| Reserved Instances (1yr) | 30-40% |
| Spot Instances | 60-90% |
| S3 lifecycle policies | 40-70% on storage |
| Scheduled scaling | 20-40% |
| gp2 → gp3 migration | 20% on EBS |
Start with the quick wins. The biggest savings usually come from things you’re not using at all.
Cloud costs are like subscriptions — they accumulate quietly until you look at the bill. Look at the bill regularly.