Your AWS bill is too high. Everyone’s is. The cloud makes it trivially easy to spin up resources and surprisingly hard to know what you’re actually paying for.
Here’s how to stop the bleeding.
The Low-Hanging Fruit#
Unused Resources#
The easiest savings come from things you’re not using.
```bash
# Find unattached EBS volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].[VolumeId,Size,CreateTime]' \
  --output table

# Find unused Elastic IPs
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]' \
  --output table

# Find unhealthy targets in a target group (requires the target group ARN)
aws elbv2 describe-target-health \
  --target-group-arn <target-group-arn> \
  --query 'TargetHealthDescriptions[?TargetHealth.State!=`healthy`]'
```
Run these monthly. You’ll find forgotten resources every time.
Right-Sizing#
Most instances are oversized. Check actual utilization:
```bash
# Get CPU utilization for a single instance over the last 7 days
# (loop over instance IDs to cover a fleet)
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time $(date -d '7 days ago' -u +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 3600 \
  --statistics Average Maximum
```
If average CPU is under 20% and max is under 50%, downsize.
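That rule of thumb is easy to codify if you script the CloudWatch output — a minimal sketch (the helper name is hypothetical, and the thresholds are the ones above):

```python
# Hypothetical helper codifying the rule of thumb above:
# downsize when average CPU < 20% and max CPU < 50%.
def downsize_candidate(avg_cpu: float, max_cpu: float) -> bool:
    return avg_cpu < 20.0 and max_cpu < 50.0

print(downsize_candidate(12.0, 35.0))  # True  (steadily idle)
print(downsize_candidate(12.0, 80.0))  # False (idle on average but bursty)
```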
AWS Compute Optimizer does this automatically:
```bash
aws compute-optimizer get-ec2-instance-recommendations \
  --query 'instanceRecommendations[*].[instanceArn,currentInstanceType,recommendationOptions[0].instanceType]'
```
Reserved Instances and Savings Plans#
On-demand pricing is the most expensive option. For steady-state workloads:
- Savings Plans: Commit to $/hour, flexible across instance types. Start here.
- Reserved Instances: Commit to specific instance type, bigger discount.
- Spot Instances: Up to 90% off, but can be terminated. Good for batch jobs.
```bash
# Check current RI coverage
aws ce get-reservation-coverage \
  --time-period Start=$(date -d '30 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --group-by Type=DIMENSION,Key=SERVICE
```
Target 70-80% coverage on steady workloads. Don’t over-commit.
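Before committing, it helps to know the break-even point: a commitment billed 24/7 pays off once actual usage exceeds the ratio of the committed rate to the on-demand rate. A sketch with made-up rates (not real AWS prices):

```python
# Break-even utilization for a 24/7 commitment: you come out ahead once
# you actually run more than committed_rate / on_demand_rate of the time.
def breakeven_utilization(on_demand_per_hour: float, committed_per_hour: float) -> float:
    return committed_per_hour / on_demand_per_hour

# A 40% discount breaks even at 60% utilization (rates are illustrative)
print(round(breakeven_utilization(0.096, 0.0576), 2))  # 0.6
```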
Storage Optimization#
S3 Lifecycle Policies#
Most S3 data is accessed once and forgotten. Tier it automatically:
```json
{
  "Rules": [
    {
      "ID": "Archive old data",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
      ],
      "Expiration": {"Days": 730}
    }
  ]
}
```
Storage class costs (per GB-month, us-east-1 list prices; check current rates):
- Standard: $0.023/GB
- Standard-IA: $0.0125/GB (~46% cheaper)
- Glacier: $0.004/GB (~83% cheaper)
- Deep Archive: $0.00099/GB (~96% cheaper)
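To see what tiering is worth at scale, multiply the rates out — a quick sketch for 10 TB, using the per-GB rates listed above (verify current pricing before planning around these numbers):

```python
# Monthly cost of 10 TB at each storage class, using the per-GB rates
# from the list above (us-east-1 list prices; verify current rates)
RATES = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}
gb = 10 * 1024  # 10 TB
for storage_class, per_gb in RATES.items():
    print(f"{storage_class:>12}: ${gb * per_gb:,.2f}/month")
```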
S3 Intelligent-Tiering#
If you can’t predict access patterns, let AWS figure it out:
```bash
aws s3api put-bucket-intelligent-tiering-configuration \
  --bucket my-bucket \
  --id "AutoTiering" \
  --intelligent-tiering-configuration '{
    "Id": "AutoTiering",
    "Status": "Enabled",
    "Tierings": [
      {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
      {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"}
    ]
  }'
```
Small monitoring fee, but no retrieval charges for frequently accessed data.
EBS Optimization#
gp3 vs gp2: gp3 is 20% cheaper with better baseline performance. Migrate everything.
```bash
# Find gp2 volumes to migrate
aws ec2 describe-volumes \
  --filters Name=volume-type,Values=gp2 \
  --query 'Volumes[*].[VolumeId,Size,Iops]' \
  --output table

# Modify to gp3
aws ec2 modify-volume --volume-id vol-xxx --volume-type gp3
```
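To estimate what the migration is worth, multiply your gp2 footprint by the price gap — a back-of-envelope sketch using assumed us-east-1 list prices (gp2 $0.10/GB-month, gp3 $0.08/GB-month; check your region's rates):

```python
# Back-of-envelope gp2 -> gp3 savings. Assumed us-east-1 list prices:
# gp2 $0.10/GB-month, gp3 $0.08/GB-month -- check your region's rates.
GP2_PER_GB = 0.10
GP3_PER_GB = 0.08

def monthly_savings(total_gp2_gb: float) -> float:
    return total_gp2_gb * (GP2_PER_GB - GP3_PER_GB)

print(round(monthly_savings(5000), 2))  # 100.0 -- 5 TB of gp2 saves ~$100/month
```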
Snapshots: Delete old ones. They accumulate silently.
```bash
# Find snapshots older than 90 days
aws ec2 describe-snapshots --owner-ids self \
  --query "Snapshots[?StartTime<='$(date -d '90 days ago' +%Y-%m-%d)'].[SnapshotId,StartTime,VolumeSize]"
```
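The same age filter is easy to reproduce in Python if you'd rather run cleanup from a scheduled job — a sketch over the `describe-snapshots` response shape (the deletion step is deliberately left out; add `delete-snapshot` calls once you trust the list):

```python
from datetime import datetime, timedelta, timezone

# Filter a describe-snapshots response for snapshots older than a cutoff --
# the same logic as the CLI query above, e.g. for a cleanup Lambda.
def old_snapshots(snapshots, days=90, now=None):
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
snaps = [
    {"SnapshotId": "snap-old", "StartTime": datetime(2025, 1, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new", "StartTime": datetime(2025, 5, 20, tzinfo=timezone.utc)},
]
print(old_snapshots(snaps, now=now))  # ['snap-old']
```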
Compute Patterns#
Spot for Batch Workloads#
Spot instances are spare capacity at steep discounts. Perfect for:
- CI/CD runners
- Data processing
- Batch jobs
- Dev/test environments
```yaml
# EKS node group with Spot
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
managedNodeGroups:
  - name: spot-workers
    instanceTypes: ["m5.large", "m5a.large", "m4.large"]
    spot: true
    minSize: 2
    maxSize: 10
```
Mix instance types for better availability.
Auto Scaling That Actually Scales Down#
Default scaling policies are aggressive about scaling up and timid about scaling down.
```yaml
# Scale down faster
ScaleDownPolicy:
  PolicyType: StepScaling
  StepAdjustments:
    - MetricIntervalUpperBound: 0
      ScalingAdjustment: -2   # Remove 2 instances at a time
  Cooldown: 60                # Check again in 60 seconds
```
Also consider scheduled scaling for predictable patterns:
Also consider scheduled scaling for predictable patterns:

```bash
aws autoscaling put-scheduled-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name "scale-down-night" \
  --recurrence "0 22 * * *" \
  --desired-capacity 2
```
Lambda Optimization#
Memory = CPU: Lambda allocates CPU in proportion to memory, so more memory often means faster execution. Sometimes 256MB for 1 second costs more than 512MB for 400ms: 256MB × 1s is 0.25 GB-seconds, while 512MB × 0.4s is 0.2 GB-seconds, so the larger function is both faster and cheaper.
```python
# Test different memory sizes. benchmark_function is a placeholder for
# your own timing harness (invoke the function, read the billed duration).
for memory in [128, 256, 512, 1024]:
    duration_ms = benchmark_function(memory)
    # $0.0000166667 per GB-second (x86 list price; duration converted ms -> s)
    cost = (memory / 1024) * (duration_ms / 1000) * 0.0000166667
    print(f"{memory}MB: {duration_ms}ms, ${cost:.9f}")
```
Provisioned Concurrency: Eliminates cold starts but costs money. Use only for latency-critical paths.
Database Costs#
RDS Right-Sizing#
Databases are often the biggest line item. Check:
1
2
3
4
5
6
7
8
9
| # CPU utilization
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name CPUUtilization \
--dimensions Name=DBInstanceIdentifier,Value=mydb \
--start-time $(date -d '7 days ago' -u +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 3600 \
--statistics Average Maximum
|
Consider:
- Aurora Serverless v2: scales capacity automatically and can pause to zero when idle; you pay for what you use
- Graviton instances: ~20% cheaper, often faster
- Reserved instances: 30-60% off for 1-3 year commitments
Read Replicas vs Bigger Instance#
Adding read replicas is often cheaper than scaling up:
- db.r5.xlarge: $0.48/hour
- db.r5.2xlarge: $0.96/hour
- 2x db.r5.xlarge (primary + replica): $0.96/hour but 2x read capacity
ElastiCache and DynamoDB#
ElastiCache: Reserved nodes are 30-50% cheaper than on-demand.
DynamoDB:
- On-demand mode for unpredictable traffic
- Provisioned + auto-scaling for steady workloads
- Reserved capacity for predictable high volume
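The on-demand vs provisioned decision comes down to how steady the traffic is — a rough sketch of the arithmetic (the rates below are illustrative placeholders, not current prices; check the DynamoDB pricing page before deciding):

```python
# Rough on-demand vs provisioned comparison for a steady write workload.
# Rates are illustrative placeholders, not current prices.
ON_DEMAND_PER_MILLION_WRITES = 1.25   # $ per million write request units
WCU_PER_HOUR = 0.00065                # $ per provisioned WCU-hour
HOURS_PER_MONTH = 730

def monthly_on_demand(writes_per_sec: float) -> float:
    writes = writes_per_sec * 3600 * HOURS_PER_MONTH
    return writes / 1_000_000 * ON_DEMAND_PER_MILLION_WRITES

def monthly_provisioned(writes_per_sec: float) -> float:
    # 1 WCU sustains roughly one 1KB write per second
    return writes_per_sec * WCU_PER_HOUR * HOURS_PER_MONTH

for wps in (1, 10, 100):
    print(f"{wps:>3} writes/s: on-demand ${monthly_on_demand(wps):.2f}, "
          f"provisioned ${monthly_provisioned(wps):.2f}")
```

At these rates, steady traffic is several times cheaper provisioned; on-demand only wins when traffic is spiky or idle most of the time.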
Visibility#
You can’t optimize what you can’t see. Tag everything:
```bash
aws ec2 create-tags \
  --resources i-1234567890abcdef0 \
  --tags Key=Environment,Value=production Key=Team,Value=platform Key=Project,Value=api
```
Enable tags in Cost Explorer for per-team/per-project breakdowns.
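If you pull tagged costs programmatically with `aws ce get-cost-and-usage --group-by Type=TAG,Key=Team`, a small roll-up turns the response into per-team totals — a sketch over the response shape (the sample data below is illustrative, not real output):

```python
# Roll up a Cost Explorer get-cost-and-usage response (grouped by tag)
# into per-group totals across all time periods.
def costs_by_group(results_by_time):
    totals = {}
    for period in results_by_time:
        for group in period["Groups"]:
            key = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[key] = totals.get(key, 0.0) + amount
    return totals

sample = [{"Groups": [
    {"Keys": ["Team$platform"], "Metrics": {"UnblendedCost": {"Amount": "120.5"}}},
    {"Keys": ["Team$api"], "Metrics": {"UnblendedCost": {"Amount": "80.0"}}},
]}]
print(costs_by_group(sample))  # {'Team$platform': 120.5, 'Team$api': 80.0}
```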
Budgets and Alerts#
Know before the bill arrives:
```bash
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "Monthly Total",
    "BudgetLimit": {"Amount": "5000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80
    },
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "team@example.com"}]
  }]'
```
Set alerts at 50%, 80%, 100% of budget.
Cost Anomaly Detection#
AWS can alert on unusual spending:
```bash
aws ce create-anomaly-monitor \
  --anomaly-monitor '{
    "MonitorName": "ServiceMonitor",
    "MonitorType": "DIMENSIONAL",
    "MonitorDimension": "SERVICE"
  }'
```
Catches runaway resources before they become massive bills.
Quick Wins Checklist#
Do these this week. You'll likely save 20-40%:
- Delete unattached EBS volumes and release unused Elastic IPs
- Migrate gp2 volumes to gp3
- Delete snapshots older than 90 days
- Add S3 lifecycle policies, or enable Intelligent-Tiering
- Downsize the most obviously oversized instances
- Set up budgets with 50/80/100% alerts and anomaly detection
The cloud is only expensive if you're not paying attention. Pay attention.