Your AWS bill is too high. Everyone’s is. The cloud makes it trivially easy to spin up resources and surprisingly hard to know what you’re actually paying for.
Here’s how to stop the bleeding.
The Low-Hanging Fruit#
Unused Resources#
The easiest savings come from things you’re not using.
```bash
# Find unattached EBS volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].[VolumeId,Size,CreateTime]' \
  --output table

# Find unused Elastic IPs
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]' \
  --output table

# Find unhealthy targets in a target group (requires the target group ARN)
aws elbv2 describe-target-health \
  --target-group-arn <target-group-arn> \
  --query 'TargetHealthDescriptions[?TargetHealth.State!=`healthy`]'
```
Run these monthly. You’ll find forgotten resources every time.
Right-Sizing#
Most instances are oversized. Check actual utilization:
```bash
# Get CPU utilization for a single instance over the last 7 days
# (loop over instance IDs to cover a fleet)
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time $(date -d '7 days ago' -u +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 3600 \
  --statistics Average Maximum
```
If average CPU is under 20% and max is under 50%, downsize.
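That rule of thumb is easy to codify if you script the CloudWatch output — a minimal sketch (the helper name is hypothetical, and the thresholds are the ones above):

```python
# Hypothetical helper codifying the rule of thumb above:
# downsize when average CPU < 20% and max CPU < 50%.
def downsize_candidate(avg_cpu: float, max_cpu: float) -> bool:
    return avg_cpu < 20.0 and max_cpu < 50.0

print(downsize_candidate(12.0, 35.0))  # True  (steadily idle)
print(downsize_candidate(12.0, 80.0))  # False (idle on average but bursty)
```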
AWS Compute Optimizer does this automatically:
```bash
aws compute-optimizer get-ec2-instance-recommendations \
  --query 'instanceRecommendations[*].[instanceArn,currentInstanceType,recommendationOptions[0].instanceType]'
```
Reserved Instances and Savings Plans#
On-demand pricing is the most expensive option. For steady-state workloads:
- Savings Plans: Commit to $/hour, flexible across instance types. Start here.
- Reserved Instances: Commit to specific instance type, bigger discount.
- Spot Instances: Up to 90% off, but can be terminated. Good for batch jobs.
```bash
# Check current RI coverage
aws ce get-reservation-coverage \
  --time-period Start=$(date -d '30 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
  --group-by Type=DIMENSION,Key=SERVICE
```
Target 70-80% coverage on steady workloads. Don’t over-commit.
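Before committing, it helps to know the break-even point: a commitment billed 24/7 pays off once actual usage exceeds the ratio of the committed rate to the on-demand rate. A sketch with made-up rates (not real AWS prices):

```python
# Break-even utilization for a 24/7 commitment: you come out ahead once
# you actually run more than committed_rate / on_demand_rate of the time.
def breakeven_utilization(on_demand_per_hour: float, committed_per_hour: float) -> float:
    return committed_per_hour / on_demand_per_hour

# A 40% discount breaks even at 60% utilization (rates are illustrative)
print(round(breakeven_utilization(0.096, 0.0576), 2))  # 0.6
```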
Storage Optimization#
S3 Lifecycle Policies#
Most S3 data is accessed once and forgotten. Tier it automatically:
```json
{
  "Rules": [
    {
      "ID": "Archive old data",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
      ],
      "Expiration": {"Days": 730}
    }
  ]
}
```
Storage class costs (per GB-month, us-east-1 list prices; check current rates):
- Standard: $0.023/GB
- Standard-IA: $0.0125/GB (~46% cheaper)
- Glacier: $0.004/GB (~83% cheaper)
- Deep Archive: $0.00099/GB (~96% cheaper)
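To see what tiering is worth at scale, multiply the rates out — a quick sketch for 10 TB, using the per-GB rates listed above (verify current pricing before planning around these numbers):

```python
# Monthly cost of 10 TB at each storage class, using the per-GB rates
# from the list above (us-east-1 list prices; verify current rates)
RATES = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}
gb = 10 * 1024  # 10 TB
for storage_class, per_gb in RATES.items():
    print(f"{storage_class:>12}: ${gb * per_gb:,.2f}/month")
```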
S3 Intelligent-Tiering#
If you can’t predict access patterns, let AWS figure it out:
```bash
aws s3api put-bucket-intelligent-tiering-configuration \
  --bucket my-bucket \
  --id "AutoTiering" \
  --intelligent-tiering-configuration '{
    "Id": "AutoTiering",
    "Status": "Enabled",
    "Tierings": [
      {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
      {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"}
    ]
  }'
```
Small monitoring fee, but no retrieval charges for frequently accessed data.
EBS Optimization#
gp3 vs gp2: gp3 is 20% cheaper with better baseline performance. Migrate everything.
```bash
# Find gp2 volumes to migrate
aws ec2 describe-volumes \
  --filters Name=volume-type,Values=gp2 \
  --query 'Volumes[*].[VolumeId,Size,Iops]' \
  --output table

# Modify to gp3
aws ec2 modify-volume --volume-id vol-xxx --volume-type gp3
```
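To estimate what the migration is worth, multiply your gp2 footprint by the price gap — a back-of-envelope sketch using assumed us-east-1 list prices (gp2 $0.10/GB-month, gp3 $0.08/GB-month; check your region's rates):

```python
# Back-of-envelope gp2 -> gp3 savings. Assumed us-east-1 list prices:
# gp2 $0.10/GB-month, gp3 $0.08/GB-month -- check your region's rates.
GP2_PER_GB = 0.10
GP3_PER_GB = 0.08

def monthly_savings(total_gp2_gb: float) -> float:
    return total_gp2_gb * (GP2_PER_GB - GP3_PER_GB)

print(round(monthly_savings(5000), 2))  # 100.0 -- 5 TB of gp2 saves ~$100/month
```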
Snapshots: Delete old ones. They accumulate silently.
```bash
# Find snapshots older than 90 days
aws ec2 describe-snapshots --owner-ids self \
  --query "Snapshots[?StartTime<='$(date -d '90 days ago' +%Y-%m-%d)'].[SnapshotId,StartTime,VolumeSize]"
```
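The same age filter is easy to reproduce in Python if you'd rather run cleanup from a scheduled job — a sketch over the `describe-snapshots` response shape (the deletion step is deliberately left out; add `delete-snapshot` calls once you trust the list):

```python
from datetime import datetime, timedelta, timezone

# Filter a describe-snapshots response for snapshots older than a cutoff --
# the same logic as the CLI query above, e.g. for a cleanup Lambda.
def old_snapshots(snapshots, days=90, now=None):
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
snaps = [
    {"SnapshotId": "snap-old", "StartTime": datetime(2025, 1, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new", "StartTime": datetime(2025, 5, 20, tzinfo=timezone.utc)},
]
print(old_snapshots(snaps, now=now))  # ['snap-old']
```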
Compute Patterns#
Spot for Batch Workloads#
Spot instances are spare capacity at steep discounts. Perfect for:
- CI/CD runners
- Data processing
- Batch jobs
- Dev/test environments
```yaml
# EKS node group with Spot
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
managedNodeGroups:
  - name: spot-workers
    instanceTypes: ["m5.large", "m5a.large", "m4.large"]
    spot: true
    minSize: 2
    maxSize: 10
```
Mix instance types for better availability.
Auto Scaling That Actually Scales Down#
Default scaling policies are aggressive about scaling up and timid about scaling down.
```yaml
# Scale down faster
ScaleDownPolicy:
  PolicyType: StepScaling
  StepAdjustments:
    - MetricIntervalUpperBound: 0
      ScalingAdjustment: -2   # Remove 2 instances at a time
  Cooldown: 60                # Check again in 60 seconds
```
Also consider scheduled scaling for predictable patterns:
Also consider scheduled scaling for predictable patterns:

```bash
aws autoscaling put-scheduled-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name "scale-down-night" \
  --recurrence "0 22 * * *" \
  --desired-capacity 2
```
Lambda Optimization#
Memory = CPU: Lambda allocates CPU in proportion to memory, so more memory often means faster execution. Sometimes 256MB for 1 second costs more than 512MB for 400ms: 256MB × 1s is 0.25 GB-seconds, while 512MB × 0.4s is 0.2 GB-seconds, so the larger function is both faster and cheaper.
```python
# Test different memory sizes. benchmark_function is a placeholder for
# your own timing harness (invoke the function, read the billed duration).
for memory in [128, 256, 512, 1024]:
    duration_ms = benchmark_function(memory)
    # $0.0000166667 per GB-second (x86 list price; duration converted ms -> s)
    cost = (memory / 1024) * (duration_ms / 1000) * 0.0000166667
    print(f"{memory}MB: {duration_ms}ms, ${cost:.9f}")
```
Provisioned Concurrency: Eliminates cold starts but costs money. Use only for latency-critical paths.
Database Costs#
RDS Right-Sizing#
Databases are often the biggest line item. Check:
1
2
3
4
5
6
7
8
9
| # CPU utilization
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name CPUUtilization \
--dimensions Name=DBInstanceIdentifier,Value=mydb \
--start-time $(date -d '7 days ago' -u +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 3600 \
--statistics Average Maximum
|
Consider:
- Aurora Serverless v2: scales capacity automatically and can pause to zero when idle; you pay for what you use
- Graviton instances: ~20% cheaper, often faster
- Reserved instances: 30-60% off for 1-3 year commitments
Read Replicas vs Bigger Instance#
Adding read replicas is often cheaper than scaling up:
- db.r5.xlarge: $0.48/hour
- db.r5.2xlarge: $0.96/hour
- 2x db.r5.xlarge (primary + replica): $0.96/hour but 2x read capacity
ElastiCache and DynamoDB#
ElastiCache: Reserved nodes are 30-50% cheaper than on-demand.
DynamoDB:
- On-demand mode for unpredictable traffic
- Provisioned + auto-scaling for steady workloads
- Reserved capacity for predictable high volume
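The on-demand vs provisioned decision comes down to how steady the traffic is — a rough sketch of the arithmetic (the rates below are illustrative placeholders, not current prices; check the DynamoDB pricing page before deciding):

```python
# Rough on-demand vs provisioned comparison for a steady write workload.
# Rates are illustrative placeholders, not current prices.
ON_DEMAND_PER_MILLION_WRITES = 1.25   # $ per million write request units
WCU_PER_HOUR = 0.00065                # $ per provisioned WCU-hour
HOURS_PER_MONTH = 730

def monthly_on_demand(writes_per_sec: float) -> float:
    writes = writes_per_sec * 3600 * HOURS_PER_MONTH
    return writes / 1_000_000 * ON_DEMAND_PER_MILLION_WRITES

def monthly_provisioned(writes_per_sec: float) -> float:
    # 1 WCU sustains roughly one 1KB write per second
    return writes_per_sec * WCU_PER_HOUR * HOURS_PER_MONTH

for wps in (1, 10, 100):
    print(f"{wps:>3} writes/s: on-demand ${monthly_on_demand(wps):.2f}, "
          f"provisioned ${monthly_provisioned(wps):.2f}")
```

At these rates, steady traffic is several times cheaper provisioned; on-demand only wins when traffic is spiky or idle most of the time.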
Visibility#
You can’t optimize what you can’t see. Tag everything:
```bash
aws ec2 create-tags \
  --resources i-1234567890abcdef0 \
  --tags Key=Environment,Value=production Key=Team,Value=platform Key=Project,Value=api
```
Enable tags in Cost Explorer for per-team/per-project breakdowns.
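If you pull tagged costs programmatically with `aws ce get-cost-and-usage --group-by Type=TAG,Key=Team`, a small roll-up turns the response into per-team totals — a sketch over the response shape (the sample data below is illustrative, not real output):

```python
# Roll up a Cost Explorer get-cost-and-usage response (grouped by tag)
# into per-group totals across all time periods.
def costs_by_group(results_by_time):
    totals = {}
    for period in results_by_time:
        for group in period["Groups"]:
            key = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[key] = totals.get(key, 0.0) + amount
    return totals

sample = [{"Groups": [
    {"Keys": ["Team$platform"], "Metrics": {"UnblendedCost": {"Amount": "120.5"}}},
    {"Keys": ["Team$api"], "Metrics": {"UnblendedCost": {"Amount": "80.0"}}},
]}]
print(costs_by_group(sample))  # {'Team$platform': 120.5, 'Team$api': 80.0}
```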
Budgets and Alerts#
Know before the bill arrives:
```bash
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "Monthly Total",
    "BudgetLimit": {"Amount": "5000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80
    },
    "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "team@example.com"}]
  }]'
```
Set alerts at 50%, 80%, 100% of budget.
Cost Anomaly Detection#
AWS can alert on unusual spending:
```bash
aws ce create-anomaly-monitor \
  --anomaly-monitor '{
    "MonitorName": "ServiceMonitor",
    "MonitorType": "DIMENSIONAL",
    "MonitorDimension": "SERVICE"
  }'
```
Catches runaway resources before they become massive bills.
Quick Wins Checklist#
Do these this week. You'll likely save 20-40%:
- Delete unattached EBS volumes and release unused Elastic IPs
- Migrate gp2 volumes to gp3
- Delete snapshots older than 90 days
- Add S3 lifecycle policies, or enable Intelligent-Tiering
- Downsize the most obviously oversized instances
- Set up budgets with 50/80/100% alerts and anomaly detection
The cloud is only expensive if you're not paying attention. Pay attention.