As AI workloads become central to business operations, managing the infrastructure that powers them requires the same rigor we apply to traditional applications. Infrastructure as Code (IaC) isn’t just nice-to-have for AI—it’s essential for cost control, reproducibility, and scaling.
## The AI Infrastructure Challenge
AI workloads have unique requirements that traditional IaC patterns don’t always address:
- GPU instances that cost $3-10/hour and need careful lifecycle management
- Model artifacts that can be gigabytes in size and need versioning
- Auto-scaling that must consider both compute load and model warming time
- Spot instance strategies to reduce costs by 60-90%
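That last point is worth quantifying before writing any HCL. A quick back-of-envelope for a single always-on GPU worker, using assumed prices (illustrative, not quoted AWS rates):

```shell
# Rough monthly cost of one always-on g4dn.xlarge, on-demand vs. spot.
# Prices are assumptions for illustration; check current AWS pricing.
ON_DEMAND=0.526 # $/hour, assumed
SPOT=0.16       # $/hour, assumed (~70% discount)
HOURS=730       # hours in an average month

awk -v od="$ON_DEMAND" -v sp="$SPOT" -v h="$HOURS" 'BEGIN {
  printf "on-demand: $%.0f/mo, spot: $%.0f/mo, saved: $%.0f/mo\n",
         od * h, sp * h, (od - sp) * h
}'
# -> on-demand: $384/mo, spot: $117/mo, saved: $267/mo
```

Multiply by a fleet of ten workers and the automation below pays for itself quickly.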
Let’s build a Terraform + Ansible solution that handles these challenges.
Start with a GPU-enabled instance that can scale based on queue depth:
```hcl
# main.tf
resource "aws_launch_template" "ai_worker" {
  name_prefix   = "ai-worker-"
  image_id      = "ami-0c02fb55956c7d316" # Deep Learning AMI
  instance_type = "g4dn.xlarge"           # NVIDIA T4 GPU

  vpc_security_group_ids = [aws_security_group.ai_worker.id]

  user_data = base64encode(templatefile("${path.module}/user_data.sh", {
    s3_bucket = aws_s3_bucket.models.bucket
  }))

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "ai-worker"
      Role = "inference"
    }
  }
}

resource "aws_autoscaling_group" "ai_workers" {
  name                = "ai-workers"
  vpc_zone_identifier = [aws_subnet.private.id]
  target_group_arns   = [aws_lb_target_group.ai.arn]
  min_size            = 0
  max_size            = 10
  desired_capacity    = 1

  launch_template {
    id      = aws_launch_template.ai_worker.id
    version = "$Latest"
  }
}

# Scale up when the SQS job queue backs up
resource "aws_autoscaling_policy" "scale_up" {
  name                   = "ai-scale-up"
  scaling_adjustment     = 2
  adjustment_type        = "ChangeInCapacity"
  cooldown               = 300
  autoscaling_group_name = aws_autoscaling_group.ai_workers.name
}

resource "aws_cloudwatch_metric_alarm" "queue_depth_high" {
  alarm_name          = "ai-queue-depth-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "ApproximateNumberOfMessagesVisible"
  namespace           = "AWS/SQS"
  period              = 120
  statistic           = "Average"
  threshold           = 10
  alarm_description   = "SQS queue depth is high; scale out AI workers"
  alarm_actions       = [aws_autoscaling_policy.scale_up.arn]

  dimensions = {
    QueueName = aws_sqs_queue.ai_jobs.name
  }
}
```
## Spot Instance Strategy
AI training and batch inference are perfect for spot instances. Add this to your launch template:
```hcl
resource "aws_launch_template" "ai_spot_worker" {
  name_prefix   = "ai-spot-worker-"
  image_id      = "ami-0c02fb55956c7d316"
  instance_type = "g4dn.xlarge"

  instance_market_options {
    market_type = "spot"
    spot_options {
      # Bid cap in USD/hour; keep it at or below the on-demand rate.
      # Omit max_price entirely to default to the on-demand price.
      max_price = "0.50"
    }
  }

  # Handle spot interruptions gracefully
  user_data = base64encode(templatefile("${path.module}/spot_handler.sh", {
    s3_bucket = aws_s3_bucket.models.bucket
  }))
}
```
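The template above references a `spot_handler.sh` that this post doesn't show, so here is one possible sketch. The IMDSv2 `spot/instance-action` endpoint is real and returns HTTP 200 roughly two minutes before reclamation; the `inference` container name and `/opt/checkpoints` path are hypothetical placeholders, and `${s3_bucket}` is filled in by Terraform's `templatefile`:

```shell
#!/bin/bash
# spot_handler.sh (sketch) -- watch IMDSv2 for a spot interruption
# notice and drain gracefully. Container name and checkpoint path
# are placeholders; substitute your own.

imds_status() {
  # Print the HTTP status of the spot interruption endpoint (200 = notice issued)
  local token
  token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
  curl -s -o /dev/null -w '%{http_code}' \
    -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/spot/instance-action"
}

drain() {
  # ~2 minutes remain: stop accepting work, checkpoint, let the ASG replace us
  docker stop --time 90 inference || true
  aws s3 sync /opt/checkpoints "s3://${s3_bucket}/checkpoints/" || true
}

# Start the watcher loop only when invoked as "spot_handler.sh watch"
if [ $# -gt 0 ] && [ "$1" = "watch" ]; then
  while sleep 5; do
    if [ "$(imds_status)" = "200" ]; then
      drain
      break
    fi
  done
fi
```

In user data you would launch the watcher in the background (`nohup /usr/local/bin/spot_handler.sh watch &`) before starting the inference service.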
## Ansible: Configuration and Deployment
Use Ansible to handle the complex setup that Terraform can’t:
```yaml
# playbooks/ai-setup.yml
---
- hosts: ai_workers
  become: yes
  vars:
    model_version: "{{ lookup('env', 'MODEL_VERSION') | default('latest', true) }}"
  tasks:
    - name: Install the NVIDIA container runtime
      ansible.builtin.package:
        name: nvidia-docker2
        state: present

    - name: Download model from S3
      amazon.aws.s3_object:
        bucket: "{{ s3_bucket }}"
        object: "models/{{ model_name }}/{{ model_version }}/model.tar.gz"
        dest: "/opt/models/{{ model_name }}.tar.gz"
        mode: get
      register: model_download

    - name: Extract model
      ansible.builtin.unarchive:
        src: "/opt/models/{{ model_name }}.tar.gz"
        dest: /opt/models/
        remote_src: yes
      when: model_download.changed

    - name: Start inference service
      community.docker.docker_container:
        name: "{{ model_name }}-inference"
        image: "pytorch/pytorch:latest"
        volumes:
          - "/opt/models:/models"
        ports:
          - "8080:8080"
        env:
          MODEL_PATH: "/models/{{ model_name }}"
          GPU_MEMORY_FRACTION: "0.8"
        device_requests:
          - driver: nvidia
            count: -1 # expose all GPUs to the container
            capabilities: [["gpu"]]
        restart_policy: always
```
## Cost Optimization Patterns

### 1. Scheduled Shutdown
```hcl
# Note: recurrence times are UTC unless time_zone is set on the schedule
resource "aws_autoscaling_schedule" "scale_down_evening" {
  scheduled_action_name  = "scale-down-evening"
  min_size               = 0
  max_size               = 2
  desired_capacity       = 0
  recurrence             = "0 22 * * *" # 10 PM daily
  autoscaling_group_name = aws_autoscaling_group.ai_workers.name
}

resource "aws_autoscaling_schedule" "scale_up_morning" {
  scheduled_action_name  = "scale-up-morning"
  min_size               = 1
  max_size               = 10
  desired_capacity       = 2
  recurrence             = "0 8 * * 1-5" # 8 AM weekdays
  autoscaling_group_name = aws_autoscaling_group.ai_workers.name
}
```
### 2. Model Caching Strategy
```yaml
- name: Cache models in EFS for faster startup
  ansible.posix.mount:
    path: /opt/model-cache
    src: "{{ efs_dns_name }}:/"
    fstype: efs
    opts: tls
    state: mounted

- name: Warm model cache
  ansible.builtin.shell: |
    if [ ! -f "/opt/model-cache/{{ model_name }}/ready" ]; then
      mkdir -p "/opt/model-cache/{{ model_name }}"
      cp -r "/opt/models/{{ model_name }}/." "/opt/model-cache/{{ model_name }}/"
      touch "/opt/model-cache/{{ model_name }}/ready"
    fi
```
## Monitoring and Observability
```hcl
resource "aws_cloudwatch_log_group" "ai_inference" {
  name              = "/ai/inference"
  retention_in_days = 7
}

resource "aws_cloudwatch_metric_alarm" "gpu_utilization_low" {
  alarm_name          = "ai-gpu-utilization-low"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 3
  metric_name         = "GPUUtilization"
  namespace           = "AI/Custom" # custom metric published from the instances
  period              = 300
  statistic           = "Average"
  threshold           = 20
  alarm_description   = "GPU utilization is consistently low"

  # Trigger scale-in when GPUs are underutilized; scale_down mirrors the
  # scale_up policy above with scaling_adjustment = -1
  alarm_actions = [aws_autoscaling_policy.scale_down.arn]
}
```
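Because `GPUUtilization` lives in a custom namespace, something on each instance has to publish it or the alarm will sit in `INSUFFICIENT_DATA`. A minimal sketch, assuming `nvidia-smi` and the AWS CLI are available (both ship with the Deep Learning AMI), run from cron or a systemd timer:

```shell
#!/bin/bash
# publish-gpu-metric.sh (sketch) -- push GPU utilization to CloudWatch
# so the gpu_utilization_low alarm has data to evaluate.

publish_gpu_metric() {
  local util
  # Percent utilization of the first GPU as a bare number, e.g. "37"
  util=$(nvidia-smi --query-gpu=utilization.gpu \
    --format=csv,noheader,nounits | head -n1)
  # Published without dimensions so it matches the alarm as written
  aws cloudwatch put-metric-data \
    --namespace "AI/Custom" \
    --metric-name GPUUtilization \
    --value "$util" \
    --unit Percent
}

# Invoke directly, e.g. cron: * * * * * /usr/local/bin/publish-gpu-metric.sh publish
if [ $# -gt 0 ] && [ "$1" = "publish" ]; then
  publish_gpu_metric
fi
```

If you want per-instance alarms instead of a fleet-wide average, add an `InstanceId` dimension here and matching `dimensions` on the alarm.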
## The Deployment Pipeline
Tie it all together with a CI/CD pipeline that handles model updates:
```bash
#!/bin/bash
# deploy-ai-infrastructure.sh
set -euo pipefail

# Deploy infrastructure; apply the exact plan that was reviewed
terraform init
terraform plan -out=tfplan
terraform apply tfplan

# Wait for instances to boot and pass health checks
sleep 300

# Deploy configuration via the aws_ec2 dynamic inventory
ansible-playbook -i aws_ec2.yml playbooks/ai-setup.yml \
  -e "model_version=${MODEL_VERSION}" \
  -e "s3_bucket=${S3_BUCKET}"

# Health check
curl -f "http://$(terraform output -raw load_balancer_dns)/health" || exit 1
echo "AI infrastructure deployed successfully"
```
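The playbook run above assumes an `aws_ec2.yml` dynamic inventory that resolves the `ai_workers` group. One possible sketch, keyed off the `Role = inference` tag from the launch template (requires the `amazon.aws` collection; the region is an assumption):

```yaml
# aws_ec2.yml -- dynamic inventory (sketch)
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  tag:Role: inference
  instance-state-name: running
groups:
  # Put matching hosts into the ai_workers group the playbook targets
  ai_workers: tags.Role == 'inference'
```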
## Key Takeaways
- Treat AI infrastructure like any other workload: apply IaC principles
- Leverage spot instances aggressively: 60-90% cost savings for batch workloads
- Monitor GPU utilization closely: idle GPUs are expensive mistakes
- Cache models intelligently: startup time matters for auto-scaling
- Plan for spot interruptions: graceful degradation is essential
Infrastructure as Code isn’t just about reproducibility—for AI workloads, it’s about survival. GPU costs can spiral out of control without proper automation. Start with these patterns and adapt them to your specific models and workload patterns.
The future of AI is automated infrastructure. Build it right from the start.