Manual certificate management is a reliability incident waiting to happen. A forgotten renewal, an expired cert at 3 AM, angry customers. Let’s automate this problem away.
Certbot: The Foundation#
Basic Setup#
```bash
# Install Certbot and its nginx plugin
sudo apt install certbot python3-certbot-nginx

# Get a certificate for nginx
sudo certbot --nginx -d example.com -d www.example.com

# Auto-renewal is configured automatically
# Test it:
sudo certbot renew --dry-run
```
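`certbot renew` is safe to run frequently because it only acts on certificates that are close to expiry. Here is a minimal sketch of that decision, assuming a 30-day renewal window (Certbot's long-standing default; newer releases may compute it differently). The helper name `renewal_due` is hypothetical and not part of Certbot.

```python
import ssl

SECONDS_PER_DAY = 86400


def renewal_due(not_after: str, now: float, window_days: int = 30) -> bool:
    """Return True when the certificate expires within window_days of now.

    not_after uses the certificate text format, e.g. "Jan  5 09:34:43 2018 GMT".
    now is a Unix timestamp in seconds.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    return expires_at - now <= window_days * SECONDS_PER_DAY


expiry = "Jan  5 09:34:43 2018 GMT"
expires_at = ssl.cert_time_to_seconds(expiry)

# 45 days before expiry: outside the assumed window
print(renewal_due(expiry, now=expires_at - 45 * SECONDS_PER_DAY))  # False
# 10 days before expiry: inside the assumed window
print(renewal_due(expiry, now=expires_at - 10 * SECONDS_PER_DAY))  # True
```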
Standalone Mode (No Web Server)#
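```bash
# Standalone mode binds port 80 itself, so stop any web server using it first
sudo certbot certonly --standalone -d example.com

# Or use the DNS challenge (no downtime)
# --manual asks you to create the TXT record by hand, so renewals are not
# automatic unless you add an auth hook or switch to a DNS plugin
sudo certbot certonly --manual --preferred-challenges dns -d example.com
```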
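For the DNS challenge, the ACME server looks up a TXT record at `_acme-challenge.<domain>` whose value is the unpadded base64url SHA-256 digest of the key authorization (RFC 8555, section 8.4). Certbot computes this for you; the sketch below only illustrates the arithmetic. The token and thumbprint are made-up values, and both function names are hypothetical.

```python
import base64
import hashlib


def dns01_record_name(domain: str) -> str:
    """TXT record name for a domain; a wildcard validates against its base name."""
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}"


def dns01_record_value(token: str, account_thumbprint: str) -> str:
    """Unpadded base64url SHA-256 digest of '<token>.<thumbprint>'."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Made-up inputs, for illustration only
print(dns01_record_name("*.example.com"))  # _acme-challenge.example.com
value = dns01_record_value("made-up-token", "made-up-thumbprint")
print(len(value))  # 43: a 32-byte digest encodes to 43 characters without padding
```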
Automated Renewal with Hooks#
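```bash
#!/bin/bash
# /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh
# Deploy hooks run once for each certificate that was successfully renewed
systemctl reload nginx
```

```bash
#!/bin/bash
# /etc/letsencrypt/renewal-hooks/deploy/notify.sh
# Kept in deploy/ rather than post/: deploy hooks run only after a successful
# renewal, and RENEWED_DOMAINS is documented for them
curl -X POST https://slack.com/webhook \
  -H 'Content-Type: application/json' \
  -d "{\"text\":\"SSL certificate renewed for ${RENEWED_DOMAINS}\"}"
```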
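Building JSON by string concatenation in a shell hook breaks as soon as a value contains a quote. If the notification grows beyond a one-liner, a small Python deploy hook is one option. This is a sketch: `build_payload` is a hypothetical helper, and it assumes `RENEWED_DOMAINS` holds a space-separated list of domains, as Certbot documents for deploy hooks.

```python
import json
import os


def build_payload(renewed_domains: str) -> str:
    """Return a Slack-style JSON body listing the renewed domains."""
    domains = renewed_domains.split()
    if not domains:
        text = "SSL certificate renewal ran, but no domains were reported"
    else:
        text = "SSL certificate renewed for " + ", ".join(domains)
    # json.dumps handles quoting and escaping of the message text
    return json.dumps({"text": text})


if __name__ == "__main__":
    # Read the variable from the environment; fall back to empty when unset
    print(build_payload(os.environ.get("RENEWED_DOMAINS", "")))
```

The printed body can be piped to `curl -d @-` so the shell never has to assemble JSON itself.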
cert-manager for Kubernetes#
cert-manager is the de facto standard for certificate automation on Kubernetes.
Installation#
```bash
# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml

# Verify
kubectl get pods -n cert-manager
```
ClusterIssuer with Let’s Encrypt#
```yaml
# cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account
    solvers:
      - http01:
          ingress:
            class: nginx
---
# Staging issuer for testing
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-account
    solvers:
      - http01:
          ingress:
            class: nginx
```
Certificate for Ingress#
```yaml
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls  # cert-manager creates this
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80
```
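The `cert-manager.io/cluster-issuer` annotation is what triggers issuance: cert-manager reads each entry under `spec.tls` and requests a certificate for that secret name and host list. The sketch below is a simplified model of that mapping, not cert-manager's actual code, and `certificates_for_ingress` is a hypothetical name.

```python
from typing import Dict, List


def certificates_for_ingress(ingress: Dict) -> List[Dict]:
    """Derive the certificate requests implied by an annotated Ingress."""
    annotations = ingress.get("metadata", {}).get("annotations", {})
    issuer = annotations.get("cert-manager.io/cluster-issuer")
    if issuer is None:
        return []  # no annotation, so nothing is issued
    return [
        {
            "secretName": tls["secretName"],
            "dnsNames": list(tls.get("hosts", [])),
            "issuerRef": {"name": issuer, "kind": "ClusterIssuer"},
        }
        for tls in ingress.get("spec", {}).get("tls", [])
    ]


# The Ingress from above, reduced to the fields that matter here
ingress = {
    "metadata": {
        "name": "api-ingress",
        "annotations": {"cert-manager.io/cluster-issuer": "letsencrypt-prod"},
    },
    "spec": {"tls": [{"hosts": ["api.example.com"], "secretName": "api-tls"}]},
}

for cert in certificates_for_ingress(ingress):
    print(cert["secretName"], cert["dnsNames"], cert["issuerRef"]["name"])
```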
Wildcard Certificates with DNS Challenge#
```yaml
# For *.example.com
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-dns-account
    solvers:
      - dns01:
          route53:
            region: us-east-1
            hostedZoneID: Z1234567890
            # Use IRSA or access keys
        selector:
          dnsZones:
            - example.com
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-cert
  namespace: default
spec:
  secretName: wildcard-tls
  issuerRef:
    name: letsencrypt-dns
    kind: ClusterIssuer
  commonName: "*.example.com"
  dnsNames:
    - "*.example.com"
    - example.com
```
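The Certificate lists both `*.example.com` and `example.com` because a wildcard matches exactly one extra label: it covers `api.example.com`, but not the bare domain or deeper names such as `a.b.example.com`. Below is a small sketch of that rule. `wildcard_covers` is a hypothetical helper and ignores edge cases such as internationalized names.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name (possibly a wildcard) covers hostname."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        return pattern == hostname  # plain names must match exactly
    suffix = pattern[1:]  # "*.example.com" -> ".example.com"
    if not hostname.endswith(suffix):
        return False
    label = hostname[: -len(suffix)]
    # The wildcard stands in for exactly one non-empty label
    return bool(label) and "." not in label


for host in ["api.example.com", "example.com", "a.b.example.com"]:
    print(host, wildcard_covers("*.example.com", host))
```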
AWS Certificate Manager#
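ACM issues free public certificates for integrated AWS services (ALB, CloudFront, API Gateway) and renews them automatically. Certificates used with CloudFront must be requested in us-east-1, the region used in the examples below.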
Request Certificate#
```python
# acm_certificate.py
import time

import boto3


def request_certificate(domain: str, validation_method: str = 'DNS') -> str:
    """Request an ACM certificate for the domain and its wildcard."""
    acm = boto3.client('acm', region_name='us-east-1')
    response = acm.request_certificate(
        DomainName=domain,
        SubjectAlternativeNames=[
            domain,
            f'*.{domain}'
        ],
        ValidationMethod=validation_method,
        Options={
            'CertificateTransparencyLoggingPreference': 'ENABLED'
        },
        Tags=[
            {'Key': 'Environment', 'Value': 'production'},
            {'Key': 'ManagedBy', 'Value': 'automation'}
        ]
    )
    return response['CertificateArn']


def get_dns_validation_records(cert_arn: str, timeout: int = 60) -> list:
    """Get DNS records needed for validation."""
    acm = boto3.client('acm', region_name='us-east-1')
    # Validation options appear shortly after the request, so poll until
    # every domain has a record, and give up after `timeout` seconds
    start = time.time()
    while True:
        cert = acm.describe_certificate(CertificateArn=cert_arn)
        options = cert['Certificate'].get('DomainValidationOptions', [])
        if options and all('ResourceRecord' in o for o in options):
            break
        if time.time() - start > timeout:
            raise TimeoutError("Validation records not available yet")
        time.sleep(5)
    records = []
    for option in options:
        record = option['ResourceRecord']
        records.append({
            'domain': option['DomainName'],
            'name': record['Name'],
            'type': record['Type'],
            'value': record['Value']
        })
    return records


def add_validation_records(records: list, hosted_zone_id: str):
    """Add validation records to Route53."""
    route53 = boto3.client('route53')
    # The apex and wildcard names usually share one CNAME, and a batch that
    # repeats the same record can be rejected, so deduplicate by record name
    unique = {record['name']: record for record in records}
    changes = []
    for record in unique.values():
        changes.append({
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': record['name'],
                'Type': record['type'],
                'TTL': 300,
                'ResourceRecords': [{'Value': record['value']}]
            }
        })
    route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch={'Changes': changes}
    )


def wait_for_validation(cert_arn: str, timeout: int = 300):
    """Wait for certificate to be issued."""
    acm = boto3.client('acm', region_name='us-east-1')
    start = time.time()
    while time.time() - start < timeout:
        cert = acm.describe_certificate(CertificateArn=cert_arn)
        status = cert['Certificate']['Status']
        if status == 'ISSUED':
            print(f"Certificate issued: {cert_arn}")
            return True
        elif status == 'FAILED':
            raise RuntimeError(f"Certificate failed: {cert_arn}")
        time.sleep(10)
    raise TimeoutError("Certificate validation timed out")


# Usage
if __name__ == '__main__':
    cert_arn = request_certificate('example.com')
    records = get_dns_validation_records(cert_arn)
    add_validation_records(records, 'Z1234567890')
    wait_for_validation(cert_arn)
```
```hcl
# acm.tf
# Assumes data.aws_route53_zone.main, aws_lb.main, and
# aws_lb_target_group.main are defined elsewhere in the configuration
resource "aws_acm_certificate" "main" {
  domain_name       = "example.com"
  validation_method = "DNS"

  subject_alternative_names = [
    "*.example.com"
  ]

  lifecycle {
    create_before_destroy = true
  }

  tags = {
    Environment = "production"
  }
}

resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.main.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  # The apex and wildcard names can share one validation record,
  # so allow the second entry to overwrite the first
  allow_overwrite = true
  zone_id         = data.aws_route53_zone.main.zone_id
  name            = each.value.name
  type            = each.value.type
  ttl             = 60
  records         = [each.value.record]
}

resource "aws_acm_certificate_validation" "main" {
  certificate_arn         = aws_acm_certificate.main.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

# Use with ALB
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate_validation.main.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.main.arn
  }
}
```
Certificate Monitoring#
Expiration Alerts#
```python
# cert_monitor.py
import socket
import ssl
from datetime import datetime, timezone
from typing import Dict, List

import requests


def check_certificate(hostname: str, port: int = 443) -> Dict:
    """Check certificate expiration for a host."""
    # The default context verifies the chain, so an already-expired certificate
    # raises an SSL error here instead of returning a result
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as ssock:
            cert = ssock.getpeercert()
    # Parse expiration into a timezone-aware UTC datetime
    not_after = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert['notAfter']), tz=timezone.utc
    )
    days_remaining = (not_after - datetime.now(timezone.utc)).days
    return {
        'hostname': hostname,
        'issuer': dict(x[0] for x in cert['issuer']),
        'subject': dict(x[0] for x in cert['subject']),
        'not_after': not_after.isoformat(),
        'days_remaining': days_remaining,
        'is_expiring_soon': days_remaining < 30,
        'is_expired': days_remaining < 0
    }


def check_all_certificates(hosts: List[str]) -> List[Dict]:
    """Check multiple hosts and return status."""
    results = []
    for host in hosts:
        try:
            result = check_certificate(host)
            results.append(result)
        except Exception as e:
            # Connection failures and invalid certificates end up here
            results.append({
                'hostname': host,
                'error': str(e)
            })
    return results


def alert_expiring_certificates(results: List[Dict], webhook_url: str):
    """Send alerts for expiring certificates."""
    expiring = [r for r in results if r.get('is_expiring_soon')]
    if expiring:
        message = "⚠️ *Certificates Expiring Soon*\n\n"
        for cert in expiring:
            message += f"• {cert['hostname']}: {cert['days_remaining']} days\n"
        requests.post(webhook_url, json={"text": message}, timeout=10)


# Usage
if __name__ == '__main__':
    hosts = [
        "api.example.com",
        "www.example.com",
        "admin.example.com"
    ]
    results = check_all_certificates(hosts)
    for r in results:
        # Test for the key, not its truthiness, so 0 days is still reported
        if 'days_remaining' in r:
            status = "✅" if r['days_remaining'] > 30 else "⚠️"
            print(f"{status} {r['hostname']}: {r['days_remaining']} days")
        else:
            print(f"❌ {r['hostname']}: {r['error']}")
```
Prometheus Metrics#
```python
# cert_exporter.py
import time
from typing import List

from prometheus_client import Gauge, start_http_server

# check_certificate is the function from cert_monitor.py above
from cert_monitor import check_certificate

cert_expiry_days = Gauge(
    'ssl_certificate_expiry_days',
    'Days until certificate expires',
    ['hostname']
)
cert_is_valid = Gauge(
    'ssl_certificate_valid',
    'Certificate is valid (1) or invalid (0)',
    ['hostname']
)


def update_metrics(hosts: List[str]):
    """Update Prometheus metrics for all certificates."""
    for host in hosts:
        try:
            result = check_certificate(host)
            cert_expiry_days.labels(hostname=host).set(result['days_remaining'])
            cert_is_valid.labels(hostname=host).set(1 if result['days_remaining'] > 0 else 0)
        except Exception:
            # Unreachable hosts and failed verification both count as invalid
            cert_is_valid.labels(hostname=host).set(0)


# Run exporter
if __name__ == '__main__':
    hosts = ["api.example.com", "www.example.com", "admin.example.com"]
    start_http_server(9117)
    while True:
        update_metrics(hosts)
        time.sleep(3600)  # Check hourly
```
Best Practices#
- Use automation from day one — Manual certs are technical debt
- Monitor expiration — Alert at 30 days, panic at 7
- Use staging first — Let’s Encrypt rate limits are real
- Automate DNS validation — HTTP challenges require coordination
- Store certs properly — Kubernetes secrets, not git repos
- Rotate before expiration — Don’t cut it close
- Use strong TLS settings — TLS 1.2+ only, strong ciphers
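The "alert at 30 days, panic at 7" rule maps naturally onto severity levels. Here is a minimal sketch. The level names and the `classify_expiry` helper are hypothetical, and the thresholds should be tuned to your own renewal schedule.

```python
def classify_expiry(days_remaining: int, warn_days: int = 30, critical_days: int = 7) -> str:
    """Map days until expiry onto a severity level."""
    if days_remaining < 0:
        return "expired"
    if days_remaining < critical_days:
        return "critical"  # page someone
    if days_remaining < warn_days:
        return "warning"   # renewal should already have happened
    return "ok"


for days in (45, 20, 3, -1):
    print(days, classify_expiry(days))
```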
Quick Reference#
| Use Case | Solution |
|---|---|
| Single server | Certbot + nginx/apache plugin |
| Kubernetes | cert-manager + ClusterIssuer |
| AWS ALB/CloudFront | ACM (free, auto-renews) |
| Wildcard certs | DNS-01 challenge required |
| Internal/private CA | cert-manager + CA issuer (bootstrapped from a self-signed issuer) |
SSL automation isn’t optional anymore—it’s table stakes. Set it up once, monitor it always, and never think about certificate renewals again.
This is post #40 in a 24-hour deep dive into DevOps and infrastructure. Thanks for reading. 🌍