Manual certificate management is a reliability incident waiting to happen. A forgotten renewal, an expired cert at 3 AM, angry customers. Let’s automate this problem away.

Certbot: The Foundation

Basic Setup

# Install certbot
sudo apt install certbot python3-certbot-nginx

# Get certificate for nginx
sudo certbot --nginx -d example.com -d www.example.com

# Auto-renewal is configured automatically
# Test it:
sudo certbot renew --dry-run
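
By default Certbot only renews certificates that are within 30 days of expiry, so the scheduled `certbot renew` is a no-op most of the time. The sketch below mirrors that decision in Python. `renewal_due` and `window_days` are hypothetical names used for illustration and are not part of Certbot.

```python
from datetime import datetime, timedelta, timezone


def renewal_due(not_after: datetime, now: datetime, window_days: int = 30) -> bool:
    """Return True once the certificate is inside the renewal window."""
    return not_after - now < timedelta(days=window_days)


# Let's Encrypt certificates are valid for 90 days
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
not_after = issued + timedelta(days=90)

print(renewal_due(not_after, issued + timedelta(days=10)))  # 80 days left
print(renewal_due(not_after, issued + timedelta(days=61)))  # 29 days left
```

With these inputs the first call reports that no renewal is needed and the second reports that one is due.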

Standalone Mode (No Web Server)

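# Standalone mode binds port 80 itself, so stop anything already
# listening there before running it, then start it again afterwards
sudo certbot certonly --standalone -d example.com

# Or use the DNS challenge (no downtime). --manual prompts for the TXT
# record interactively, so unattended renewal needs a DNS plugin or
# a --manual-auth-hook script
sudo certbot certonly --manual --preferred-challenges dns -d example.com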
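
For the DNS challenge, the ACME server looks up a TXT record under `_acme-challenge.<domain>`. Per RFC 8555, its value is the unpadded base64url encoding of the SHA-256 digest of the key authorization. This is a minimal sketch of that computation. The function names and the sample key authorization are made up for illustration, and in practice Certbot or a DNS plugin does this for you.

```python
import base64
import hashlib


def challenge_record_name(domain: str) -> str:
    """TXT record name for DNS-01; a wildcard validates at its base name."""
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}"


def challenge_record_value(key_authorization: str) -> str:
    """Unpadded base64url of SHA-256(key authorization)."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# Hypothetical key authorization in the form "<token>.<account key thumbprint>"
key_auth = "example-token.example-thumbprint"

print(challenge_record_name("*.example.com"))
print(challenge_record_value(key_auth))
```

The wildcard and the apex name both resolve to the same record name, which is why a single DNS zone entry can serve both.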

Automated Renewal with Hooks

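# Hook scripts must be executable (chmod +x)

#!/bin/bash
# /etc/letsencrypt/renewal-hooks/deploy/reload-nginx.sh
# Deploy hooks run once for each successfully renewed certificate
systemctl reload nginx

#!/bin/bash
# /etc/letsencrypt/renewal-hooks/deploy/notify.sh
# RENEWED_DOMAINS is only set for deploy hooks, not post hooks
curl -X POST https://slack.com/webhook \
  -H 'Content-Type: application/json' \
  -d "{\"text\":\"SSL certificate renewed for ${RENEWED_DOMAINS}\"}"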
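
Hooks can be any executable, so the notification can also be written in Python, which avoids hand-quoting JSON in the shell. This is a sketch under assumptions: `SLACK_WEBHOOK_URL` is a hypothetical environment variable you would set yourself, while `RENEWED_DOMAINS` and `RENEWED_LINEAGE` are the variables Certbot passes to deploy hooks.

```python
#!/usr/bin/env python3
import json
import os
import urllib.request


def build_payload(env: dict) -> bytes:
    """Build the JSON body from the variables Certbot gives deploy hooks."""
    domains = env.get("RENEWED_DOMAINS", "").split()
    lineage = env.get("RENEWED_LINEAGE", "unknown")
    text = f"SSL certificate renewed for {', '.join(domains) or 'unknown'} ({lineage})"
    return json.dumps({"text": text}).encode("utf-8")


def notify(webhook_url: str, env: dict) -> None:
    """POST the payload to the webhook."""
    request = urllib.request.Request(
        webhook_url,
        data=build_payload(env),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request, timeout=10)


if __name__ == "__main__":
    # SLACK_WEBHOOK_URL is an assumed variable, not something Certbot sets
    url = os.environ.get("SLACK_WEBHOOK_URL")
    if url:
        notify(url, dict(os.environ))
```

Saved as an executable file in `/etc/letsencrypt/renewal-hooks/deploy/`, it would run after each successful renewal.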

cert-manager for Kubernetes

The standard for Kubernetes certificate automation.

Installation

# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.0/cert-manager.yaml

# Verify
kubectl get pods -n cert-manager

ClusterIssuer with Let’s Encrypt

# cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account
    solvers:
    - http01:
        ingress:
          class: nginx

---
# Staging issuer for testing
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-staging-account
    solvers:
    - http01:
        ingress:
          class: nginx
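
The two issuers differ only in their name and ACME server URL, so they can be generated from one template. The sketch below assumes you are happy to feed `kubectl` JSON instead of YAML. `cluster_issuer` and `ACME_SERVERS` are illustrative names, not part of cert-manager.

```python
import json

ACME_SERVERS = {
    "staging": "https://acme-staging-v02.api.letsencrypt.org/directory",
    "prod": "https://acme-v02.api.letsencrypt.org/directory",
}


def cluster_issuer(env: str, email: str, ingress_class: str = "nginx") -> dict:
    """Build a ClusterIssuer manifest for the given Let's Encrypt environment."""
    name = f"letsencrypt-{env}"
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": {
            "acme": {
                "server": ACME_SERVERS[env],
                "email": email,
                "privateKeySecretRef": {"name": f"{name}-account"},
                "solvers": [{"http01": {"ingress": {"class": ingress_class}}}],
            }
        },
    }


if __name__ == "__main__":
    # Emit both issuers as one List object for: kubectl apply -f -
    items = [cluster_issuer(env, "ops@example.com") for env in ACME_SERVERS]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Point test Ingresses at the staging issuer first and switch the annotation to the production issuer once issuance works.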

Certificate for Ingress

# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.example.com
    secretName: api-tls  # cert-manager creates this
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
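
Once the Ingress is applied, cert-manager creates a Certificate resource (typically named after the `secretName`) and marks it `Ready` when the secret holds a valid certificate. A small sketch for checking that condition from `kubectl get certificate api-tls -o json` output follows. The sample JSON is trimmed and illustrative rather than captured from a real cluster.

```python
import json
from typing import Optional


def ready_condition(certificate: dict) -> Optional[dict]:
    """Return the Ready condition from a Certificate object, if present."""
    for condition in certificate.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition
    return None


def is_ready(certificate: dict) -> bool:
    """True when the Ready condition exists and its status is "True"."""
    condition = ready_condition(certificate)
    return condition is not None and condition.get("status") == "True"


# Illustrative, trimmed Certificate object
sample = json.loads("""
{
  "kind": "Certificate",
  "metadata": {"name": "api-tls"},
  "status": {
    "conditions": [{"type": "Ready", "status": "True", "reason": "Ready"}],
    "notAfter": "2024-06-01T00:00:00Z"
  }
}
""")

print(is_ready(sample))
```

A Certificate with no status yet, which is common in the first seconds after creation, is treated as not ready.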

Wildcard Certificates with DNS Challenge

# For *.example.com
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-dns-account
    solvers:
    - dns01:
        route53:
          region: us-east-1
          hostedZoneID: Z1234567890
          # Use IRSA or access keys
      selector:
        dnsZones:
        - example.com

---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-cert
  namespace: default
spec:
  secretName: wildcard-tls
  issuerRef:
    name: letsencrypt-dns
    kind: ClusterIssuer
  commonName: "*.example.com"
  dnsNames:
  - "*.example.com"
  - example.com
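
The Certificate lists both `*.example.com` and `example.com` because a wildcard covers exactly one extra label and never the apex itself. The function below is a simplified illustration of that matching rule (leftmost-label wildcards only), not the full hostname verification a TLS library performs.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified certificate name matching: "*" stands for one leftmost label."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


for host in ("api.example.com", "example.com", "a.b.example.com"):
    print(host, wildcard_matches("*.example.com", host))
```

Only `api.example.com` matches here. The apex and the two-level subdomain need their own names on the certificate.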

AWS Certificate Manager

Free certificates for AWS services (ALB, CloudFront, API Gateway).

Request Certificate

# acm_certificate.py
import boto3
import time

def request_certificate(domain: str, validation_method: str = 'DNS') -> str:
    """Request an ACM certificate."""
    acm = boto3.client('acm', region_name='us-east-1')
    
    response = acm.request_certificate(
        DomainName=domain,
        SubjectAlternativeNames=[
            domain,
            f'*.{domain}'
        ],
        ValidationMethod=validation_method,
        Options={
            'CertificateTransparencyLoggingPreference': 'ENABLED'
        },
        Tags=[
            {'Key': 'Environment', 'Value': 'production'},
            {'Key': 'ManagedBy', 'Value': 'automation'}
        ]
    )
    
    return response['CertificateArn']

def get_dns_validation_records(cert_arn: str) -> list:
    """Get DNS records needed for validation."""
    acm = boto3.client('acm', region_name='us-east-1')
    
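    # Wait until every domain has its validation record populated
    deadline = time.time() + 60
    while True:
        cert = acm.describe_certificate(CertificateArn=cert_arn)
        options = cert['Certificate'].get('DomainValidationOptions', [])
        
        if options and all('ResourceRecord' in o for o in options):
            break
        if time.time() > deadline:
            raise TimeoutError("Validation records were not published in time")
        time.sleep(5)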
    
    records = []
    for option in options:
        record = option['ResourceRecord']
        records.append({
            'domain': option['DomainName'],
            'name': record['Name'],
            'type': record['Type'],
            'value': record['Value']
        })
    
    return records

def add_validation_records(records: list, hosted_zone_id: str):
    """Add validation records to Route53."""
    route53 = boto3.client('route53')
    
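    # The apex and wildcard names can share one validation CNAME; drop
    # duplicates so the change batch never repeats a record
    unique = {(r['name'], r['type']): r for r in records}.values()
    
    changes = []
    for record in unique: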
        changes.append({
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': record['name'],
                'Type': record['type'],
                'TTL': 300,
                'ResourceRecords': [{'Value': record['value']}]
            }
        })
    
    route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch={'Changes': changes}
    )

def wait_for_validation(cert_arn: str, timeout: int = 300):
    """Wait for certificate to be issued."""
    acm = boto3.client('acm', region_name='us-east-1')
    
    start = time.time()
    while time.time() - start < timeout:
        cert = acm.describe_certificate(CertificateArn=cert_arn)
        status = cert['Certificate']['Status']
        
        if status == 'ISSUED':
            print(f"Certificate issued: {cert_arn}")
            return True
        elif status in ('FAILED', 'VALIDATION_TIMED_OUT'):
            raise RuntimeError(f"Certificate {status.lower()}: {cert_arn}")
        
        time.sleep(10)
    
    raise TimeoutError("Certificate validation timed out")


# Usage
if __name__ == "__main__":
    cert_arn = request_certificate('example.com')
    records = get_dns_validation_records(cert_arn)
    add_validation_records(records, 'Z1234567890')
    wait_for_validation(cert_arn)

Terraform Integration

# acm.tf
resource "aws_acm_certificate" "main" {
  domain_name       = "example.com"
  validation_method = "DNS"
  
  subject_alternative_names = [
    "*.example.com"
  ]
  
  lifecycle {
    create_before_destroy = true
  }
  
  tags = {
    Environment = "production"
  }
}

resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.main.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }
  
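  # The apex and wildcard names can share one validation record
  allow_overwrite = true
  zone_id         = data.aws_route53_zone.main.zone_id # zone data source defined elsewhere
  name            = each.value.name
  type            = each.value.type
  ttl             = 60
  records         = [each.value.record]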
}

resource "aws_acm_certificate_validation" "main" {
  certificate_arn         = aws_acm_certificate.main.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

# Use with ALB
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.main.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate_validation.main.certificate_arn
  
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.main.arn
  }
}

Certificate Monitoring

Expiration Alerts

# cert_monitor.py
import ssl
import socket
from datetime import datetime, timezone
from typing import List, Dict

def check_certificate(hostname: str, port: int = 443) -> Dict:
    """Check certificate expiration for a host.

    An expired or untrusted certificate fails the TLS handshake and raises
    ssl.SSLCertVerificationError instead of returning a result.
    """
    context = ssl.create_default_context()
    
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as ssock:
            cert = ssock.getpeercert()
    
    # Parse expiration (notAfter is expressed in GMT)
    expires_at = ssl.cert_time_to_seconds(cert['notAfter'])
    not_after = datetime.fromtimestamp(expires_at, tz=timezone.utc)
    days_remaining = (not_after - datetime.now(timezone.utc)).days
    
    return {
        'hostname': hostname,
        'issuer': dict(x[0] for x in cert['issuer']),
        'subject': dict(x[0] for x in cert['subject']),
        'not_after': not_after.isoformat(),
        'days_remaining': days_remaining,
        'is_expiring_soon': days_remaining < 30,
        'is_expired': days_remaining < 0
    }

def check_all_certificates(hosts: List[str]) -> List[Dict]:
    """Check multiple hosts and return status."""
    results = []
    
    for host in hosts:
        try:
            result = check_certificate(host)
            results.append(result)
        except Exception as e:
            results.append({
                'hostname': host,
                'error': str(e)
            })
    
    return results

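def alert_expiring_certificates(results: List[Dict], webhook_url: str):
    """Send alerts for expiring certificates and failed checks."""
    import requests
    
    expiring = [r for r in results if r.get('is_expiring_soon')]
    # An already expired certificate fails the handshake, so it shows up
    # as an error rather than as a negative day count
    failed = [r for r in results if 'error' in r]
    
    if expiring or failed:
        message = "⚠️ *Certificates Expiring Soon*\n\n"
        for cert in expiring:
            message += f"• {cert['hostname']}: {cert['days_remaining']} days\n"
        for cert in failed:
            message += f"• {cert['hostname']}: check failed ({cert['error']})\n"
        
        requests.post(webhook_url, json={"text": message}, timeout=10)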


# Usage
if __name__ == "__main__":
    hosts = [
        "api.example.com",
        "www.example.com",
        "admin.example.com"
    ]

    results = check_all_certificates(hosts)
    for r in results:
        if 'error' in r:
            # Failed checks (including expired certificates) must not be skipped
            print(f"❌ {r['hostname']}: {r['error']}")
        else:
            status = "⚠️" if r['is_expiring_soon'] else "✅"
            print(f"{status} {r['hostname']}: {r['days_remaining']} days")
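
A single "expiring soon" flag hides how urgent the problem is. One possible refinement is to map the remaining days to severity tiers, using the 30-day and 7-day thresholds from the best practices list later in this post as assumed defaults. `days_remaining` and `severity` are hypothetical helpers, and the clock is fixed so the example is reproducible.

```python
import ssl


def days_remaining(not_after: str, now: float) -> int:
    """Whole days between `now` (epoch seconds) and a certificate's notAfter string."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    return int((expires_at - now) // 86400)


def severity(days: int, warn_at: int = 30, critical_at: int = 7) -> str:
    """Map remaining days to an alert tier."""
    if days < 0:
        return "expired"
    if days <= critical_at:
        return "critical"
    if days <= warn_at:
        return "warning"
    return "ok"


# Fixed clock for the example: 1 January 2024, 00:00 GMT
now = ssl.cert_time_to_seconds("Jan  1 00:00:00 2024 GMT")

for not_after in (
    "Mar 31 00:00:00 2024 GMT",
    "Jan 21 00:00:00 2024 GMT",
    "Jan  4 12:00:00 2024 GMT",
):
    days = days_remaining(not_after, now)
    print(f"{not_after}: {days} days, {severity(days)}")
```

The three sample dates land in the ok, warning, and critical tiers respectively, so a pager can be reserved for the last one.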

Prometheus Metrics

# cert_exporter.py
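import time
from typing import List

from prometheus_client import Gauge, start_http_server

# check_certificate() is defined in cert_monitor.py above
from cert_monitor import check_certificate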

cert_expiry_days = Gauge(
    'ssl_certificate_expiry_days',
    'Days until certificate expires',
    ['hostname']
)

cert_is_valid = Gauge(
    'ssl_certificate_valid',
    'Certificate is valid (1) or invalid (0)',
    ['hostname']
)

def update_metrics(hosts: List[str]):
    """Update Prometheus metrics for all certificates."""
    for host in hosts:
        try:
            result = check_certificate(host)
            cert_expiry_days.labels(hostname=host).set(result['days_remaining'])
            cert_is_valid.labels(hostname=host).set(1 if result['days_remaining'] > 0 else 0)
        except Exception:
            cert_is_valid.labels(hostname=host).set(0)

# Run exporter
if __name__ == "__main__":
    hosts = ["api.example.com", "www.example.com", "admin.example.com"]

    start_http_server(9117)
    while True:
        update_metrics(hosts)
        time.sleep(3600)  # Check hourly

Best Practices

  1. Use automation from day one: manual certs are technical debt
  2. Monitor expiration: alert at 30 days, panic at 7
  3. Use staging first: Let’s Encrypt production rate limits are easy to hit while testing
  4. Automate DNS validation: HTTP-01 needs port 80 reachable for every name and cannot issue wildcards
  5. Store private keys properly: Kubernetes Secrets, not git repos
  6. Rotate before expiration: don’t cut it close
  7. Use strong TLS settings: TLS 1.2+ only, strong ciphers
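
As an illustration of the last point, Python's standard `ssl` module can enforce the TLS 1.2 floor explicitly. This is a minimal sketch. The function names and the certificate paths passed to the server variant are placeholders, and cipher selection is left to the library defaults.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Client context: verified certificates and hostnames, TLS 1.2 or newer."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def strict_server_context(cert_file: str, key_file: str) -> ssl.SSLContext:
    """Server context that refuses anything older than TLS 1.2."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.load_cert_chain(cert_file, key_file)  # paths are caller-supplied
    return context


ctx = strict_client_context()
print(ctx.minimum_version.name)
```

The same floor is what the `ssl_policy` on the ALB listener and the equivalent nginx `ssl_protocols` setting express in their own configuration languages.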

Quick Reference

| Use Case | Solution |
|---|---|
| Single server | Certbot + nginx/apache plugin |
| Kubernetes | cert-manager + ClusterIssuer |
| AWS ALB/CloudFront | ACM (free, auto-renews) |
| Wildcard certs | DNS-01 challenge required |
| Internal/private CA | cert-manager + self-signed issuer |

SSL automation isn’t optional anymore—it’s table stakes. Set it up once, monitor it always, and never think about certificate renewals again.


This is post #40 in a 24-hour deep dive into DevOps and infrastructure. Thanks for reading. 🌍