Devops

Docker Compose for Production: Patterns That Actually Work

Docker Compose isn’t just for development. With the right patterns, it’s a legitimate production deployment tool for small-to-medium workloads. Here’s how to do it without the footguns. Base Structure 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 # docker-compose.yml version: "3.8" services: app: image: myapp:${VERSION:-latest} restart: unless-stopped environment: - NODE_ENV=production deploy: resources: limits: cpus: '2' memory: 1G healthcheck: test: ["CMD", "curl", "-f", "http://localhost:3000/health"] interval: 30s timeout: 10s retries: 3 start_period: 40s Key elements: ...

Secrets Management: Stop Committing Your API Keys

We’ve all done it. Committed a database password. Pushed an API key. Then frantically force-pushed hoping nobody noticed. Here’s how to manage secrets properly so that never happens again. The Problem 1 2 3 4 5 6 7 8 9 # Bad: Secrets in code DATABASE_URL="postgres://admin:supersecret@db.example.com/prod" # Bad: Secrets in .env checked into git # .env API_KEY=sk-live-abc123 # Bad: Secrets in CI/CD logs echo "Deploying with $DATABASE_PASSWORD" Secrets in code get leaked. Always. It’s just a matter of when. ...

Ansible Patterns That Scale: From Ad-Hoc to Production

Ansible is deceptively simple. Write some YAML, run it, things happen. Then your playbooks grow, your team grows, and suddenly everything is a mess. Here’s how to write Ansible that scales. Project Structure a ├ ├ │ │ │ │ │ │ │ │ │ ├ │ │ │ ├ │ │ │ └ n ─ ─ ─ ─ ─ s ─ ─ ─ ─ ─ i b a i ├ │ │ │ │ │ └ p ├ ├ └ r ├ ├ └ c └ l n n ─ ─ l ─ ─ ─ o ─ ─ ─ o ─ e s v ─ ─ a ─ ─ ─ l ─ ─ ─ l ─ / i e y e l b n p ├ └ s ├ └ b s w d s c n p e r l t r ─ ─ t ─ ─ o i e a / o g o c e e o o ─ ─ a ─ ─ o t b t m i s t q . r d g k e s a m n t i u c i u h g ├ ├ └ i h g s . e b o x g o i f e c o r ─ ─ ─ n o r y r a n r n r g s t s o ─ ─ ─ g s m v s e s e / i t u / t u l e e s / m o s p a w d s p r s q e n . _ l e a . _ s . l n y v l b t y v . y / t m a . s a m a y m s l r y e b l r m l . s m r a s l y / l v s / m e e l r s s . . y y m m l l Separate inventories per environment. Group vars by function. Roles for reusable logic. ...

Structured Logging: Logs That Actually Help You Debug

Your logs are probably useless. Not because you’re not logging, but because you’re logging wrong. Here’s how to make logs that actually help when things break. The Problem with Unstructured Logs [ [ [ 2 2 2 0 0 0 2 2 2 4 4 4 - - - 0 0 0 3 3 3 - - - 1 1 1 2 2 2 1 1 1 0 0 0 : : : 2 2 2 3 3 3 : : : 4 4 4 5 6 7 ] ] ] I E P N R r F R o O O c : R e : s U s s S i e o n r m g e l t o o h r g i d g n e e g r d w f i e o n n r t u w s r e o r n g 1 2 3 4 5 Try answering these questions: ...

Backup Strategies That Actually Work When You Need Them

Everyone has backups. Few people have tested restores. Here’s how to build backup strategies that work when disaster strikes. The 3-2-1 Rule The foundation of backup strategy: 3 copies of your data 2 different storage types 1 copy offsite Example: Production database (primary) Local backup server (different disk) S3 in another region (offsite) This survives disk failure, server failure, and site failure. Database Backups PostgreSQL Logical backups (pg_dump): 1 2 3 4 5 6 7 8 # Full database pg_dump -Fc mydb > backup.dump # Specific tables pg_dump -Fc -t users -t orders mydb > partial.dump # Schema only pg_dump -Fc --schema-only mydb > schema.dump Physical backups (pg_basebackup): ...

Infrastructure Testing: Confidence Before Production

You test your application code. Why not your infrastructure? Here’s how to build confidence that your Terraform, Ansible, and Kubernetes configs actually work. Why Test Infrastructure? Infrastructure code has the same problems as application code: Typos break things Logic errors cause outages Refactoring introduces regressions “It works on my machine” applies to terraform too The difference: infrastructure mistakes often cost more. A bad deployment can take down production, corrupt data, or rack up cloud bills. ...

Incident Response: A Playbook for When Things Go Wrong

Something is broken in production. Customers are complaining. Your heart rate is elevated. What now? Having a playbook before the incident happens is the difference between a coordinated response and chaos. Here’s the playbook. The First 5 Minutes 1. Acknowledge the Incident Someone needs to own it. Right now. @ S I C c t n o h a c m a t i m n u d s n s e : e : n l t # I i 🚨 n C n v o c I e m i N s m d C t a e I i n n D g d t E a e - N t r 2 T i : 0 : n 2 g @ 4 P a - a l 0 y i 3 m c - e e 1 n 2 t p r o c e s s i n g f a i l i n g Create a dedicated channel immediately. All incident communication goes there. ...

Container Security Essentials: Beyond docker run

Containers aren’t inherently secure. They share a kernel with the host. A container escape is a host compromise. Here’s how to not be the cautionary tale. Image Security Use Minimal Base Images Every package is attack surface. Minimize it. 1 2 3 4 5 6 7 8 # Bad: Full OS with thousands of packages FROM ubuntu:22.04 # Better: Minimal OS FROM alpine:3.19 # Best: Distroless (no shell, no package manager) FROM gcr.io/distroless/static-debian12 Distroless images contain only your app and runtime dependencies. No shell means attackers can’t get a shell. ...

Database Migrations Without Fear: Patterns That Won't Wake You Up at 3am

Database migrations are where deployments go to die. One bad migration can corrupt data, lock tables for hours, or bring down production entirely. Here’s how to make them boring. The Golden Rules Every migration must be reversible (or explicitly marked as not) Never run migrations during deploy (separate the concerns) Always test against production-scale data (10 rows works, 10 million doesn’t) Assume the migration will fail halfway (design for it) The Expand-Contract Pattern The safest way to change schemas: expand first, contract later. ...

Cloud Cost Optimization: Stop Burning Money on AWS

Your AWS bill is too high. Everyone’s is. The cloud makes it trivially easy to spin up resources and surprisingly hard to know what you’re actually paying for. Here’s how to stop the bleeding. The Low-Hanging Fruit Unused Resources The easiest savings come from things you’re not using. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # Find unattached EBS volumes aws ec2 describe-volumes \ --filters Name=status,Values=available \ --query 'Volumes[*].[VolumeId,Size,CreateTime]' \ --output table # Find unused Elastic IPs aws ec2 describe-addresses \ --query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]' \ --output table # Find idle load balancers (no healthy targets) aws elbv2 describe-target-health \ --query 'TargetHealthDescriptions[?TargetHealth.State!=`healthy`]' Run these monthly. You’ll find forgotten resources every time. ...