Containers aren’t inherently secure. They share a kernel with the host. A container escape is a host compromise. Here’s how to not be the cautionary tale.
Image Security#
Use Minimal Base Images#
Every package is attack surface. Minimize it.
| # Bad: Full OS with thousands of packages
FROM ubuntu:22.04
# Better: Minimal OS
FROM alpine:3.19
# Best: Distroless (no shell, no package manager)
FROM gcr.io/distroless/static-debian12
|
Distroless images contain only your app and its runtime dependencies. With no shell or package manager in the image, an attacker who gets code execution has far fewer tools to work with.
Pin Your Versions#
| # Bad: Tag can change
FROM node:18
# Better: Specific version
FROM node:18.19.0
# Best: SHA256 digest (immutable)
FROM node@sha256:abc123...
|
Get the digest:
| docker pull node:18.19.0
docker inspect --format='{{index .RepoDigests 0}}' node:18.19.0
|
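Digest pinning is easy to forget when a new stage gets added. As a rough sketch, the check below lists the FROM lines of a Dockerfile that are not pinned by digest. The function name `unpinned_from_lines` is made up for this example, and the check is deliberately naive: a stage reference such as `FROM builder` is reported as well.

```shell
# List FROM lines in a Dockerfile that are not pinned to a sha256 digest.
# Prints the offending lines; returns 1 if any were found, 0 otherwise.
# Limitation: a stage reference such as "FROM builder" is also reported.
unpinned_from_lines() {
  dockerfile="$1"
  # Keep FROM lines, then drop digest-pinned ones and the special "scratch" base.
  unpinned=$(grep -iE '^[[:space:]]*FROM[[:space:]]' "$dockerfile" \
    | grep -v '@sha256:' \
    | grep -viE '^[[:space:]]*FROM[[:space:]]+scratch([[:space:]]|$)') || true
  if [ -n "$unpinned" ]; then
    printf '%s\n' "$unpinned"
    return 1
  fi
  return 0
}
```

Running `unpinned_from_lines Dockerfile` in CI and failing the job on a non-zero status is one way to keep tags from creeping back in.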
Scan Images#
Build scanning into your CI pipeline:
| # GitHub Actions example
- name: Scan image
  uses: aquasecurity/trivy-action@master  # pin a release tag or commit SHA in real pipelines
  with:
    image-ref: 'myapp:${{ github.sha }}'
    format: 'sarif'
    exit-code: '1'
    severity: 'CRITICAL,HIGH'
|
With exit-code set to 1, the job fails whenever Trivy reports a CRITICAL or HIGH vulnerability, which blocks the deployment. No exceptions.
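The same gate can run outside CI, for instance before a manual push. This is a sketch that assumes the `trivy` and `docker` CLIs are installed; the function names `scan_image` and `scan_then_push` are invented for illustration.

```shell
# Fail (non-zero) when Trivy reports CRITICAL or HIGH findings for an image.
# Assumes the trivy CLI is installed and on PATH.
scan_image() {
  image="$1"
  trivy image --exit-code 1 --severity CRITICAL,HIGH "$image"
}

# Scan first, push only when the scan passes.
scan_then_push() {
  image="$1"
  if scan_image "$image"; then
    docker push "$image"
  else
    echo "refusing to push $image: scan failed" >&2
    return 1
  fi
}
```

Calling `scan_then_push registry.company.com/myapp:1.0.0` pushes the image only when the scan exits cleanly.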
Multi-Stage Builds#
Don’t ship build tools in production images:
| # Build stage
FROM golang:1.22 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o myapp
# Production stage
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/myapp /
USER nonroot:nonroot
ENTRYPOINT ["/myapp"]
|
Build tools, source code, and intermediate files stay in the build stage.
Runtime Security#
Don’t Run as Root#
| # Create non-root user (Alpine/BusyBox syntax)
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
# Switch to non-root
USER appuser
# Or for distroless
USER nonroot:nonroot
|
Verify at runtime:
| docker run myapp whoami
# Should NOT be root
|
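The `whoami` check needs a `whoami` binary inside the image, which distroless images do not ship. A possible alternative is to read the configured user from the image metadata without starting a container. The helper names below are hypothetical; `docker inspect --format '{{.Config.User}}'` is the only Docker call involved.

```shell
# Decide whether an image's configured USER means "root".
# An empty value means no USER instruction was set, so Docker defaults to root.
is_root_user() {
  case "$1" in
    ''|root|root:*|0|0:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check an image without starting it. Works for distroless images,
# which have no whoami binary to run.
image_runs_as_root() {
  user=$(docker inspect --format '{{.Config.User}}' "$1")
  is_root_user "$user"
}
```

For example, `image_runs_as_root myapp && echo "myapp still runs as root"` flags an image whose Dockerfile never switched users.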
Drop Capabilities#
Containers get Linux capabilities by default. Drop them:
| # docker-compose.yml
services:
  app:
    image: myapp
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE  # Only if needed
|
| # Docker run
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp
|
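To confirm the drop took effect, you can look at the effective capability mask of a process inside the container (`grep CapEff /proc/self/status`) and test individual bits. The sketch below assumes the bit numbers from `linux/capability.h`, where CAP_NET_BIND_SERVICE is 10 and CAP_SYS_ADMIN is 21; `cap_is_set` is a name chosen for this example.

```shell
# Test whether a capability bit is set in a CapEff mask.
# The mask is the hex value printed by: grep CapEff /proc/self/status
# Bit numbers come from linux/capability.h, e.g. CAP_NET_BIND_SERVICE=10,
# CAP_SYS_ADMIN=21.
cap_is_set() {
  mask="$1"
  bit="$2"
  [ $(( (0x$mask >> $bit) & 1 )) -eq 1 ]
}
```

A mask of `0000000000000400` has only bit 10 set, so `cap_is_set 0000000000000400 10` succeeds while `cap_is_set 0000000000000400 21` fails.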
Read-Only Filesystem#
Prevent runtime modifications:
| services:
  app:
    image: myapp
    read_only: true
    tmpfs:
      - /tmp
      - /var/run
|
Your app needs to write somewhere? Mount specific tmpfs or volumes.
No Privilege Escalation#
| services:
  app:
    security_opt:
      - no-new-privileges:true
|
Prevents processes from gaining additional privileges via setuid binaries.
Resource Limits#
Prevent resource exhaustion attacks:
| services:
  app:
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 128M
    pids_limit: 100
|
pids_limit prevents fork bombs.
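The Compose keys in this section have `docker run` counterparts. As a sketch, the wrapper below applies them in one place; `run_hardened` is a made-up name, and the limits simply mirror the values used above.

```shell
# Start a container with the runtime restrictions from this section applied
# through docker run flags instead of Compose keys.
run_hardened() {
  image="$1"
  shift
  docker run --rm \
    --cap-drop=ALL \
    --read-only --tmpfs /tmp \
    --security-opt no-new-privileges:true \
    --memory 512m --cpus 1.0 --pids-limit 100 \
    "$image" "$@"
}
```

`run_hardened myapp` starts the image with those defaults; anything after the image name is passed through as the container command.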
Network Security#
Don’t Expose Unnecessary Ports#
| # EXPOSE only documents the port; it publishes nothing by itself
EXPOSE 8080
# The real problem: -p without an address binds to all interfaces
docker run -p 8080:8080 myapp # Binds to 0.0.0.0
|
Bind to localhost when possible:
| docker run -p 127.0.0.1:8080:8080 myapp
|
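To find containers that are already published on every interface, you can filter the port column of `docker ps`. This is a rough sketch: `wide_open_ports` is a hypothetical helper, and the pattern assumes the usual `0.0.0.0:8080->8080/tcp` style of the Ports column.

```shell
# Read "name<TAB>ports" lines on stdin and print those with a port published
# on all interfaces (IPv4 0.0.0.0 or IPv6 ::).
wide_open_ports() {
  grep -E '(0\.0\.0\.0|\[::\]|:::)[0-9:-]*->' || true
}

# Typical use (assumes a running Docker daemon):
#   docker ps --format '{{.Names}}\t{{.Ports}}' | wide_open_ports
```

Containers bound to 127.0.0.1 or with no published ports produce no output.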
Network Isolation#
| # docker-compose.yml
services:
  app:
    networks:
      - frontend
  db:
    networks:
      - backend
  api:
    networks:
      - frontend
      - backend
networks:
  frontend:
  backend:
    internal: true  # No external access
|
Database on an internal network can’t be reached from outside.
Use Network Policies (Kubernetes)#
| apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-network-policy
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - port: 5432
|
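Default deny, explicit allow. Like a firewall for pods. Once a policy selects a pod, any traffic the rules don’t list is dropped, so the egress rule above also blocks DNS lookups unless you allow them explicitly.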
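Pods that no policy selects still accept all traffic, so a namespace-wide default-deny policy is a common companion to per-app rules. The sketch below assumes `kubectl` is configured and that the cluster's network plugin enforces NetworkPolicy; the function name and the policy name `default-deny-all` are illustrative.

```shell
# Apply a default-deny NetworkPolicy to a namespace. With this in place,
# pods only get the traffic that other policies explicitly allow.
# Assumes kubectl is configured and the network plugin enforces NetworkPolicy.
apply_default_deny() {
  namespace="$1"
  kubectl apply -n "$namespace" -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
EOF
}
```

The empty `podSelector` matches every pod in the namespace, which is what makes the policy a baseline rather than a per-app rule.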
Secrets Management#
Never Bake Secrets Into Images#
| # NEVER DO THIS
ENV API_KEY=sk-1234567890
COPY .env /app/.env
|
Secrets in images are extractable:
| docker history --no-trunc myapp
|
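A cheap guard is to grep Dockerfiles for the patterns above before they are ever built. The check below is a heuristic sketch, not a real secret scanner: the function name `find_baked_secrets` and the list of suspicious name fragments are assumptions made for this example.

```shell
# Print Dockerfile lines that look like they bake a secret into the image:
# ENV/ARG names containing KEY, SECRET, TOKEN or PASSWORD, and COPY/ADD of
# a .env file. Returns 1 if anything suspicious was found.
# The name patterns are a heuristic, not a complete secret scanner.
find_baked_secrets() {
  dockerfile="$1"
  hits=$(grep -inE \
    -e '^[[:space:]]*(ENV|ARG)[[:space:]]+[A-Za-z0-9_]*(KEY|SECRET|TOKEN|PASSWORD)' \
    -e '^[[:space:]]*(COPY|ADD)[[:space:]].*\.env([[:space:]]|$)' \
    "$dockerfile") || true
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1
  fi
  return 0
}
```

Each hit is printed with its line number, so `find_baked_secrets Dockerfile` doubles as a pointer to what needs fixing.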
Use Runtime Secrets#
Docker Swarm:
| services:
  app:
    secrets:
      - api_key
secrets:
  api_key:
    external: true
|
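Inside the container, Swarm mounts each secret as a file under `/run/secrets/<name>`, so the app (or its entrypoint script) reads it from there instead of from the environment. A minimal sketch, with `read_secret` as a hypothetical helper:

```shell
# Read a secret that the orchestrator mounted as a file.
# Docker Swarm mounts each secret at /run/secrets/<name> inside the container.
# The second argument overrides the directory, which helps in local testing.
read_secret() {
  name="$1"
  dir="${2:-/run/secrets}"
  if [ ! -r "$dir/$name" ]; then
    echo "secret $name not found in $dir" >&2
    return 1
  fi
  cat "$dir/$name"
}
```

An entrypoint script could call `read_secret api_key` and fail fast when the secret is missing, rather than starting the app half-configured.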
Kubernetes:
| apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
data:
  API_KEY: c2stMTIzNDU2Nzg5MA==  # base64 encoded; the key becomes the env var name
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: myapp
      envFrom:
        - secretRef:
            name: app-secrets
|
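Keep in mind that the value in a Secret manifest is base64-encoded, which is an encoding and not encryption. Anyone who can read the manifest can recover the secret, as this small sketch shows. It assumes a `base64` command that accepts `-d` (GNU coreutils or BusyBox); `decode_secret_value` is a name made up for the example.

```shell
# Decode the value of a Kubernetes Secret field. The data in a Secret
# manifest is base64-encoded, which is an encoding and not encryption.
decode_secret_value() {
  printf '%s' "$1" | base64 -d
}
```

Passing the value from the manifest above, `decode_secret_value 'c2stMTIzNDU2Nzg5MA=='`, prints the original `sk-1234567890`. Treat Secret manifests as sensitive and keep them out of version control.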
Better: Use a secrets manager (Vault, AWS Secrets Manager) with injection at runtime.
Registry Security#
Use Private Registries#
Don’t pull random images from Docker Hub in production.
| # Pull from private registry
docker pull registry.company.com/myapp:1.0.0
|
Sign Images#
Cosign (recommended):
| # Sign
cosign sign --key cosign.key registry.company.com/myapp:1.0.0
# Verify
cosign verify --key cosign.pub registry.company.com/myapp:1.0.0
|
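Kubernetes can enforce signature verification at admission time, for example through the ImagePolicyWebhook admission controller:
| apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: ImagePolicyWebhook
    path: imagepolicyconfig.yaml
# ... the webhook backend referenced there verifies signatures
|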
Enable Content Trust#
| export DOCKER_CONTENT_TRUST=1
docker pull myimage:latest # Only pulls signed images
|
Security Contexts (Kubernetes)#
| apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myapp
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
|
Use Pod Security Standards to enforce these cluster-wide:
| apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
|
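For a namespace that already exists, the same labels can be set with `kubectl label`. The wrapper below is a sketch with an invented name, `set_pod_security_level`; it also sets the `warn` label so that violations show up as warnings on apply.

```shell
# Label a namespace so Pod Security Admission enforces a given level.
# Levels defined by Kubernetes: privileged, baseline, restricted.
set_pod_security_level() {
  namespace="$1"
  level="${2:-restricted}"
  case "$level" in
    privileged|baseline|restricted) ;;
    *) echo "unknown level: $level" >&2; return 1 ;;
  esac
  kubectl label --overwrite namespace "$namespace" \
    "pod-security.kubernetes.io/enforce=$level" \
    "pod-security.kubernetes.io/warn=$level"
}
```

Starting with `set_pod_security_level staging baseline` and tightening to `restricted` later is a gentler rollout than enforcing the strictest level on day one.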
Audit and Monitoring#
Log Container Events#
| # Docker events
docker events --filter 'type=container'
# What's running
docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Status}}'
|
Falco — Detect anomalous behavior:
| - rule: Terminal shell in container
  desc: Detect shell spawned in container
  condition: >
    spawned_process and container and shell_procs
  output: >
    Shell spawned in container (user=%user.name container=%container.name)
  priority: WARNING
|
Sysdig — Deep visibility into container syscalls.
Regular Audits#
| # Check for containers running as root
docker ps -q | xargs -I {} docker inspect --format '{{.Name}}: User={{.Config.User}}' {}
# Check for privileged containers
docker ps -q | xargs -I {} docker inspect --format '{{.Name}}: Privileged={{.HostConfig.Privileged}}' {}
|
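Quick Security Checklist#
Image:
- Minimal base image (Alpine or distroless)
- Base image pinned by version or digest
- Image scanned in CI
- Multi-stage build, no build tools in the final image
Runtime:
- Runs as a non-root user
- All capabilities dropped, only required ones added back
- Read-only root filesystem
- no-new-privileges set
- CPU, memory, and PID limits set
Network:
- Ports published only where needed, bound to localhost when possible
- Backend services on internal networks
- Network policies in place (Kubernetes)
Secrets:
- No secrets in images or Dockerfiles
- Secrets injected at runtime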
Start Here#
- Today: Add USER nonroot to your Dockerfiles
- This week: Enable image scanning in CI
- This month: Implement network policies
- This quarter: Deploy runtime security monitoring
Container security isn’t optional. It’s the difference between “containers in production” and “secure containers in production.”
The most secure container is one with nothing in it except exactly what needs to run. Everything else is attack surface.