A good CI/CD pipeline catches bugs early, deploys reliably, and gets out of your way. A bad one is slow, flaky, and becomes the team’s bottleneck.
Let’s build a good one.
## Pipeline Stages

A typical pipeline flows through these stages:

Commit → Build → Test → Security Scan → Deploy Staging → Deploy Prod
Each stage gates the next. Fail early, fail fast.
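The gating rule can be sketched in a few lines of Python. This is a toy model, not how any CI system is implemented. Each stage is a hypothetical callable that returns True on success, and the run stops at the first failure so later stages never execute:

```python
from typing import Callable, Optional

# A stage is a (name, action) pair; the action returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> tuple[list[str], Optional[str]]:
    """Run stages in order; return (stages that ran, first failing stage or None)."""
    ran: list[str] = []
    for name, action in stages:
        ran.append(name)
        if not action():
            # Fail fast: nothing after a failing stage runs.
            return ran, name
    return ran, None


if __name__ == "__main__":
    stages = [
        ("build", lambda: True),
        ("test", lambda: False),  # simulate a failing test stage
        ("security-scan", lambda: True),
        ("deploy-staging", lambda: True),
    ]
    ran, failed = run_pipeline(stages)
    print(ran, failed)  # ['build', 'test'] test
```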
### Stage 1: Build

Compile code, install dependencies, and create artifacts:

```yaml
# GitHub Actions
build:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Setup Node
      uses: actions/setup-node@v4
      with:
        node-version: '20'
        cache: 'npm'
    - name: Install dependencies
      run: npm ci
    - name: Build
      run: npm run build
    - name: Upload artifact
      uses: actions/upload-artifact@v4
      with:
        name: build
        path: dist/
```
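The upload step publishes `dist/` once so later jobs can reuse it. To verify that staging and production really receive identical bytes, one option is to fingerprint the directory and compare digests between jobs. This minimal Python sketch assumes the artifact is a directory such as `dist/`. `artifact_digest` is a hypothetical helper, not part of any GitHub action:

```python
import hashlib
from pathlib import Path


def artifact_digest(root: str) -> str:
    """SHA-256 over every file under root, visited in sorted path order.

    Each file contributes its relative path plus the hash of its contents,
    so the digest depends only on names and bytes, not on timestamps.
    """
    digest = hashlib.sha256()
    base = Path(root)
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        rel = path.relative_to(base).as_posix()
        file_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        digest.update(f"{rel}\0{file_hash}\n".encode("utf-8"))
    return digest.hexdigest()
```

You could compute this right after `npm run build`, record it, and compare it with the value computed from the downloaded artifact before each deploy. A mismatch means something rebuilt or altered the files along the way.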
Key practices:

- Use lockfiles (package-lock.json, poetry.lock)
- Cache dependencies between runs
- Build once, deploy the same artifact everywhere

### Stage 2: Test

Run tests at multiple levels:
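```yaml
test:
  needs: build
  runs-on: ubuntu-latest
  strategy:
    matrix:
      test-type: [unit, integration]
  steps:
    - uses: actions/checkout@v4
    # Tests need dependencies installed on this runner too
    - name: Setup Node
      uses: actions/setup-node@v4
      with:
        node-version: '20'
        cache: 'npm'
    - name: Install dependencies
      run: npm ci
    - name: Download build
      uses: actions/download-artifact@v4
      with:
        name: build
        path: dist/   # restore into dist/, not the workspace root
    - name: Run unit tests
      if: matrix.test-type == 'unit'
      run: npm run test:unit -- --coverage
    - name: Run integration tests
      if: matrix.test-type == 'integration'
      run: |
        docker compose up -d --wait   # Compose v2 syntax
        npm run test:integration
    # Tear down even when the tests fail
    - name: Stop services
      if: always() && matrix.test-type == 'integration'
      run: docker compose down
    # Only the unit run produces coverage
    - name: Upload coverage
      if: matrix.test-type == 'unit'
      uses: codecov/codecov-action@v4
      with:
        token: ${{ secrets.CODECOV_TOKEN }}
```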
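GitHub Actions expands `strategy.matrix` into one job per combination of values, which is the Cartesian product of the dimensions. The Python below mimics that expansion for illustration only. It ignores `include` and `exclude`, which the real feature also supports:

```python
from itertools import product


def expand_matrix(matrix: dict[str, list]) -> list[dict]:
    """Return one dict per job: the Cartesian product of all matrix dimensions."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]
```

With the single `test-type` dimension above, this yields two parallel jobs. That is why each step is guarded by an `if:` on `matrix.test-type`. Adding a second dimension such as `node: [18, 20]` would double the job count to four.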
Testing pyramid:

- Many unit tests (fast, isolated)
- Fewer integration tests (slower, more realistic)
- Few E2E tests (slowest, highest confidence)

### Stage 3: Security Scanning

Catch vulnerabilities before they reach production:
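```yaml
security:
  needs: build
  runs-on: ubuntu-latest
  permissions:
    contents: read
    security-events: write   # CodeQL uploads its results
  steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0   # full history so the secret scanner sees past commits
    # Dependency vulnerabilities
    - name: Dependency scan
      run: npm audit --audit-level=high
    # Static analysis: CodeQL requires an init step before analyze
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v3
      with:
        languages: javascript
    - name: SAST scan
      uses: github/codeql-action/analyze@v3
    # Container scanning (assumes env.IMAGE is defined at the workflow level;
    # pin the action to a release tag or commit SHA instead of @master)
    - name: Container scan
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: '${{ env.IMAGE }}'
        exit-code: '1'
        severity: 'CRITICAL,HIGH'
    # Secrets detection (likewise, pin rather than tracking @main)
    - name: Secret scan
      uses: trufflesecurity/trufflehog@main
      with:
        path: ./
```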
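Both `npm audit --audit-level=high` and Trivy's `severity: 'CRITICAL,HIGH'` with `exit-code: '1'` enforce the same kind of policy: fail when any finding is at or above a threshold. Here is a Python sketch of that gate. It assumes findings arrive as simple `(id, severity)` pairs rather than any tool's real report format, and it uses npm audit's severity names:

```python
# npm audit's severity levels, lowest to highest
SEVERITY_ORDER = ["info", "low", "moderate", "high", "critical"]


def blocking_findings(findings: list[tuple[str, str]], threshold: str) -> list[str]:
    """Return the IDs of findings at or above the threshold severity."""
    limit = SEVERITY_ORDER.index(threshold)
    return [fid for fid, sev in findings if SEVERITY_ORDER.index(sev) >= limit]


def gate(findings: list[tuple[str, str]], threshold: str = "high") -> int:
    """Exit code a CI step would return: 1 if anything blocks, else 0."""
    return 1 if blocking_findings(findings, threshold) else 0
```

Raising the threshold to `critical` lets high-severity findings through. That is a policy decision worth making explicitly rather than by accident.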
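### Stage 4: Deploy to Staging

Deploy to a production-like environment:

```yaml
deploy-staging:
  needs: [test, security]
  runs-on: ubuntu-latest
  environment: staging
  permissions:
    id-token: write   # lets the job assume an AWS role via OIDC
    contents: read
  steps:
    - uses: actions/checkout@v4   # the smoke tests live in the repo
    - name: Download artifact
      uses: actions/download-artifact@v4
      with:
        name: build
        path: dist/   # restore into dist/ so the sync below finds it
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: ${{ vars.AWS_DEPLOY_ROLE }}   # assumed variable name
        aws-region: us-east-1
    - name: Deploy to staging
      env:
        STAGING_CF_ID: ${{ vars.STAGING_CF_ID }}   # assumed variable name
      run: |
        aws s3 sync dist/ s3://staging-bucket/
        aws cloudfront create-invalidation \
          --distribution-id "$STAGING_CF_ID" \
          --paths "/*"
    - name: Run smoke tests
      run: |
        curl -sf https://staging.example.com/health
        npm ci
        npm run test:smoke -- --env=staging
```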
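`curl -sf` makes a single attempt, so a deploy that needs a few seconds to become ready fails the job. The Python sketch below polls the health endpoint until it answers with a 2xx status, using only the standard library. The attempt count, delay, and timeout defaults are assumptions to tune:

```python
import time
import urllib.request


def wait_for_healthy(url: str, attempts: int = 10, delay: float = 3.0,
                     timeout: float = 5.0) -> bool:
    """Poll url until it returns a 2xx response; give up after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            # URLError, HTTPError (non-2xx) and socket timeouts are all OSError subclasses
            pass
        if attempt < attempts:
            time.sleep(delay)
    return False
```

A small wrapper script could call `sys.exit(0 if wait_for_healthy(url) else 1)` so the workflow step fails only after the service has had a fair chance to start.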
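### Stage 5: Deploy to Production

For critical systems, require manual approval by adding required reviewers to the `production` environment:

```yaml
deploy-prod:
  needs: deploy-staging
  runs-on: ubuntu-latest
  environment:
    name: production   # approval gate comes from this environment's protection rules
    url: https://example.com
  permissions:
    id-token: write
    contents: read
  steps:
    - uses: actions/checkout@v4
    - name: Download artifact
      uses: actions/download-artifact@v4
      with:
        name: build
        path: dist/
    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: ${{ vars.AWS_DEPLOY_ROLE }}   # assumed variable name
        aws-region: us-east-1
    - name: Deploy to production
      env:
        PROD_CF_ID: ${{ vars.PROD_CF_ID }}   # assumed variable name
      run: |
        aws s3 sync dist/ s3://prod-bucket/
        aws cloudfront create-invalidation \
          --distribution-id "$PROD_CF_ID" \
          --paths "/*"
    - name: Verify deployment
      run: |
        curl -sf https://example.com/health
        npm ci
        npm run test:smoke -- --env=production
    - name: Notify team
      env:
        SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
      run: |
        curl -X POST "$SLACK_WEBHOOK" \
          -H 'Content-Type: application/json' \
          -d '{"text":"Deployed ${{ github.sha }} to production"}'
```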
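## Deployment Strategies

### Direct Deploy (All at Once)

Simple but risky: every instance moves to the new version at the same moment.

```yaml
- name: Deploy
  run: kubectl set image deployment/app app="$IMAGE"
```

On Kubernetes, `kubectl set image` follows the Deployment's own strategy, which is a rolling update by default. A true all-at-once replacement requires `strategy.type: Recreate`, which terminates all old pods before creating new ones and therefore causes brief downtime.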
### Rolling Deploy

Gradual replacement:

```yaml
# Kubernetes rolling update
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
```
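With `replicas: 4`, `maxSurge: 1`, and `maxUnavailable: 1`, the rollout may run up to 5 pods at once and must keep at least 3 available. The Python sketch below does that arithmetic. For percentage values, Kubernetes rounds `maxSurge` up and `maxUnavailable` down, and the helper mirrors that:

```python
from typing import Union


def _to_pods(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Convert an absolute count or a percentage string such as '25%' to pods."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])  # still multiplied by 100
        # Integer ceil/floor division avoids floating-point surprises
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas: int, max_surge: Union[int, str],
                   max_unavailable: Union[int, str]) -> tuple[int, int]:
    """Return (maximum total pods, minimum available pods) during a rollout."""
    surge = _to_pods(max_surge, replicas, round_up=True)
    unavailable = _to_pods(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return replicas + surge, replicas - unavailable


print(rollout_bounds(4, 1, 1))           # (5, 3)
print(rollout_bounds(10, "25%", "25%"))  # (13, 8)
```

Raising `maxSurge` speeds up the rollout at the cost of extra capacity. Raising `maxUnavailable` speeds it up at the cost of headroom during the update.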
### Blue-Green Deploy

Instant switch with instant rollback:

```yaml
- name: Deploy to green
  run: |
    kubectl apply -f k8s/green/
    kubectl wait --for=condition=ready pod -l env=green --timeout=300s
- name: Switch traffic
  run: |
    kubectl patch service app -p '{"spec":{"selector":{"env":"green"}}}'
- name: Verify
  run: ./scripts/smoke-test.sh
# Blue is still running at this point, so rollback is just the reverse switch
- name: Roll back to blue
  if: failure()
  run: |
    kubectl patch service app -p '{"spec":{"selector":{"env":"blue"}}}'
- name: Cleanup blue
  if: success()
  run: kubectl delete -f k8s/blue/
```
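The mechanics reduce to a pointer flip. Two complete environments exist, and the Service selector decides which one receives traffic. The toy Python model below makes no Kubernetes calls, and the class and method names are hypothetical. It shows why rollback is instant: rollback is the same flip in reverse, as long as the old color has not been deleted yet.

```python
from typing import Optional


class BlueGreen:
    """Toy model: two slots ('blue', 'green') and a selector pointing at one."""

    def __init__(self, live_version: str) -> None:
        self.slots: dict[str, Optional[str]] = {"blue": live_version, "green": None}
        self.live = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_idle(self, version: str) -> None:
        # The new version goes to the slot that currently receives no traffic.
        self.slots[self.idle] = version

    def switch(self) -> None:
        # Equivalent of patching the Service selector: one atomic flip.
        if self.slots[self.idle] is None:
            raise RuntimeError("idle slot is empty; nothing to switch to")
        self.live = self.idle

    def serving(self) -> Optional[str]:
        return self.slots[self.live]
```

In this model the next release lands in whichever slot is idle, so colors alternate between deploys. The workflow's "Cleanup blue" step deletes the previous slot, which ends the instant-rollback window. Keeping the old color around until the new version has proven itself is a common alternative.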
### Canary Deploy

Gradual traffic shift:

```yaml
- name: Deploy canary (~10%)
  run: |
    # Assumes app-stable and app-canary pods both match the Service's selector,
    # so traffic splits by pod count: 9 stable + 1 canary ≈ 10%.
    # For exact percentages, use ingress or service-mesh traffic weighting.
    kubectl apply -f k8s/canary/
    kubectl rollout status deployment/app-canary --timeout=300s
- name: Monitor canary
  run: |
    sleep 300  # 5 minutes
    ./scripts/check-error-rate.sh canary
- name: Promote canary
  run: |
    ./scripts/canary-healthy.sh  # a non-zero exit stops promotion here
    kubectl scale deployment/app-canary --replicas=9
    kubectl rollout status deployment/app-canary --timeout=300s
    kubectl scale deployment/app-stable --replicas=0
# Runs if monitoring or promotion failed, so a bad canary never lingers
- name: Roll back canary
  if: failure()
  run: kubectl delete -f k8s/canary/
```
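The health decision behind `canary-healthy.sh` is left to the reader. One common rule compares the canary's error rate against the stable version's. Here is a Python sketch of such a rule. The thresholds (`max_ratio`, `min_requests`, `abs_tolerance`) are illustrative assumptions, not values from the original:

```python
def canary_healthy(canary_errors: int, canary_requests: int,
                   stable_errors: int, stable_requests: int,
                   max_ratio: float = 1.5, min_requests: int = 100,
                   abs_tolerance: float = 0.001) -> bool:
    """Pass if the canary saw enough traffic and its error rate is within
    max_ratio of stable's. The small absolute tolerance keeps a near-zero
    stable error rate from making any single canary error fatal."""
    if canary_requests < min_requests:
        return False  # not enough data to judge; treat as unhealthy
    canary_rate = canary_errors / canary_requests
    stable_rate = stable_errors / stable_requests if stable_requests else 0.0
    return canary_rate <= stable_rate * max_ratio + abs_tolerance
```

A wrapper script would exit 0 when this returns True and non-zero otherwise. That exit code is all the workflow step checks. Treating "too little traffic" as unhealthy is deliberately conservative.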
## Pipeline Optimization

### Parallelization

Run independent jobs concurrently:

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps: [...]
  test-unit:
    runs-on: ubuntu-latest
    steps: [...]
  test-integration:
    runs-on: ubuntu-latest
    steps: [...]
  security:
    runs-on: ubuntu-latest
    steps: [...]
  # Only runs after all of the above pass
  deploy:
    needs: [lint, test-unit, test-integration, security]
    runs-on: ubuntu-latest
    steps: [...]
```
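The `needs:` keys form a dependency graph, and jobs whose dependencies are all satisfied can run at the same time. The Python sketch below uses Kahn's algorithm, grouped into waves, to compute which jobs in the example above can run in parallel. The real scheduler also depends on runner availability, which this ignores:

```python
def parallel_waves(needs: dict[str, list[str]]) -> list[list[str]]:
    """Group jobs into waves; every job in a wave depends only on earlier waves."""
    remaining = {job: set(deps) for job, deps in needs.items()}
    waves: list[list[str]] = []
    while remaining:
        ready = sorted(job for job, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        for job in ready:
            del remaining[job]
        for deps in remaining.values():
            deps.difference_update(ready)
    return waves
```

For the workflow above this yields two waves: the four checks together, then `deploy`. The pipeline's wall-clock time is roughly the slowest job in each wave, summed across waves.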
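### Caching

Cache dependencies between runs. Cache npm's download cache (`~/.npm`) rather than `node_modules`, because `npm ci` deletes `node_modules` before installing:

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-
```

If you already use `actions/setup-node` with `cache: 'npm'`, that option caches `~/.npm` for you and this step is redundant.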
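On a miss for the exact `key`, `restore-keys` are tried in order as prefixes, and for each prefix the most recently created matching cache wins. The Python sketch below models that lookup. It leaves out branch scoping, which the real cache also applies, and `CacheEntry` is a hypothetical record type:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    key: str
    created: int  # e.g. a Unix timestamp


def resolve_cache(key: str, restore_keys: list[str],
                  entries: list[CacheEntry]) -> Optional[CacheEntry]:
    """Exact key match first, then the newest entry for each restore-key prefix."""
    for entry in entries:
        if entry.key == key:
            return entry  # exact hit
    for prefix in restore_keys:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created)  # newest partial match
    return None
```

This is why the key embeds a hash of `package-lock.json`. Any dependency change produces a new key, yet the restore-key prefix still gives the install a warm start from the previous cache.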
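### Skip Unnecessary Work

Trigger the workflow only when relevant files change. GitHub Actions does not allow `paths` and `paths-ignore` on the same event, so express exclusions as `!` patterns inside `paths` (later patterns override earlier ones):

```yaml
on:
  push:
    paths:
      - 'src/**'
      - 'package.json'
      - '.github/workflows/**'
      - '!**.md'   # exclude Markdown, even under src/
```

`docs/**` needs no exclusion here because no include pattern matches it.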
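GitHub's path filters decide per changed file, and `paths` also accepts `!` negations. A later matching pattern overrides an earlier one, so a negation only removes files that an earlier pattern included. The Python sketch below approximates that rule. It uses `fnmatch`, whose `*` also matches `/`, so it is looser than GitHub's glob syntax and only illustrates the ordering logic:

```python
from fnmatch import fnmatch


def file_included(path: str, patterns: list[str]) -> bool:
    """Last matching pattern wins; '!pattern' excludes, plain patterns include."""
    included = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        if fnmatch(path, pattern[1:] if negated else pattern):
            included = not negated
    return included


def should_run(changed: list[str], patterns: list[str]) -> bool:
    """The workflow runs if at least one changed file survives the filter."""
    return any(file_included(p, patterns) for p in changed)
```

A push that touches only `README.md` and `docs/guide.md` would skip the workflow under the patterns above. Adding a change under `src/` would trigger it.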
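### Fail Fast

Cancel the remaining matrix jobs as soon as one fails (`fail-fast` already defaults to `true`; setting it explicitly documents the intent):

```yaml
strategy:
  fail-fast: true
  matrix:
    node: [18, 20, 22]
```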
## Environment Protection

### Required Reviewers

```yaml
environment:
  name: production
  # Jobs targeting this environment pause for approval once
  # required reviewers are configured for it
```

Configure in GitHub: Settings → Environments → production → Required reviewers
### Branch Protection

- Require PR reviews before merge
- Require status checks to pass
- Require linear history
- Restrict who can push

### Secrets Management

Store credentials as encrypted secrets and reference them through the `secrets` context:

```yaml
env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```
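Use OIDC for cloud providers when possible. The job exchanges a short-lived GitHub-issued token for temporary cloud credentials, so no long-lived keys need to be stored as secrets. The job needs the `id-token: write` permission:

```yaml
permissions:
  id-token: write   # allow the job to request an OIDC token
  contents: read
steps:
  - name: Configure AWS credentials
    uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/github-actions
      aws-region: us-east-1
```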
## Handling Failures

### Automatic Rollback

```yaml
- name: Deploy
  id: deploy
  run: ./deploy.sh
- name: Rollback on failure
  if: failure() && steps.deploy.outcome == 'failure'
  run: ./rollback.sh
```
### Notifications

```yaml
- name: Notify on failure
  if: failure()
  uses: slackapi/slack-github-action@v1
  with:
    channel-id: 'deployments'
    slack-message: |
      ❌ Deployment failed
      Commit: ${{ github.sha }}
      Author: ${{ github.actor }}
      <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View logs>
  env:
    SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}   # bot token used to post the message
```
### Retry Logic

```yaml
- name: Deploy with retry
  uses: nick-fields/retry@v2
  with:
    timeout_minutes: 10
    max_attempts: 3
    command: ./deploy.sh
```
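The same idea in Python, with exponential backoff between attempts. This is a sketch and not a description of how `nick-fields/retry` waits between attempts. `fn` stands in for whatever `./deploy.sh` does, and the `sleep` parameter exists only so the delays can be observed or skipped:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], max_attempts: int = 3, base_delay: float = 2.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds, waiting base_delay * 2**(n-1) seconds after
    the n-th failure. Re-raises the last error once attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Retries suit transient failures such as network blips or rate limits. A deploy script should be idempotent before it is retried, since a failed attempt may have been partially applied.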
## Metrics to Track

Monitor your pipeline health:

- Lead time: commit to production
- Deployment frequency: how often you deploy
- Change failure rate: percentage of deployments causing issues
- Mean time to recovery: how fast you fix failures

```yaml
- name: Record deployment metrics
  run: |
    # Assumes the deploy step (id: deploy) sets a `duration` output
    curl -X POST "$METRICS_ENDPOINT" \
      -H 'Content-Type: application/json' \
      -d '{
        "event": "deployment",
        "sha": "${{ github.sha }}",
        "environment": "production",
        "duration_seconds": ${{ steps.deploy.outputs.duration }}
      }'
```
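Given a log of deployments, the four metrics are straightforward to compute. The Python sketch below assumes a hypothetical `Deployment` record with a commit time, a deploy time, and, for failed changes, the time service was restored. Real data would come from your CI and incident tooling. Reporting lead time as a median is a design choice that keeps a few outliers from dominating:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed deploy is fixed


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict[str, float]:
    """Lead time (median hours), deploys per day, change failure rate, MTTR (hours)."""
    if not deploys:
        raise ValueError("need at least one deployment")
    hour = timedelta(hours=1)
    lead_hours = [(d.deployed_at - d.committed_at) / hour for d in deploys]
    failures = [d for d in deploys if d.failed]
    recoveries = [(d.restored_at - d.deployed_at) / hour
                  for d in failures if d.restored_at is not None]
    return {
        "lead_time_hours": median(lead_hours),
        "deploys_per_day": len(deploys) / period_days,
        "change_failure_rate": len(failures) / len(deploys),
        "mttr_hours": sum(recoveries) / len(recoveries) if recoveries else 0.0,
    }
```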
## Complete Example

```yaml
name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  # Assumed registry and image name; replace with your own
  IMAGE: registry.example.com/app:${{ github.sha }}

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      # Pushing requires registry credentials; PR builds only build
      - name: Log in to registry
        if: github.event_name == 'push'
        uses: docker/login-action@v3
        with:
          registry: registry.example.com
          username: ${{ secrets.REGISTRY_USERNAME }}
          password: ${{ secrets.REGISTRY_PASSWORD }}
      - uses: docker/build-push-action@v5
        with:
          push: ${{ github.event_name == 'push' }}
          tags: ${{ env.IMAGE }}

  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci && npm test

  security:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # The image is not available on this runner (and is never pushed for
      # pull requests), so scan the repository and lockfiles instead
      - uses: aquasecurity/trivy-action@master
        with:
          scan-type: fs
          scan-ref: .
          exit-code: '1'
          severity: 'CRITICAL,HIGH'

  deploy-staging:
    if: github.event_name == 'push'
    needs: [test, security]
    runs-on: ubuntu-latest
    environment: staging
    steps:
      # Assumes each environment supplies its own cluster credentials
      - run: kubectl set image deployment/app app=${{ env.IMAGE }}

  deploy-prod:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - run: kubectl set image deployment/app app=${{ env.IMAGE }}
```
A pipeline should be fast enough that developers don’t go get coffee, and reliable enough that they trust it. Build both.