A good CI/CD pipeline catches bugs early, deploys reliably, and gets out of your way. A bad one is slow, flaky, and becomes the team’s bottleneck.

Let’s build a good one.

Pipeline Stages

A typical pipeline flows through these stages:

Commit → Build → Test → Security Scan → Deploy Staging → Deploy Prod

Each stage gates the next. Fail early, fail fast.
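
The gating idea can be sketched in a few lines of plain shell. This is only an illustration of "stop at the first failure", not how GitHub Actions implements it; the stage names in the usage comment are hypothetical.

```sh
# Run each stage command in order and stop at the first failure,
# so later (slower, riskier) stages never run on a broken commit.
# Each argument must be the name of a command or shell function.
run_stages() {
  for stage in "$@"; do
    echo "==> $stage"
    if ! "$stage"; then
      echo "stage failed: $stage" >&2
      return 1
    fi
  done
  echo "all stages passed"
}

# Example (hypothetical stage functions defined elsewhere):
#   run_stages build unit_tests security_scan deploy_staging
```

In a real workflow the same ordering comes from `needs:` between jobs, as the stages below show.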

Stage 1: Build

Install dependencies, compile the code, and package the output as an artifact:

# GitHub Actions
build:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    
    - name: Setup Node
      uses: actions/setup-node@v4
      with:
        node-version: '20'
        cache: 'npm'
    
    - name: Install dependencies
      run: npm ci
    
    - name: Build
      run: npm run build
    
    - name: Upload artifact
      uses: actions/upload-artifact@v4
      with:
        name: build
        path: dist/

Key practices:

  • Use lockfiles (package-lock.json, poetry.lock)
  • Cache dependencies between runs
  • Build once, deploy the same artifact everywhere
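
One way to back up "deploy the same artifact everywhere" is to record a checksum at build time and verify it before each deploy. A minimal sketch, assuming a Linux runner where `sha256sum` is available; the function names are made up for this example.

```sh
# Print the SHA-256 digest of a file (hex only, no filename).
artifact_digest() {
  sha256sum < "$1" | cut -d ' ' -f 1
}

# At build time: store the digest next to the artifact.
record_checksum() {
  artifact_digest "$1" > "$1.sha256"
}

# Before deploying: fail if the artifact no longer matches the digest
# recorded at build time.
verify_checksum() {
  [ "$(artifact_digest "$1")" = "$(cat "$1.sha256")" ]
}

# Example:
#   record_checksum build.tar.gz       # in the build job
#   verify_checksum build.tar.gz       # in each deploy job
```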

Stage 2: Test

Run tests at multiple levels:

test:
  needs: build
  runs-on: ubuntu-latest
  strategy:
    matrix:
      test-type: [unit, integration]
  steps:
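    - uses: actions/checkout@v4

    - name: Setup Node
      uses: actions/setup-node@v4
      with:
        node-version: '20'
        cache: 'npm'

    # Each job starts on a fresh runner, so dependencies must be installed again
    - name: Install dependencies
      run: npm ci

    - name: Download build
      uses: actions/download-artifact@v4
      with:
        name: build
        path: dist/   # the artifact holds the contents of dist/, so restore it there

    - name: Run unit tests
      if: matrix.test-type == 'unit'
      run: npm run test:unit -- --coverage

    - name: Run integration tests
      if: matrix.test-type == 'integration'
      run: |
        docker compose up -d
        npm run test:integration

    # Tear down services even when the tests fail
    - name: Stop services
      if: always() && matrix.test-type == 'integration'
      run: docker compose down

    - name: Upload coverage
      if: matrix.test-type == 'unit'
      uses: codecov/codecov-action@v4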

Testing pyramid:

  • Many unit tests (fast, isolated)
  • Fewer integration tests (slower, more realistic)
  • Few E2E tests (slowest, highest confidence)

Stage 3: Security Scanning

Catch vulnerabilities before they reach production:

security:
  needs: build
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    
    # Dependency vulnerabilities
    - name: Dependency scan
      run: npm audit --audit-level=high
    
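    # Static analysis (analyze requires a prior init step, and the job
    # needs the security-events: write permission to upload results)
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v3
      with:
        languages: javascript-typescript

    - name: SAST scan
      uses: github/codeql-action/analyze@v3

    # Container scanning
    # Pin third-party actions to a release tag or commit SHA in real use;
    # @master and @main can change underneath you.
    - name: Container scan
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: '${{ env.IMAGE }}'
        exit-code: '1'
        severity: 'CRITICAL,HIGH'

    # Secrets detection
    - name: Secret scan
      uses: trufflesecurity/trufflehog@main
      with:
        path: ./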

Stage 4: Deploy to Staging

Deploy to a production-like environment:

deploy-staging:
  needs: [test, security]
  runs-on: ubuntu-latest
  environment: staging
  steps:
    # Needed for the smoke-test scripts in package.json
    - uses: actions/checkout@v4

    - name: Download artifact
      uses: actions/download-artifact@v4
      with:
        name: build
        path: dist/   # the artifact holds the contents of dist/, so restore it there

    - name: Deploy to staging
      run: |
        aws s3 sync dist/ s3://staging-bucket/
        aws cloudfront create-invalidation \
          --distribution-id "$STAGING_CF_ID" \
          --paths "/*"

    - name: Run smoke tests
      run: |
        curl -sf https://staging.example.com/health
        npm ci
        npm run test:smoke -- --env=staging

Stage 5: Deploy to Production

Gate production behind a manual approval for critical systems:

deploy-prod:
  needs: deploy-staging
  runs-on: ubuntu-latest
  environment: 
    name: production
    url: https://example.com
  steps:
    # Needed for the smoke-test scripts in package.json
    - uses: actions/checkout@v4

    - name: Download artifact
      uses: actions/download-artifact@v4
      with:
        name: build
        path: dist/   # the artifact holds the contents of dist/, so restore it there

    - name: Deploy to production
      run: |
        aws s3 sync dist/ s3://prod-bucket/
        aws cloudfront create-invalidation \
          --distribution-id "$PROD_CF_ID" \
          --paths "/*"

    - name: Verify deployment
      run: |
        curl -sf https://example.com/health
        npm ci
        npm run test:smoke -- --env=production

    - name: Notify team
      run: |
        curl -X POST "$SLACK_WEBHOOK" \
          -H 'Content-Type: application/json' \
          -d '{"text":"Deployed ${{ github.sha }} to production"}'

Deployment Strategies

Direct Deploy (All at Once)

Simple but risky:

- name: Deploy
  run: kubectl set image deployment/app app=$IMAGE

Rolling Deploy

Gradual replacement:

# Kubernetes rolling update
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
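
To see what those two settings mean in practice, here is the arithmetic as a small shell sketch. It assumes absolute values (not percentages) for maxSurge and maxUnavailable; the function name is made up for this example.

```sh
# Print the pod-count bounds during a rolling update, given absolute
# values for replicas, maxSurge, and maxUnavailable.
rolling_bounds() {
  replicas=$1; max_surge=$2; max_unavailable=$3
  echo "min_available=$((replicas - max_unavailable)) max_total=$((replicas + max_surge))"
}

rolling_bounds 4 1 1   # prints: min_available=3 max_total=5
```

With the config above, at least 3 pods keep serving traffic and at most 5 exist at any moment during the rollout.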

Blue-Green Deploy

Instant switch with instant rollback:

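- name: Deploy to green
  run: |
    kubectl apply -f k8s/green/
    kubectl wait --for=condition=ready pod -l env=green --timeout=120s

- name: Switch traffic
  run: |
    kubectl patch service app -p '{"spec":{"selector":{"env":"green"}}}'

- name: Verify
  run: ./scripts/smoke-test.sh

# Rollback is just pointing the Service back at blue
- name: Roll back to blue
  if: failure()
  run: |
    kubectl patch service app -p '{"spec":{"selector":{"env":"blue"}}}'

# Instant rollback only works while blue still exists, so consider
# delaying this step until green has proven itself.
- name: Cleanup blue
  if: success()
  run: kubectl delete -f k8s/blue/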
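
The steps above hard-code green as the target. A sketch of a switch script that works in both directions might look like this, assuming a Service named `app` whose selector uses `env=<color>` as in the snippet; the function names are hypothetical.

```sh
# Given the active color, print the idle one.
idle_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

# Build the JSON patch that points the Service selector at a color.
selector_patch() {
  printf '{"spec":{"selector":{"env":"%s"}}}' "$1"
}

# Read the active color from the Service and flip traffic to the other one.
switch_traffic() {
  active=$(kubectl get service app -o jsonpath='{.spec.selector.env}') || return 1
  target=$(idle_color "$active") || return 1
  kubectl patch service app -p "$(selector_patch "$target")"
}
```

Calling `switch_traffic` a second time is the rollback, as long as the previous color's pods have not been deleted yet.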

Canary Deploy

Gradual traffic shift:

- name: Deploy canary (10%)
  run: |
    kubectl apply -f k8s/canary/
    kubectl patch service app --type=merge -p '
      {"spec":{"ports":[{"port":80,"targetPort":8080}]}}
    '
    # Configure ingress for 10% canary
    
- name: Monitor canary
  run: |
    sleep 300  # 5 minutes
    ./scripts/check-error-rate.sh canary
    
- name: Promote or rollback
  run: |
    if ./scripts/canary-healthy.sh; then
      kubectl scale deployment/app-canary --replicas=4
      kubectl scale deployment/app-stable --replicas=0
    else
      kubectl delete -f k8s/canary/
      exit 1
    fi
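
The health-check scripts referenced above are not shown. As a rough sketch, the core decision could be a threshold on the canary's error rate. The request counts are assumed to come from your metrics system; the function name and the 1% threshold in the example are illustrative only.

```sh
# Succeed only if the canary saw traffic and its error rate is at or
# below the threshold. Integer math avoids a dependency on bc or awk.
#   $1 = failed requests, $2 = total requests, $3 = max error percent
canary_healthy() {
  errors=$1; total=$2; max_percent=$3
  # No traffic means no evidence, so treat it as unhealthy.
  [ "$total" -gt 0 ] || return 1
  # errors/total <= max_percent/100  is the same as
  # errors*100 <= max_percent*total
  [ $((errors * 100)) -le $((max_percent * total)) ]
}

# Example: 5 failures out of 1000 requests with a 1% budget passes.
#   canary_healthy 5 1000 1 && echo "promote" || echo "rollback"
```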

Pipeline Optimization

Parallelization

Run independent jobs concurrently:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps: [...]
    
  test-unit:
    runs-on: ubuntu-latest
    steps: [...]
    
  test-integration:
    runs-on: ubuntu-latest
    steps: [...]
    
  security:
    runs-on: ubuntu-latest
    steps: [...]
    
  # Only runs after all above pass
  deploy:
    needs: [lint, test-unit, test-integration, security]
    runs-on: ubuntu-latest
    steps: [...]

Caching

Cache dependencies between runs:

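- uses: actions/cache@v4
  with:
    # Cache npm's download cache rather than node_modules:
    # npm ci deletes node_modules before installing.
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-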
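
The key is what makes the cache safe: it changes whenever the lockfile does. The idea can be sketched in shell as follows. This is analogous to, not identical with, what `hashFiles` computes, and it assumes `sha256sum` is available.

```sh
# Derive a cache key from the lockfile contents: same lockfile, same key;
# any dependency change produces a new key and therefore a fresh cache.
cache_key() {
  prefix=$1; lockfile=$2
  [ -r "$lockfile" ] || { echo "cannot read $lockfile" >&2; return 1; }
  digest=$(sha256sum < "$lockfile" | cut -d ' ' -f 1)
  echo "${prefix}-${digest}"
}

# Example:
#   cache_key npm package-lock.json
```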

Skip Unnecessary Work

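Trigger the workflow only when relevant files change. GitHub Actions does not allow paths and paths-ignore on the same event, so exclusions go inside paths as ! patterns:

on:
  push:
    paths:
      - 'src/**'
      - 'package.json'
      - '.github/workflows/**'
      # Exclusions must come after the patterns they narrow.
      # docs/** is not listed above, so it never triggers a run anyway.
      - '!**.md'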
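
The same filter can be applied inside a job, for example to skip an expensive step. A minimal sketch, assuming the list of changed files arrives on stdin; the function name is made up, and the patterns mirror the ones in the trigger above.

```sh
# Read changed file paths on stdin (one per line) and succeed if any of
# them should trigger the pipeline. Markdown files are excluded first,
# then the include patterns are checked.
needs_build() {
  while IFS= read -r path; do
    case "$path" in
      *.md) continue ;;
      src/*|package.json|.github/workflows/*) return 0 ;;
    esac
  done
  return 1
}

# Example:
#   git diff --name-only origin/main...HEAD | needs_build && npm run build
```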

Fail Fast

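Cancel the remaining matrix jobs as soon as one fails. fail-fast defaults to true for matrix strategies, so setting it mainly documents the intent: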

strategy:
  fail-fast: true
  matrix:
    node: [18, 20, 22]

Environment Protection

Required Reviewers

environment:
  name: production
  # Requires manual approval from designated reviewers

Configure in GitHub: Settings → Environments → production → Required reviewers

Branch Protection

  • Require PR reviews before merge
  • Require status checks to pass
  • Require linear history
  • Restrict who can push

Secrets Management

env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

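Prefer OIDC for cloud providers when possible, so no long-lived keys are stored as secrets:

# The job also needs: permissions: { id-token: write, contents: read }
- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    # AWS account IDs are 12 digits; this one is a placeholder
    role-to-assume: arn:aws:iam::123456789012:role/github-actions
    aws-region: us-east-1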

Handling Failures

Automatic Rollback

- name: Deploy
  id: deploy
  run: ./deploy.sh
  
- name: Rollback on failure
  if: failure() && steps.deploy.outcome == 'failure'
  run: ./rollback.sh
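
The same pattern works inside a single script when deploy and rollback should not be separate steps. A sketch, with hypothetical command names passed in as arguments:

```sh
# Run the deploy command; if it fails, run the rollback command and
# still return a failure so the pipeline is marked red.
deploy_with_rollback() {
  deploy_cmd=$1; rollback_cmd=$2
  if "$deploy_cmd"; then
    echo "deploy succeeded"
    return 0
  fi
  echo "deploy failed, rolling back" >&2
  "$rollback_cmd" || echo "rollback also failed" >&2
  return 1
}

# Example:
#   deploy_with_rollback ./deploy.sh ./rollback.sh
```

Returning non-zero after a successful rollback is deliberate: the release did not ship, and the run should say so.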

Notifications

- name: Notify on failure
  if: failure()
  uses: slackapi/slack-github-action@v1
  with:
    channel-id: 'deployments'
    slack-message: |
      ❌ Deployment failed
      Commit: ${{ github.sha }}
      Author: ${{ github.actor }}
      <${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}|View logs>
  env:
    # Posting by channel requires a bot token
    SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

Retry Logic

- name: Deploy with retry
  uses: nick-fields/retry@v2
  with:
    timeout_minutes: 10
    max_attempts: 3
    command: ./deploy.sh
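
If you would rather not depend on a third-party action, a retry loop is short enough to write in shell. This sketch adds exponential backoff, which the snippet above does not configure; the function name and defaults are illustrative. Only retry steps that are safe to run more than once.

```sh
# Retry a command up to max_attempts times, doubling the delay after
# each failure.
# Usage: retry <max_attempts> <initial_delay_seconds> <command> [args...]
retry() {
  max_attempts=$1; delay=$2; shift 2
  attempt=1
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Example:
#   retry 3 5 ./deploy.sh
```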

Metrics to Track

Monitor your pipeline health:

  • Lead time: Commit to production
  • Deployment frequency: How often you deploy
  • Change failure rate: Percentage of deployments causing issues
  • Mean time to recovery: How fast you fix failures

Emit an event on every deployment so these can be computed later:

- name: Record deployment metrics
  run: |
    curl -X POST $METRICS_ENDPOINT \
      -d '{
        "event": "deployment",
        "sha": "${{ github.sha }}",
        "environment": "production",
        "duration_seconds": ${{ steps.deploy.outputs.duration }}
      }'
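
Two of the metrics listed above reduce to simple arithmetic once the events are recorded. A sketch, assuming timestamps are Unix epoch seconds and that deployment counts come from whatever store receives the events; the function names are made up.

```sh
# Lead time: seconds between the commit timestamp and the deploy
# timestamp (both as Unix epoch seconds).
lead_time_seconds() {
  echo $(($2 - $1))
}

# Change failure rate as a whole-number percentage, rounded down.
#   $1 = deployments that caused an issue, $2 = total deployments
change_failure_rate() {
  failed=$1; total=$2
  [ "$total" -gt 0 ] || { echo "no deployments recorded" >&2; return 1; }
  echo $((failed * 100 / total))
}

# Example: commit time from git, deploy time from the clock.
#   lead_time_seconds "$(git log -1 --format=%ct)" "$(date +%s)"
```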

Complete Example

name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      # build-push-action exposes imageid, digest, and metadata; it has no "image" output
      imageid: ${{ steps.build.outputs.imageid }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/build-push-action@v5
        id: build
        with:
          push: ${{ github.event_name == 'push' }}
          tags: app:${{ github.sha }}

  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test

  security:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: aquasecurity/trivy-action@master
        with:
          image-ref: app:${{ github.sha }}

  deploy-staging:
    if: github.event_name == 'push'
    needs: [test, security]
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - run: kubectl set image deployment/app app=app:${{ github.sha }}

  deploy-prod:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - run: kubectl set image deployment/app app=app:${{ github.sha }}

A pipeline should be fast enough that developers don’t go get coffee, and reliable enough that they trust it. Build both.