Database migrations are where deploys go to die. Here’s how to make them safe.

The Golden Rule

Every migration must be backward compatible.

Your old code will run alongside the new schema during deployment. If it can’t, you’ll have downtime.

Safe Operations

These are always safe:

  • Adding a nullable column
  • Adding a table
  • Adding an index (with CONCURRENTLY)
  • Adding a constraint (as NOT VALID, then validate later)

Dangerous Operations

These need careful handling:

  • Dropping a column
  • Renaming a column
  • Adding a NOT NULL column
  • Changing a column type
  • Dropping a table

Pattern: Adding a Required Column

1
2
3
4
5
6
7
8
-- Step 1: Add nullable column (deploy)
ALTER TABLE users ADD COLUMN email VARCHAR(255);

-- Step 2: Backfill existing rows (script)
UPDATE users SET email = 'unknown@example.com' WHERE email IS NULL;

-- Step 3: Add constraint (next deploy)
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

Three deploys, zero downtime.

Pattern: Renaming a Column

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
-- Step 1: Add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);

-- Step 2: Dual-write in application code
-- UPDATE users SET name = ?, full_name = ?

-- Step 3: Backfill
UPDATE users SET full_name = name WHERE full_name IS NULL;

-- Step 4: Switch reads to new column in app

-- Step 5: Drop old column (after app fully migrated)
ALTER TABLE users DROP COLUMN name;

Never rename directly. Copy, migrate, drop.

Pattern: Safe Index Creation

1
2
3
4
5
-- Bad: locks table for entire build
CREATE INDEX idx_users_email ON users(email);

-- Good: doesn't block writes
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

CONCURRENTLY takes longer but doesn’t lock.

Pattern: Adding Foreign Key

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
-- Bad: validates all rows, locks tables
ALTER TABLE orders ADD CONSTRAINT fk_user 
  FOREIGN KEY (user_id) REFERENCES users(id);

-- Good: add without validation, validate later
ALTER TABLE orders ADD CONSTRAINT fk_user 
  FOREIGN KEY (user_id) REFERENCES users(id) NOT VALID;

-- Then, during low traffic:
ALTER TABLE orders VALIDATE CONSTRAINT fk_user;

Migration Tools

Alembic (Python/SQLAlchemy)

1
2
3
4
5
6
# migrations/versions/001_add_email.py
def upgrade():
    op.add_column('users', sa.Column('email', sa.String(255)))

def downgrade():
    op.drop_column('users', 'email')
1
2
alembic upgrade head    # Apply
alembic downgrade -1    # Rollback

golang-migrate

1
2
3
4
5
-- 001_add_email.up.sql
ALTER TABLE users ADD COLUMN email VARCHAR(255);

-- 001_add_email.down.sql
ALTER TABLE users DROP COLUMN email;
1
2
migrate -path ./migrations -database $DB up
migrate -path ./migrations -database $DB down 1

Flyway

1
2
-- V1__add_email.sql
ALTER TABLE users ADD COLUMN email VARCHAR(255);

Flyway doesn’t support down migrations by default. Plan accordingly.

Testing Migrations

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
#!/bin/bash
# test_migrations.sh

# Start fresh database
docker run -d --name test_db postgres:15

# Apply all migrations
migrate up

# Run application tests
pytest tests/

# Apply down migrations
migrate down -all

# Apply up again (catches issues with down migrations)
migrate up

# Cleanup
docker rm -f test_db

Test the full up/down cycle.

Large Table Migrations

For tables with millions of rows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
# Batch updates to avoid long locks
batch_size = 1000
while True:
    result = db.execute("""
        UPDATE users 
        SET full_name = name 
        WHERE full_name IS NULL 
        LIMIT %s
    """, [batch_size])
    
    if result.rowcount == 0:
        break
    
    time.sleep(0.1)  # Let other queries through

The Pre-Deploy Checklist

  1. ☐ Migration is backward compatible
  2. ☐ Tested on production-size data
  3. ☐ Estimated runtime acceptable
  4. ☐ Down migration tested
  5. ☐ No table locks during peak hours
  6. ☐ Index creation uses CONCURRENTLY
  7. ☐ Constraints use NOT VALID if large table

Rollback Strategy

1
2
3
4
5
6
7
8
# Option 1: Down migration
migrate down 1

# Option 2: Deploy previous app version
# (if migration is backward compatible, old code still works)

# Option 3: Forward fix
# (deploy another migration that fixes the issue)

Forward fixes are often faster than rollbacks.

Schema Version in Code

1
2
3
4
5
6
7
EXPECTED_SCHEMA_VERSION = "20260228_001"

def check_schema():
    current = db.query("SELECT version FROM schema_migrations ORDER BY version DESC LIMIT 1")
    if current != EXPECTED_SCHEMA_VERSION:
        log.error(f"Schema mismatch: expected {EXPECTED_SCHEMA_VERSION}, got {current}")
        sys.exit(1)

Fail fast if schema doesn’t match code.

The Philosophy

Migrations are operations, not just code. Treat them with the same care as any production change:

  • Plan the rollback before you start
  • Have a monitoring dashboard open
  • Run during low-traffic periods
  • Communicate with the team

Your database is the hardest thing to roll back. Respect it.