Database migrations are where deploys go to die. Here’s how to make them safe.
The Golden Rule#
Every migration must be backward compatible.
Your old code will run alongside the new schema during deployment. If it can’t, you’ll have downtime.
Safe Operations#
These are always safe:
- Adding a nullable column
- Adding a table
- Adding an index (with
CONCURRENTLY) - Adding a constraint (as
NOT VALID, then validate later)
Dangerous Operations#
These need careful handling:
- Dropping a column
- Renaming a column
- Adding a NOT NULL column
- Changing a column type
- Dropping a table
Pattern: Adding a Required Column#
1
2
3
4
5
6
7
8
| -- Step 1: Add nullable column (deploy)
ALTER TABLE users ADD COLUMN email VARCHAR(255);
-- Step 2: Backfill existing rows (script)
UPDATE users SET email = 'unknown@example.com' WHERE email IS NULL;
-- Step 3: Add constraint (next deploy)
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
|
Three deploys, zero downtime.
Pattern: Renaming a Column#
1
2
3
4
5
6
7
8
9
10
11
12
13
| -- Step 1: Add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
-- Step 2: Dual-write in application code
-- UPDATE users SET name = ?, full_name = ?
-- Step 3: Backfill
UPDATE users SET full_name = name WHERE full_name IS NULL;
-- Step 4: Switch reads to new column in app
-- Step 5: Drop old column (after app fully migrated)
ALTER TABLE users DROP COLUMN name;
|
Never rename directly. Copy, migrate, drop.
Pattern: Safe Index Creation#
1
2
3
4
5
| -- Bad: locks table for entire build
CREATE INDEX idx_users_email ON users(email);
-- Good: doesn't block writes
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
|
CONCURRENTLY takes longer but doesn’t lock.
Pattern: Adding Foreign Key#
1
2
3
4
5
6
7
8
9
10
| -- Bad: validates all rows, locks tables
ALTER TABLE orders ADD CONSTRAINT fk_user
FOREIGN KEY (user_id) REFERENCES users(id);
-- Good: add without validation, validate later
ALTER TABLE orders ADD CONSTRAINT fk_user
FOREIGN KEY (user_id) REFERENCES users(id) NOT VALID;
-- Then, during low traffic:
ALTER TABLE orders VALIDATE CONSTRAINT fk_user;
|
Alembic (Python/SQLAlchemy)#
1
2
3
4
5
6
| # migrations/versions/001_add_email.py
def upgrade():
op.add_column('users', sa.Column('email', sa.String(255)))
def downgrade():
op.drop_column('users', 'email')
|
1
2
| alembic upgrade head # Apply
alembic downgrade -1 # Rollback
|
golang-migrate#
1
2
3
4
5
| -- 001_add_email.up.sql
ALTER TABLE users ADD COLUMN email VARCHAR(255);
-- 001_add_email.down.sql
ALTER TABLE users DROP COLUMN email;
|
1
2
| migrate -path ./migrations -database $DB up
migrate -path ./migrations -database $DB down 1
|
Flyway#
1
2
| -- V1__add_email.sql
ALTER TABLE users ADD COLUMN email VARCHAR(255);
|
Flyway doesn’t support down migrations by default. Plan accordingly.
Testing Migrations#
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
| #!/bin/bash
# test_migrations.sh
# Start fresh database
docker run -d --name test_db postgres:15
# Apply all migrations
migrate up
# Run application tests
pytest tests/
# Apply down migrations
migrate down -all
# Apply up again (catches issues with down migrations)
migrate up
# Cleanup
docker rm -f test_db
|
Test the full up/down cycle.
Large Table Migrations#
For tables with millions of rows:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
| # Batch updates to avoid long locks
batch_size = 1000
while True:
result = db.execute("""
UPDATE users
SET full_name = name
WHERE full_name IS NULL
LIMIT %s
""", [batch_size])
if result.rowcount == 0:
break
time.sleep(0.1) # Let other queries through
|
The Pre-Deploy Checklist#
- ☐ Migration is backward compatible
- ☐ Tested on production-size data
- ☐ Estimated runtime acceptable
- ☐ Down migration tested
- ☐ No table locks during peak hours
- ☐ Index creation uses CONCURRENTLY
- ☐ Constraints use NOT VALID if large table
Rollback Strategy#
1
2
3
4
5
6
7
8
| # Option 1: Down migration
migrate down 1
# Option 2: Deploy previous app version
# (if migration is backward compatible, old code still works)
# Option 3: Forward fix
# (deploy another migration that fixes the issue)
|
Forward fixes are often faster than rollbacks.
Schema Version in Code#
1
2
3
4
5
6
7
| EXPECTED_SCHEMA_VERSION = "20260228_001"
def check_schema():
current = db.query("SELECT version FROM schema_migrations ORDER BY version DESC LIMIT 1")
if current != EXPECTED_SCHEMA_VERSION:
log.error(f"Schema mismatch: expected {EXPECTED_SCHEMA_VERSION}, got {current}")
sys.exit(1)
|
Fail fast if schema doesn’t match code.
The Philosophy#
Migrations are operations, not just code. Treat them with the same care as any production change:
- Plan the rollback before you start
- Have a monitoring dashboard open
- Run during low-traffic periods
- Communicate with the team
Your database is the hardest thing to roll back. Respect it.