Database Migrations Without Downtime

Database migrations are the scariest part of deployments. Get them wrong and you’re looking at downtime, data loss, or a 3 AM incident call.

Here’s how to migrate safely.

The Problem

Naive migrations cause problems:

1
2
3
4
5
6
-- This locks the entire table
ALTER TABLE users ADD COLUMN phone VARCHAR(20) NOT NULL;

-- 10 million rows, exclusive lock held for minutes
-- Application queries queue up
-- Users see errors

Safe Migration Patterns

Adding Columns

Bad: Adding NOT NULL column without default

1
2
-- Blocks reads and writes on large tables
ALTER TABLE users ADD COLUMN phone VARCHAR(20) NOT NULL;

Good: Add nullable, then backfill, then add constraint

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
-- Step 1: Add nullable column (instant, no lock)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Step 2: Backfill in batches (no lock)
UPDATE users SET phone = 'unknown' 
WHERE phone IS NULL AND id BETWEEN 1 AND 10000;
-- Repeat for all batches

-- Step 3: Add NOT NULL constraint (fast check)
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;

Removing Columns

Bad: Drop column while old code still uses it

1
2
ALTER TABLE users DROP COLUMN legacy_field;
-- Old application instances crash

Good: Three-phase removal

1
2
-- Only after all code is updated
ALTER TABLE users DROP COLUMN legacy_field;

Renaming Columns

Bad: Direct rename breaks existing code

1
ALTER TABLE users RENAME COLUMN name TO full_name;

Good: Add new, migrate, remove old

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
-- Step 1: Add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(100);

-- Step 2: Copy data
UPDATE users SET full_name = name WHERE full_name IS NULL;

-- Step 3: Deploy code using both columns (reads from full_name, writes to both)

-- Step 4: Deploy code using only full_name

-- Step 5: Drop old column
ALTER TABLE users DROP COLUMN name;

Adding Indexes

Bad: Regular CREATE INDEX locks writes

1
2
CREATE INDEX idx_users_email ON users(email);
-- Table locked for minutes during index build

Good: CONCURRENTLY doesn’t lock

1
2
3
4
5
-- PostgreSQL
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

-- Takes longer but doesn't block
-- Check for completion: SELECT * FROM pg_stat_progress_create_index;

Gotcha: CONCURRENTLY can fail and leave an invalid index:

1
2
3
4
5
6
7
-- Check for invalid indexes
SELECT indexrelid::regclass, indisvalid 
FROM pg_index WHERE NOT indisvalid;

-- If found, drop and recreate
DROP INDEX CONCURRENTLY idx_users_email;
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

Adding Foreign Keys

Bad: Adding FK validates all existing data with lock

1
2
3
4
ALTER TABLE orders 
ADD CONSTRAINT fk_orders_user 
FOREIGN KEY (user_id) REFERENCES users(id);
-- Scans entire table with lock held

Good: Add NOT VALID, then validate separately

1
2
3
4
5
6
7
-- Step 1: Add constraint without validation (instant)
ALTER TABLE orders 
ADD CONSTRAINT fk_orders_user 
FOREIGN KEY (user_id) REFERENCES users(id) NOT VALID;

-- Step 2: Validate existing data (no lock on writes)
ALTER TABLE orders VALIDATE CONSTRAINT fk_orders_user;

Adding NOT NULL Constraints

Bad: Adding NOT NULL scans and locks

1
2
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
-- Scans all rows while holding lock

Good: Add check constraint first (PostgreSQL 12+)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
-- Step 1: Add check constraint as NOT VALID
ALTER TABLE users 
ADD CONSTRAINT users_email_not_null 
CHECK (email IS NOT NULL) NOT VALID;

-- Step 2: Validate (no lock on writes)
ALTER TABLE users VALIDATE CONSTRAINT users_email_not_null;

-- Step 3: Convert to NOT NULL (instant, constraint already validated)
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

-- Step 4: Drop redundant check constraint
ALTER TABLE users DROP CONSTRAINT users_email_not_null;

Migration Tools

Rails ActiveRecord

1
2
3
4
5
6
7
8
class AddPhoneToUsers < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!  # Required for CONCURRENTLY
  
  def change
    add_column :users, :phone, :string
    add_index :users, :phone, algorithm: :concurrently
  end
end

Django

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# migrations/0042_add_phone.py
from django.db import migrations

class Migration(migrations.Migration):
    atomic = False  # Disable transaction for concurrent ops
    
    operations = [
        migrations.AddField(
            model_name='user',
            name='phone',
            field=models.CharField(max_length=20, null=True),
        ),
        migrations.RunSQL(
            "CREATE INDEX CONCURRENTLY idx_phone ON users(phone);",
            reverse_sql="DROP INDEX CONCURRENTLY idx_phone;"
        ),
    ]

Flyway / Liquibase

1
2
3
-- V42__add_phone.sql
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
CREATE INDEX CONCURRENTLY idx_users_phone ON users(phone);

gh-ost (MySQL)

For MySQL, online schema changes are harder. Use gh-ost:

1
2
3
4
5
6
gh-ost \
  --host=replica.db.example.com \
  --database=myapp \
  --table=users \
  --alter="ADD COLUMN phone VARCHAR(20)" \
  --execute

gh-ost creates a shadow table, copies data in chunks, then atomically swaps.

Deployment Order

The order matters for zero-downtime:

Adding a Column

Removing a Column

Renaming a Table

Lock Monitoring

Know when migrations are blocking:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
-- PostgreSQL: Find blocking queries
SELECT 
  blocked.pid AS blocked_pid,
  blocked.query AS blocked_query,
  blocking.pid AS blocking_pid,
  blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks blocked_locks ON blocked.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocked_locks.locktype = blocking_locks.locktype
  AND blocked_locks.relation = blocking_locks.relation
  AND blocked_locks.pid != blocking_locks.pid
JOIN pg_stat_activity blocking ON blocking_locks.pid = blocking.pid
WHERE NOT blocked_locks.granted;

Set lock timeout to fail fast:

1
2
3
SET lock_timeout = '5s';
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Fails after 5 seconds if can't acquire lock

Testing Migrations

Test on production-size data:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# Dump prod schema + sample data
pg_dump --schema-only prod_db > schema.sql
pg_dump --data-only --table=users prod_db | head -100000 > sample.sql

# Restore to test DB
psql test_db < schema.sql
psql test_db < sample.sql

# Time the migration
time psql test_db < migration.sql

Rollback Strategy

Always have a rollback plan:

1
2
3
4
5
-- migration_up.sql
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- migration_down.sql
ALTER TABLE users DROP COLUMN phone;

But remember: some migrations can’t be rolled back easily (data transformations, column removals after data loss).

For risky migrations:

Take a backup before
Test rollback in staging
Have the rollback script ready
Set a point-of-no-return after which you don’t roll back

Quick Reference

Operation	Safe Method
Add column	Add nullable, then backfill
Remove column	Stop using first, then drop
Rename column	Add new, migrate, drop old
Add index	CREATE INDEX CONCURRENTLY
Add foreign key	Add NOT VALID, then validate
Add NOT NULL	Add check constraint first
Change type	Add new column, migrate, drop old

The best migration is one your users don’t notice. Make each change small, reversible, and independent of deployment order.

The Problem#

Safe Migration Patterns#

Adding Columns#

Removing Columns#

Renaming Columns#

Adding Indexes#

Adding Foreign Keys#

Adding NOT NULL Constraints#

Migration Tools#

Rails ActiveRecord#

Django#

Flyway / Liquibase#

gh-ost (MySQL)#

Deployment Order#

Adding a Column#

Removing a Column#

Renaming a Table#

Lock Monitoring#

Testing Migrations#

Rollback Strategy#

Quick Reference#

📬 Get the Newsletter