Schema changes are scary. One bad migration can take down your entire application. Here’s how to evolve your database safely, even with users actively hitting it.
The Core Problem
You need to rename a column. Sounds simple:
| |
But your application is running. The moment this executes:
- Old code looking for
namebreaks - New code looking for
full_namealso breaks (until deploy finishes) - You have a window of guaranteed failures
This is why database migrations need careful choreography.
The Expand-Contract Pattern
The safest approach: never make destructive changes directly. Instead:
- Expand — Add new structure alongside old
- Migrate — Move data, update code
- Contract — Remove old structure
Example: Renaming a Column
Step 1: Expand (add new column)
| |
Both columns exist. Old code works. New code can start using full_name.
Step 2: Dual-write (application change)
| |
Deploy this. All new writes go to both columns.
Step 3: Backfill
| |
Now all rows have full_name populated.
Step 4: Switch reads (application change)
| |
Deploy this. Old column is now unused.
Step 5: Contract (remove old column)
| |
Done. Zero downtime through the entire process.
Migrations That Lock Tables
Some operations lock the entire table, blocking all reads and writes:
Dangerous operations:
ALTER TABLE ... ADD COLUMN ... DEFAULT value(pre-Postgres 11)CREATE INDEXwithoutCONCURRENTLYALTER TABLE ... ALTER COLUMN ... TYPE(data conversion)- Large
UPDATEstatements
Safe Index Creation
| |
CONCURRENTLY takes longer but doesn’t block writes.
Safe Column Additions (Postgres 11+)
| |
Batched Updates
Large updates should be batched:
| |
Migration Tooling
Framework Migrations
Most frameworks have built-in migrations:
| |
| |
Standalone Tools
For more control:
Flyway — SQL-based, version controlled
golang-migrate — Language agnostic
| |
gh-ost (MySQL) — Online schema changes
| |
Deployment Ordering
The order matters:
Adding a Column
- Deploy migration (add column)
- Deploy application (code that uses new column)
Removing a Column
- Deploy application (remove all references to column)
- Deploy migration (drop column)
Renaming (Expand-Contract)
- Deploy migration (add new column)
- Deploy application (dual-write)
- Deploy migration (backfill)
- Deploy application (read from new)
- Deploy migration (drop old column)
Testing Migrations
Local Testing
| |
Staging with Production Data
For critical migrations, test against production-scale data:
| |
Dry Runs
Some tools support dry runs:
| |
Rollback Strategies
Reversible Migrations
Always write down migrations:
| |
Forward-Only
Some teams go forward-only — no rollbacks, only new migrations that fix issues:
Safer than rollbacks that might lose data.
Feature Flags
Combine with feature flags for instant rollback:
| |
Flip the flag if something’s wrong — no deployment needed.
Monitoring Migrations
Before Running
| |
During Migration
| |
After Migration
- Check error rates in application
- Verify query performance (new indexes working?)
- Monitor replication lag
Common Mistakes
1. Not Testing at Scale
A migration that takes 1 second on 1000 rows might take 1 hour on 10 million.
2. Running During Peak Hours
Schedule migrations during low-traffic windows. Even “safe” operations have overhead.
3. No Backout Plan
Know how to reverse before you start. Have the rollback migration ready.
4. Mixing Migration and Application Changes
Keep them separate. Migration first (or last), application separate.
5. Ignoring Replication Lag
In replicated setups, reads might hit replicas that haven’t caught up. Plan for temporary inconsistency.
The Bottom Line
- Never make breaking changes directly — Use expand-contract
- Use CONCURRENTLY for indexes — Avoid table locks
- Batch large updates — Don’t lock millions of rows
- Test at production scale — Small-data tests lie
- Deploy in order — Migration timing matters
- Monitor everything — Locks, errors, performance
Your database is your most valuable asset. Treat migrations with the respect they deserve.
Databases outlive applications. Migrate carefully, and your data will thank you.