Every junior developer learns that indexes make queries fast. What they don’t learn is that indexes also make writes slow, consume disk space, and can actively hurt performance when misused.
Let’s fix that.
What Indexes Actually Do
An index is a separate data structure that maintains a sorted copy of specific columns, with pointers back to the full rows. Think of it like the index in a book — instead of reading every page to find “PostgreSQL,” you flip to the index, find the entry, and jump directly to page 247.
Without an index, the database performs a sequential scan — reading every row to find matches. With an index, it performs an index scan — looking up values in the sorted structure, then fetching only the matching rows.
The tradeoff: every INSERT, UPDATE, or DELETE must also update the index. More indexes = slower writes.
The Basics: B-Tree Indexes
B-tree (balanced tree) is the default index type in most databases. It works for:
- Equality:
WHERE email = 'user@example.com' - Ranges:
WHERE created_at > '2024-01-01' - Sorting:
ORDER BY created_at DESC - Prefix matching:
WHERE name LIKE 'John%'
| |
Column Order in Composite Indexes
For a composite index on (a, b, c):
- ✅ Can use for queries on
a - ✅ Can use for queries on
a, b - ✅ Can use for queries on
a, b, c - ❌ Cannot efficiently use for queries on just
borc
Think of it like a phone book sorted by last name, then first name. You can quickly find all “Smiths” or “John Smith,” but finding all “Johns” requires scanning the whole book.
Rule of thumb: Put the most selective column first, or the column most commonly used alone.
Beyond B-Tree: Specialized Index Types
Hash Indexes
Fast for exact equality, nothing else:
| |
Use when you only ever do WHERE token = 'xxx' and never ranges or sorting. In practice, B-tree is usually fine.
GIN (Generalized Inverted Index)
Perfect for array columns, full-text search, and JSONB:
| |
GIN indexes are larger and slower to update than B-tree, but they enable queries that B-tree can’t handle.
GiST (Generalized Search Tree)
For geometric data, ranges, and nearest-neighbor searches:
| |
BRIN (Block Range Index)
Tiny indexes for naturally ordered data:
| |
BRIN stores min/max values per block of rows. If your data is physically ordered (like append-only logs sorted by timestamp), BRIN can be 100x smaller than B-tree while still being useful.
Catch: if data isn’t physically ordered, BRIN is useless.
Partial Indexes: Index Only What Matters
Why index rows you’ll never query?
| |
Partial indexes are smaller, faster to update, and faster to scan. Use them when your queries always include the same filter condition.
Covering Indexes: Avoid the Table Entirely
When a query can be answered entirely from the index, the database doesn’t need to fetch the actual row. This is called an index-only scan.
| |
The INCLUDE columns aren’t part of the search key but are stored in the index for retrieval. This can dramatically speed up queries that select a few extra columns.
Expression Indexes: Index Computed Values
When you query on transformed data, index the transformation:
| |
Without the expression index, the database can’t use an index on email for a query on LOWER(email).
When Indexes Hurt
Write-Heavy Workloads
Every index slows down INSERT, UPDATE, and DELETE. A table with 10 indexes means 10 extra write operations per row change.
For bulk imports, consider:
| |
Low-Selectivity Columns
An index on a boolean is_active column is usually useless. If 90% of rows are active, the database will just do a sequential scan anyway — it’s faster than bouncing between the index and the table.
Rule of thumb: if the index would return more than 10-15% of the table, it probably won’t be used.
Too Many Indexes
Each index:
- Consumes disk space
- Slows down writes
- Requires VACUUM maintenance
- Adds to backup size
Audit regularly. If an index hasn’t been used in months, consider dropping it.
| |
The Query Planner: Trust But Verify
The database query planner decides whether to use your index. Use EXPLAIN ANALYZE to see what it actually does:
| |
Look for:
Index ScanorIndex Only Scan— using the index ✅Seq Scan— sequential scan, index ignored ❌Bitmap Index Scan— hybrid approach, often fine ✅
If the planner ignores your index:
- Statistics might be stale:
ANALYZE users; - Query might not match index (wrong column order, missing expression)
- Table might be too small (seq scan is faster)
- Selectivity might be too low
Practical Index Strategy
- Start with no indexes (besides primary key)
- Monitor slow queries (pg_stat_statements, slow query log)
- Add indexes for specific problems (not speculatively)
- Verify with EXPLAIN ANALYZE
- Audit quarterly — remove unused indexes
The best index strategy isn’t the most indexes. It’s the right indexes for your actual query patterns, and nothing more.
Indexes are powerful, but they’re not free. Every index is a tradeoff between read speed and write speed, between query flexibility and storage cost. Use them deliberately, measure their impact, and don’t be afraid to drop them when they’re not helping.