Your query is slow. You add an index. It gets faster. Magic, right?

Not quite. Indexes are powerful but misunderstood. Used well, they turn seconds into milliseconds. Used poorly, they slow down writes, waste storage, and sometimes make queries slower.

Let’s demystify them.

What Is an Index?

An index is a separate data structure that helps the database find rows without scanning the entire table. Think of it like a book’s index—instead of reading every page to find “PostgreSQL,” you look it up in the index and jump directly to page 247.

Without an index:

1
2
SELECT * FROM users WHERE email = 'alice@example.com';
-- Database scans all 10 million rows: ~5 seconds

With an index on email:

1
2
SELECT * FROM users WHERE email = 'alice@example.com';
-- Database looks up index, jumps to row: ~5 milliseconds

How Indexes Work (B-Trees)

Most indexes use B-tree structures. Picture a balanced tree where:

  1. The root node contains ranges pointing to child nodes
  2. Internal nodes narrow down the search
  3. Leaf nodes contain pointers to actual table rows
[rAo-wCs[]D[-DrL-o]Fw]s[[MGr]-oLw[]sN[-NrZ-o]Rw]s[Sr-oZw]s

Looking up “email = ‘alice@example.com’”:

  1. Start at root: “alice” < “M” → go left
  2. At [D-L]: “alice” < “D” → go left
  3. At [A-C]: find “alice” → follow pointer to row

This is O(log n) instead of O(n). For 10 million rows, that’s ~24 comparisons instead of 10 million.

Creating Indexes

Basic syntax:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Multi-column (composite) index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Unique index (enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Partial index (only indexes subset of rows)
CREATE INDEX idx_orders_pending ON orders(created_at) 
WHERE status = 'pending';

The Column Order Trap

For composite indexes, column order matters:

1
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

This index helps:

1
2
3
4
5
6
7
8
-- ✅ Uses index (leftmost column)
SELECT * FROM orders WHERE user_id = 123;

-- ✅ Uses index (both columns, left to right)
SELECT * FROM orders WHERE user_id = 123 AND created_at > '2026-01-01';

-- ✅ Uses index for user_id, then scans for date range
SELECT * FROM orders WHERE user_id = 123 AND created_at BETWEEN '2026-01-01' AND '2026-02-01';

This index does NOT help:

1
2
3
4
5
-- ❌ Cannot use index (missing leftmost column)
SELECT * FROM orders WHERE created_at > '2026-01-01';

-- ❌ Cannot use index efficiently
SELECT * FROM orders WHERE created_at > '2026-01-01' AND user_id = 123;

Rule of thumb: Put equality conditions first, range conditions last.

Reading Execution Plans

Always verify your indexes work with EXPLAIN:

1
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';

Good output (using index):

IndIRAenocxdwtesuSxacRlaCenomtnoiudvms:eeid:n(geb0my.ia0diF2xli5_l.u=t.se0e'r.ra:0sl2_i07ecmema@sielxaomnplues.ecrosm')

Bad output (sequential scan):

SeqFRAiocSlwtctsuaeanrRl:eomtn(oievmumeesad:eirlb4sy5=2F1'i.al1lt2ie3cr.e:.@4e95x92a91m9.p94l95e96.cmosm')

Why Indexes Get Ignored

The query planner might skip your index if:

  1. Table is small: Sequential scan is faster for tiny tables
  2. Low selectivity: If 80% of rows match, scanning is faster than index lookups
  3. Type mismatch: WHERE id = '123' when id is integer
  4. Function on column: WHERE LOWER(email) = 'alice@example.com'

Fix function issues with expression indexes:

1
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

Index Types Beyond B-Tree

Hash Indexes

Fast for equality, useless for ranges:

1
2
3
CREATE INDEX idx_users_email_hash ON users USING hash(email);
-- Good for: WHERE email = 'x'
-- Bad for: WHERE email LIKE 'a%'

GIN (Generalized Inverted Index)

For arrays, JSONB, full-text search:

1
2
3
4
5
6
-- Index JSONB fields
CREATE INDEX idx_products_tags ON products USING gin(tags);
SELECT * FROM products WHERE tags @> '["electronics"]';

-- Full-text search
CREATE INDEX idx_posts_search ON posts USING gin(to_tsvector('english', title || ' ' || body));

GiST (Generalized Search Tree)

For geometric data, ranges, nearest-neighbor:

1
2
CREATE INDEX idx_locations_point ON locations USING gist(coordinates);
SELECT * FROM locations ORDER BY coordinates <-> point(40.7128, -74.0060) LIMIT 10;

The Write Penalty

Every index slows down writes. When you INSERT, UPDATE, or DELETE:

  1. Modify the table row
  2. Update every index on that table
1
2
3
-- Table with 5 indexes: every INSERT updates 6 data structures
INSERT INTO orders (user_id, product_id, quantity, price, status)
VALUES (123, 456, 2, 29.99, 'pending');

Measure the tradeoff:

  • Read-heavy workload (100:1 read:write)? Index liberally
  • Write-heavy workload (1:10 read:write)? Index sparingly

Finding Missing Indexes

PostgreSQL

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
-- Find slow queries
SELECT query, calls, mean_time, total_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 20;

-- Find sequential scans on large tables
SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC;

MySQL

1
2
3
4
5
6
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

-- Check for full table scans
SHOW STATUS LIKE 'Select_scan';

Finding Unused Indexes

Indexes you never use still cost write performance:

1
2
3
4
5
-- PostgreSQL: find unused indexes
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey';

Before dropping, verify across a full business cycle (daily, weekly, monthly queries).

Common Anti-Patterns

Over-Indexing

1
2
3
4
5
-- Don't do this
CREATE INDEX idx_1 ON users(email);
CREATE INDEX idx_2 ON users(email, name);
CREATE INDEX idx_3 ON users(email, name, created_at);
-- idx_2 and idx_3 can serve queries that idx_1 would serve

Indexing Low-Cardinality Columns

1
2
3
4
5
-- Mostly useless: boolean has only 2 values
CREATE INDEX idx_users_active ON users(is_active);

-- Better: partial index
CREATE INDEX idx_users_inactive ON users(created_at) WHERE is_active = false;

Forgetting to ANALYZE

1
2
3
-- After bulk inserts, update statistics
ANALYZE users;
-- Otherwise planner uses stale statistics

Index Maintenance

Indexes fragment over time. Rebuild periodically:

1
2
3
4
5
6
7
-- PostgreSQL
REINDEX INDEX idx_users_email;
-- Or for whole table
REINDEX TABLE users;

-- Non-blocking rebuild (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;

Quick Reference

SituationIndex Strategy
Exact match lookupB-tree on column
Range queriesB-tree, range column last
SortingIndex matching ORDER BY
LIKE ‘prefix%’B-tree works
LIKE ‘%suffix’Need trigram/full-text
JSONB queriesGIN index
GeospatialGiST index
High-write tableFewer, targeted indexes

Indexes are a trade-off. Every fast read costs a slower write. Profile, measure, and index with intention.