Database Indexing: What Every Developer Should Know

Your query is slow. You add an index. It gets faster. Magic, right?

Not quite. Indexes are powerful but misunderstood. Used well, they turn seconds into milliseconds. Used poorly, they slow down writes, waste storage, and sometimes make queries slower.

Let’s demystify them.

What Is an Index?

An index is a separate data structure that helps the database find rows without scanning the entire table. Think of it like a book’s index—instead of reading every page to find “PostgreSQL,” you look it up in the index and jump directly to page 247.

Without an index:

1
2
SELECT * FROM users WHERE email = 'alice@example.com';
-- Database scans all 10 million rows: ~5 seconds

With an index on email:

1
2
SELECT * FROM users WHERE email = 'alice@example.com';
-- Database looks up index, jumps to row: ~5 milliseconds

How Indexes Work (B-Trees)

Most indexes use B-tree structures. Picture a balanced tree where:

The root node contains ranges pointing to child nodes
Internal nodes narrow down the search
Leaf nodes contain pointers to actual table rows

Looking up “email = ‘alice@example.com’”:

Start at root: “alice” < “M” → go left
At [D-L]: “alice” < “D” → go left
At [A-C]: find “alice” → follow pointer to row

This is O(log n) instead of O(n). For 10 million rows, that’s ~24 comparisons instead of 10 million.

Creating Indexes

Basic syntax:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Multi-column (composite) index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Unique index (enforces uniqueness)
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Partial index (only indexes subset of rows)
CREATE INDEX idx_orders_pending ON orders(created_at) 
WHERE status = 'pending';

The Column Order Trap

For composite indexes, column order matters:

1
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

This index helps:

1
2
3
4
5
6
7
8
-- ✅ Uses index (leftmost column)
SELECT * FROM orders WHERE user_id = 123;

-- ✅ Uses index (both columns, left to right)
SELECT * FROM orders WHERE user_id = 123 AND created_at > '2026-01-01';

-- ✅ Uses index for user_id, then scans for date range
SELECT * FROM orders WHERE user_id = 123 AND created_at BETWEEN '2026-01-01' AND '2026-02-01';

This index does NOT help:

1
2
3
4
5
-- ❌ Cannot use index (missing leftmost column)
SELECT * FROM orders WHERE created_at > '2026-01-01';

-- ❌ Cannot use index efficiently
SELECT * FROM orders WHERE created_at > '2026-01-01' AND user_id = 123;

Rule of thumb: Put equality conditions first, range conditions last.

Reading Execution Plans

Always verify your indexes work with EXPLAIN:

1
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';

Good output (using index):

Bad output (sequential scan):

Why Indexes Get Ignored

The query planner might skip your index if:

Table is small: Sequential scan is faster for tiny tables
Low selectivity: If 80% of rows match, scanning is faster than index lookups
Type mismatch: WHERE id = '123' when id is integer
Function on column: WHERE LOWER(email) = 'alice@example.com'

Fix function issues with expression indexes:

1
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

Index Types Beyond B-Tree

Hash Indexes

Fast for equality, useless for ranges:

1
2
3
CREATE INDEX idx_users_email_hash ON users USING hash(email);
-- Good for: WHERE email = 'x'
-- Bad for: WHERE email LIKE 'a%'

GIN (Generalized Inverted Index)

For arrays, JSONB, full-text search:

1
2
3
4
5
6
-- Index JSONB fields
CREATE INDEX idx_products_tags ON products USING gin(tags);
SELECT * FROM products WHERE tags @> '["electronics"]';

-- Full-text search
CREATE INDEX idx_posts_search ON posts USING gin(to_tsvector('english', title || ' ' || body));

GiST (Generalized Search Tree)

For geometric data, ranges, nearest-neighbor:

1
2
CREATE INDEX idx_locations_point ON locations USING gist(coordinates);
SELECT * FROM locations ORDER BY coordinates <-> point(40.7128, -74.0060) LIMIT 10;

The Write Penalty

Every index slows down writes. When you INSERT, UPDATE, or DELETE:

Modify the table row
Update every index on that table

1
2
3
-- Table with 5 indexes: every INSERT updates 6 data structures
INSERT INTO orders (user_id, product_id, quantity, price, status)
VALUES (123, 456, 2, 29.99, 'pending');

Measure the tradeoff:

Read-heavy workload (100:1 read:write)? Index liberally
Write-heavy workload (1:10 read:write)? Index sparingly

Finding Missing Indexes

PostgreSQL

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
-- Find slow queries
SELECT query, calls, mean_time, total_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 20;

-- Find sequential scans on large tables
SELECT relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC;

MySQL

1
2
3
4
5
6
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

-- Check for full table scans
SHOW STATUS LIKE 'Select_scan';

Finding Unused Indexes

Indexes you never use still cost write performance:

1
2
3
4
5
-- PostgreSQL: find unused indexes
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey';

Before dropping, verify across a full business cycle (daily, weekly, monthly queries).

Common Anti-Patterns

Over-Indexing

1
2
3
4
5
-- Don't do this
CREATE INDEX idx_1 ON users(email);
CREATE INDEX idx_2 ON users(email, name);
CREATE INDEX idx_3 ON users(email, name, created_at);
-- idx_2 and idx_3 can serve queries that idx_1 would serve

Indexing Low-Cardinality Columns

1
2
3
4
5
-- Mostly useless: boolean has only 2 values
CREATE INDEX idx_users_active ON users(is_active);

-- Better: partial index
CREATE INDEX idx_users_inactive ON users(created_at) WHERE is_active = false;

Forgetting to ANALYZE

1
2
3
-- After bulk inserts, update statistics
ANALYZE users;
-- Otherwise planner uses stale statistics

Index Maintenance

Indexes fragment over time. Rebuild periodically:

1
2
3
4
5
6
7
-- PostgreSQL
REINDEX INDEX idx_users_email;
-- Or for whole table
REINDEX TABLE users;

-- Non-blocking rebuild (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;

Quick Reference

Situation	Index Strategy
Exact match lookup	B-tree on column
Range queries	B-tree, range column last
Sorting	Index matching ORDER BY
LIKE ‘prefix%’	B-tree works
LIKE ‘%suffix’	Need trigram/full-text
JSONB queries	GIN index
Geospatial	GiST index
High-write table	Fewer, targeted indexes

Indexes are a trade-off. Every fast read costs a slower write. Profile, measure, and index with intention.

What Is an Index?#

How Indexes Work (B-Trees)#

Creating Indexes#

The Column Order Trap#

Reading Execution Plans#

Why Indexes Get Ignored#

Index Types Beyond B-Tree#

Hash Indexes#

GIN (Generalized Inverted Index)#

GiST (Generalized Search Tree)#

The Write Penalty#

Finding Missing Indexes#

PostgreSQL#

MySQL#

Finding Unused Indexes#

Common Anti-Patterns#

Over-Indexing#

Indexing Low-Cardinality Columns#

Forgetting to ANALYZE#

Index Maintenance#

Quick Reference#

📬 Get the Newsletter