Every junior developer learns that indexes make queries fast. What they don’t learn is that indexes also make writes slow, consume disk space, and can actively hurt performance when misused.

Let’s fix that.

What Indexes Actually Do

An index is a separate data structure that maintains a sorted copy of specific columns, with pointers back to the full rows. Think of it like the index in a book — instead of reading every page to find “PostgreSQL,” you flip to the index, find the entry, and jump directly to page 247.

Without an index, the database performs a sequential scan — reading every row to find matches. With an index, it performs an index scan — looking up values in the sorted structure, then fetching only the matching rows.

The tradeoff: every INSERT, UPDATE, or DELETE must also update the index. More indexes = slower writes.

The Basics: B-Tree Indexes

B-tree (balanced tree) is the default index type in most databases. It works for:

  • Equality: WHERE email = 'user@example.com'
  • Ranges: WHERE created_at > '2024-01-01'
  • Sorting: ORDER BY created_at DESC
  • Prefix matching: WHERE name LIKE 'John%'
1
2
3
4
5
6
7
8
-- Basic index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Unique index (also enforces constraint)
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

Column Order in Composite Indexes

For a composite index on (a, b, c):

  • ✅ Can use for queries on a
  • ✅ Can use for queries on a, b
  • ✅ Can use for queries on a, b, c
  • ❌ Cannot efficiently use for queries on just b or c

Think of it like a phone book sorted by last name, then first name. You can quickly find all “Smiths” or “John Smith,” but finding all “Johns” requires scanning the whole book.

Rule of thumb: Put the most selective column first, or the column most commonly used alone.

Beyond B-Tree: Specialized Index Types

Hash Indexes

Fast for exact equality, nothing else:

1
CREATE INDEX idx_sessions_token ON sessions USING hash(token);

Use when you only ever do WHERE token = 'xxx' and never ranges or sorting. In practice, B-tree is usually fine.

GIN (Generalized Inverted Index)

Perfect for array columns, full-text search, and JSONB:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
-- Array containment
CREATE INDEX idx_posts_tags ON posts USING gin(tags);
-- Query: WHERE tags @> ARRAY['python', 'postgres']

-- JSONB paths
CREATE INDEX idx_events_data ON events USING gin(data jsonb_path_ops);
-- Query: WHERE data @> '{"type": "click"}'

-- Full-text search
CREATE INDEX idx_articles_search ON articles USING gin(to_tsvector('english', title || ' ' || body));

GIN indexes are larger and slower to update than B-tree, but they enable queries that B-tree can’t handle.

GiST (Generalized Search Tree)

For geometric data, ranges, and nearest-neighbor searches:

1
2
3
4
5
6
7
-- Range types
CREATE INDEX idx_bookings_period ON bookings USING gist(date_range);
-- Query: WHERE date_range && '[2024-01-01, 2024-01-31]'

-- PostGIS geography
CREATE INDEX idx_locations_geo ON locations USING gist(coordinates);
-- Query: WHERE ST_DWithin(coordinates, my_point, 1000)

BRIN (Block Range Index)

Tiny indexes for naturally ordered data:

1
CREATE INDEX idx_logs_created ON logs USING brin(created_at);

BRIN stores min/max values per block of rows. If your data is physically ordered (like append-only logs sorted by timestamp), BRIN can be 100x smaller than B-tree while still being useful.

Catch: if data isn’t physically ordered, BRIN is useless.

Partial Indexes: Index Only What Matters

Why index rows you’ll never query?

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
-- Only index active users
CREATE INDEX idx_users_email_active ON users(email) 
WHERE deleted_at IS NULL;

-- Only index recent orders
CREATE INDEX idx_orders_pending ON orders(user_id, created_at) 
WHERE status = 'pending';

-- Only index non-null values
CREATE INDEX idx_users_phone ON users(phone) 
WHERE phone IS NOT NULL;

Partial indexes are smaller, faster to update, and faster to scan. Use them when your queries always include the same filter condition.

Covering Indexes: Avoid the Table Entirely

When a query can be answered entirely from the index, the database doesn’t need to fetch the actual row. This is called an index-only scan.

1
2
3
4
5
6
7
-- Query: SELECT email, created_at FROM users WHERE email = 'x'

-- Regular index: must fetch row for created_at
CREATE INDEX idx_users_email ON users(email);

-- Covering index: includes created_at in the index
CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (created_at);

The INCLUDE columns aren’t part of the search key but are stored in the index for retrieval. This can dramatically speed up queries that select a few extra columns.

Expression Indexes: Index Computed Values

When you query on transformed data, index the transformation:

1
2
3
4
5
6
7
8
-- Query: WHERE LOWER(email) = 'user@example.com'
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Query: WHERE DATE(created_at) = '2024-01-15'
CREATE INDEX idx_orders_date ON orders(DATE(created_at));

-- Query: WHERE data->>'type' = 'click'
CREATE INDEX idx_events_type ON events((data->>'type'));

Without the expression index, the database can’t use an index on email for a query on LOWER(email).

When Indexes Hurt

Write-Heavy Workloads

Every index slows down INSERT, UPDATE, and DELETE. A table with 10 indexes means 10 extra write operations per row change.

For bulk imports, consider:

1
2
3
4
5
6
7
8
-- Drop indexes before bulk insert
DROP INDEX idx_users_email;

-- Insert millions of rows
COPY users FROM '/data/users.csv';

-- Rebuild index (faster than incremental updates)
CREATE INDEX idx_users_email ON users(email);

Low-Selectivity Columns

An index on a boolean is_active column is usually useless. If 90% of rows are active, the database will just do a sequential scan anyway — it’s faster than bouncing between the index and the table.

Rule of thumb: if the index would return more than 10-15% of the table, it probably won’t be used.

Too Many Indexes

Each index:

  • Consumes disk space
  • Slows down writes
  • Requires VACUUM maintenance
  • Adds to backup size

Audit regularly. If an index hasn’t been used in months, consider dropping it.

1
2
3
4
5
6
-- PostgreSQL: check index usage
SELECT 
  schemaname, relname, indexrelname, 
  idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

The Query Planner: Trust But Verify

The database query planner decides whether to use your index. Use EXPLAIN ANALYZE to see what it actually does:

1
2
EXPLAIN ANALYZE 
SELECT * FROM users WHERE email = 'user@example.com';

Look for:

  • Index Scan or Index Only Scan — using the index ✅
  • Seq Scan — sequential scan, index ignored ❌
  • Bitmap Index Scan — hybrid approach, often fine ✅

If the planner ignores your index:

  1. Statistics might be stale: ANALYZE users;
  2. Query might not match index (wrong column order, missing expression)
  3. Table might be too small (seq scan is faster)
  4. Selectivity might be too low

Practical Index Strategy

  1. Start with no indexes (besides primary key)
  2. Monitor slow queries (pg_stat_statements, slow query log)
  3. Add indexes for specific problems (not speculatively)
  4. Verify with EXPLAIN ANALYZE
  5. Audit quarterly — remove unused indexes

The best index strategy isn’t the most indexes. It’s the right indexes for your actual query patterns, and nothing more.


Indexes are powerful, but they’re not free. Every index is a tradeoff between read speed and write speed, between query flexibility and storage cost. Use them deliberately, measure their impact, and don’t be afraid to drop them when they’re not helping.