What are indexes and why are they needed?

🟢 Junior Level

Indexes are auxiliary data structures in a database that speed up data retrieval.

Simple analogy: Imagine a thick book without a table of contents. To find a specific topic, you’d have to flip through all the pages. An index is like a table of contents at the beginning of the book: you immediately see which page the needed section is on.

Core idea:

Without an index, the database scans the entire table sequentially (Seq Scan — sequential scanning, where each row is read one after another, as if you were flipping through a book page by page)
With an index, the database quickly finds the required rows

Example:

-- Without index: slow search across entire table
SELECT * FROM users WHERE email = 'test@example.com';

-- Create index
CREATE INDEX idx_users_email ON users(email);

-- Now search is instantaneous
SELECT * FROM users WHERE email = 'test@example.com'; -- Uses index

What changed internally: The SQL query is the same, but PostgreSQL no longer reads all 1,000,000 rows. Instead, it traverses 3-4 pages of the B-tree index (~0.01ms) and immediately jumps to the required row in the table via TID.

When to use:

When you frequently search data by a specific field
When the table is large (thousands of rows and more)
For fields used in WHERE, JOIN, ORDER BY

🟡 Middle Level

How it works

An index stores column values in sorted order and pointers (TID — Tuple ID, a pair of block_number:offset_in_block) to physical rows in the table. The most common index type is B-Tree (created by default).

Index types in PostgreSQL

Type	When to use	Supported operations
B-Tree	Universal (default)	`<`, `<=`, `=`, `>=`, `>`, ranges
Hash	Exact equality only	`=`
GIN	Arrays, JSONB, full-text search	`@>`, `?`, `@@`
GiST	Geospatial data, intervals	Depends on operator
BRIN	Huge tables (>100GB) with ordered data	Ranges

Why not always B-Tree? A Hash index takes less disk space than B-Tree but only supports =. GIN can search within arrays and JSONB — which B-Tree can’t do at all. BRIN stores only value ranges per page — takes 100-1000x less space than B-Tree, but only works for ordered data (time series).

Scan types

Type	Mechanism	When it occurs
Index Scan	Index → table read	When columns not in index are needed
Index Only Scan	Index only	All data available in index
Bitmap Index Scan	Index → bitmap → table	When data is abundant but scattered
Seq Scan	Sequential read	Small tables or no index

Bitmap Index Scan: collects ALL matching TIDs from the index into a bitmap (1 = match, 0 = no match), then reads rows from heap. More efficient than Index Scan when reading 100-10,000 rows from different locations — fewer random I/O operations.

Common mistakes

Creating indexes on all fields
- ❌ Problem: Slows down INSERT/UPDATE/DELETE
- ✅ Solution: Index only fields actually used in WHERE/JOIN

Using functions in WHERE

-- ❌ Index will NOT be used
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

-- ✅ Create a functional index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

Searching by second column of a composite index

-- Index (city, age)
-- ❌ Will not be used when searching only by age
SELECT * FROM users WHERE age = 25;

When NOT to create an index

Small tables (< 1000 rows) — Seq Scan is faster because the cost of one sequential read of all pages is less than the cost of navigating the B-tree index + random heap page reads
Columns with low cardinality (e.g., gender: M/F)
Fields rarely used in queries
Tables with intensive writes (each index slows down INSERT)

Practical example

-- Composite index for a typical query
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, created_at DESC);

-- This query fully uses the index
SELECT * FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 10;

🔴 Senior Level

Internal Implementation

An index in PostgreSQL is a disk structure optimized for concurrent access. B-tree is based on the Lehman and Yao algorithm, which allows searches and insertions without locking the entire path from root to leaf.

B-tree structure:

Meta-page (page 0)
  └── Root page
       ├── Internal pages (navigation)
       └── Leaf pages (data + TID → Heap)
            └── P_NEXT / P_PREV (scan links)

High Key: Each page contains the maximum value. If during a search we see that the search value > High Key — it means a split occurred, and we need to follow P_NEXT.

Architectural Trade-offs

Covering indexes (INCLUDE):

-- INCLUDE columns are stored only in leaf nodes, without bloating the tree
CREATE INDEX idx_orders_customer_total
ON orders(customer_id) INCLUDE (total_amount);

-- Index Only Scan without accessing Heap
SELECT customer_id, total_amount
FROM orders
WHERE customer_id = 123;

Partial indexes:

-- Index only active sessions (10-100x space savings)
CREATE INDEX idx_active_sessions
ON sessions(user_id)
WHERE status = 'ACTIVE';

Creation in Production:

-- CONCURRENTLY does not block writes, but takes longer
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
-- ⚠️ Cannot be used inside a transaction!
-- ⚠️ If interrupted, an INVALID index remains

MVCC and HOT Updates

In PostgreSQL, UPDATE = DELETE + INSERT. If a column is indexed, all indexes must be updated.

HOT (Heap Only Tuple) — an optimization that allows updating a row without updating indexes:

Works if the modified columns are not indexed
There must be free space on the Heap page

Conclusion: Each extra index kills HOT and increases Write Amplification.

Edge Cases

Index Bloat
- Cause: Page Splits when inserting into the middle
- Solution: fillfactor = 70-80 for frequently updated indexes
- Maintenance: REINDEX CONCURRENTLY (VACUUM does not return space to OS)
Selectivity and planner
- If a query returns >10-15% of the table → planner will choose Seq Scan
- Random I/O (Index Scan) is more expensive than Sequential Scan for large selections

Invalid indexes

-- Check for INVALID indexes (after interrupted CREATE INDEX CONCURRENTLY)
SELECT indexrelid::regclass, indisvalid
FROM pg_index
WHERE NOT indisvalid;

Performance

Metric	1-2 indexes	10-15 indexes
Write IOPS	Minimal	Exponential
Replica Lag	Almost absent	High risk
Update Speed	High (HOT possible)	Low
Maintenance	Fast	Long REINDEX

Production Experience

Real scenario:

Orders table: 50 million rows, 15 indexes
Problem: INSERT slowed 10x, Replica Lag = 30 seconds
Solution:
- Removed 8 unused indexes (idx_scan = 0 in pg_stat_user_indexes)
- Replaced 3 separate indexes with 1 composite
- Result: INSERT sped up 5x, Replica Lag < 1 second

Monitoring

-- Index sizes
SELECT
    indexrelname,
    pg_size_pretty(pg_relation_size(indexrelid)) as size,
    idx_scan as scans,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Unused indexes (candidates for deletion)
SELECT
    schemaname,
    relname as table_name,
    indexrelname as index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND NOT indisunique  -- don't delete UNIQUE
ORDER BY pg_relation_size(indexrelid) DESC;

-- Check Bloat via pgstattuple
SELECT * FROM pgstattuple('idx_orders_customer');
-- avg_leaf_density < 50% → time for REINDEX

Best Practices

Always use CONCURRENTLY in production
Monitor pg_stat_user_indexes for unused indexes
Check plans via EXPLAIN (ANALYZE, BUFFERS)
Consolidate indexes: (a), (b), (a,b,c) → one (a,b,c)
Use HypoPG for testing hypothetical indexes
Foreign Keys — almost always index them

Summary for Senior

Indexes are a write tax in exchange for read speedup
Balance between SELECT speed and INSERT/UPDATE cost
Monitor Bloat, Replica Lag, HOT-update ratio
EXPLAIN (ANALYZE, BUFFERS) is your best friend
Regularly remove unused indexes

🎯 Interview Cheat Sheet

Must know:

Indexes speed up SELECT, slow down INSERT/UPDATE/DELETE
B-Tree is the default type, supports =, <, >, ranges
5 types: B-Tree, Hash, GIN, GiST, BRIN
4 scan types: Index Scan, Index Only Scan, Bitmap, Seq Scan
CONCURRENTLY for creation without blocking in production
INCLUDE for Index Only Scan without bloating the tree

Frequent follow-up questions:

“What happens if you create indexes on all fields?” → Write Amplification, Replica Lag, kills HOT
“How to check if an index is used?” → EXPLAIN (ANALYZE, BUFFERS), pg_stat_user_indexes
“Why doesn’t WHERE LOWER(email) = ... use an index?” → Need a functional index
“How to find unused indexes?” → idx_scan = 0 in pg_stat_user_indexes
“What’s the difference between VACUUM and REINDEX?” → VACUUM cleans dead rows, REINDEX rebuilds the index

Red flags (DON’T say):

❌ “Indexes are always beneficial” (no — it’s a write tax)
❌ “VACUUM shrinks index files” (no, need REINDEX)
❌ “Hash index is better than B-Tree” (Hash only for =)
❌ “I’ll create indexes just in case” (analyze the workload)

Related topics:

[[How does B-tree index work]] → internal structure
[[What is a composite index]] → leftmost prefix rule
[[How does MVCC work in PostgreSQL]] → MVCC and HOT updates
[[What is VACUUM in PostgreSQL]] → index maintenance
[[How to optimize slow queries]] → practical application