What are indexes and why are they needed?
An index stores column values in sorted order and pointers (TID — Tuple ID, a pair of block_number:offset_in_block) to physical rows in the table. The most common index type is...
🟢 Junior Level
Indexes are auxiliary data structures in a database that speed up data retrieval.
Simple analogy: Imagine a thick book without a table of contents. To find a specific topic, you’d have to flip through all the pages. An index is like a table of contents at the beginning of the book: you immediately see which page the needed section is on.
Core idea:
- Without an index, the database scans the entire table sequentially (Seq Scan — sequential scanning, where each row is read one after another, as if you were flipping through a book page by page)
- With an index, the database quickly finds the required rows
Example:
-- Without index: slow search across entire table
SELECT * FROM users WHERE email = 'test@example.com';
-- Create index
CREATE INDEX idx_users_email ON users(email);
-- Now search is instantaneous
SELECT * FROM users WHERE email = 'test@example.com'; -- Uses index
What changed internally: The SQL query is the same, but PostgreSQL no longer reads all 1,000,000 rows. Instead, it traverses 3-4 pages of the B-tree index (~0.01ms) and immediately jumps to the required row in the table via TID.
When to use:
- When you frequently search data by a specific field
- When the table is large (thousands of rows and more)
- For fields used in WHERE, JOIN, ORDER BY
🟡 Middle Level
How it works
An index stores column values in sorted order and pointers (TID — Tuple ID, a pair of block_number:offset_in_block) to physical rows in the table. The most common index type is B-Tree (created by default).
Index types in PostgreSQL
| Type | When to use | Supported operations |
|---|---|---|
| B-Tree | Universal (default) | <, <=, =, >=, >, ranges |
| Hash | Exact equality only | = |
| GIN | Arrays, JSONB, full-text search | @>, ?, @@ |
| GiST | Geospatial data, intervals | Depends on operator |
| BRIN | Huge tables (>100GB) with ordered data | Ranges |
Why not always B-Tree? A Hash index takes less disk space than B-Tree but only supports =. GIN can search within arrays and JSONB — which B-Tree can’t do at all. BRIN stores only value ranges per page — takes 100-1000x less space than B-Tree, but only works for ordered data (time series).
Scan types
| Type | Mechanism | When it occurs |
|---|---|---|
| Index Scan | Index → table read | When columns not in index are needed |
| Index Only Scan | Index only | All data available in index |
| Bitmap Index Scan | Index → bitmap → table | When data is abundant but scattered |
| Seq Scan | Sequential read | Small tables or no index |
Bitmap Index Scan: collects ALL matching TIDs from the index into a bitmap (1 = match, 0 = no match), then reads rows from heap. More efficient than Index Scan when reading 100-10,000 rows from different locations — fewer random I/O operations.
Common mistakes
- Creating indexes on all fields
- ❌ Problem: Slows down INSERT/UPDATE/DELETE
- ✅ Solution: Index only fields actually used in WHERE/JOIN
- Using functions in WHERE
-- ❌ Index will NOT be used SELECT * FROM users WHERE LOWER(email) = 'test@example.com'; -- ✅ Create a functional index CREATE INDEX idx_users_email_lower ON users(LOWER(email)); - Searching by second column of a composite index
-- Index (city, age) -- ❌ Will not be used when searching only by age SELECT * FROM users WHERE age = 25;
When NOT to create an index
- Small tables (< 1000 rows) — Seq Scan is faster because the cost of one sequential read of all pages is less than the cost of navigating the B-tree index + random heap page reads
- Columns with low cardinality (e.g., gender: M/F)
- Fields rarely used in queries
- Tables with intensive writes (each index slows down INSERT)
Practical example
-- Composite index for a typical query
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, created_at DESC);
-- This query fully uses the index
SELECT * FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 10;
🔴 Senior Level
Internal Implementation
An index in PostgreSQL is a disk structure optimized for concurrent access. B-tree is based on the Lehman and Yao algorithm, which allows searches and insertions without locking the entire path from root to leaf.
B-tree structure:
Meta-page (page 0)
└── Root page
├── Internal pages (navigation)
└── Leaf pages (data + TID → Heap)
└── P_NEXT / P_PREV (scan links)
High Key: Each page contains the maximum value. If during a search we see that the search value > High Key — it means a split occurred, and we need to follow P_NEXT.
Architectural Trade-offs
Covering indexes (INCLUDE):
-- INCLUDE columns are stored only in leaf nodes, without bloating the tree
CREATE INDEX idx_orders_customer_total
ON orders(customer_id) INCLUDE (total_amount);
-- Index Only Scan without accessing Heap
SELECT customer_id, total_amount
FROM orders
WHERE customer_id = 123;
Partial indexes:
-- Index only active sessions (10-100x space savings)
CREATE INDEX idx_active_sessions
ON sessions(user_id)
WHERE status = 'ACTIVE';
Creation in Production:
-- CONCURRENTLY does not block writes, but takes longer
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
-- ⚠️ Cannot be used inside a transaction!
-- ⚠️ If interrupted, an INVALID index remains
MVCC and HOT Updates
In PostgreSQL, UPDATE = DELETE + INSERT. If a column is indexed, all indexes must be updated.
HOT (Heap Only Tuple) — an optimization that allows updating a row without updating indexes:
- Works if the modified columns are not indexed
- There must be free space on the Heap page
Conclusion: Each extra index kills HOT and increases Write Amplification.
Edge Cases
- Index Bloat
- Cause: Page Splits when inserting into the middle
- Solution:
fillfactor = 70-80for frequently updated indexes - Maintenance:
REINDEX CONCURRENTLY(VACUUM does not return space to OS)
- Selectivity and planner
- If a query returns >10-15% of the table → planner will choose Seq Scan
- Random I/O (Index Scan) is more expensive than Sequential Scan for large selections
- Invalid indexes
-- Check for INVALID indexes (after interrupted CREATE INDEX CONCURRENTLY) SELECT indexrelid::regclass, indisvalid FROM pg_index WHERE NOT indisvalid;
Performance
| Metric | 1-2 indexes | 10-15 indexes |
|---|---|---|
| Write IOPS | Minimal | Exponential |
| Replica Lag | Almost absent | High risk |
| Update Speed | High (HOT possible) | Low |
| Maintenance | Fast | Long REINDEX |
Production Experience
Real scenario:
- Orders table: 50 million rows, 15 indexes
- Problem: INSERT slowed 10x, Replica Lag = 30 seconds
- Solution:
- Removed 8 unused indexes (
idx_scan = 0inpg_stat_user_indexes) - Replaced 3 separate indexes with 1 composite
- Result: INSERT sped up 5x, Replica Lag < 1 second
- Removed 8 unused indexes (
Monitoring
-- Index sizes
SELECT
indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) as size,
idx_scan as scans,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Unused indexes (candidates for deletion)
SELECT
schemaname,
relname as table_name,
indexrelname as index_name,
pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND NOT indisunique -- don't delete UNIQUE
ORDER BY pg_relation_size(indexrelid) DESC;
-- Check Bloat via pgstattuple
SELECT * FROM pgstattuple('idx_orders_customer');
-- avg_leaf_density < 50% → time for REINDEX
Best Practices
- Always use
CONCURRENTLYin production - Monitor
pg_stat_user_indexesfor unused indexes - Check plans via
EXPLAIN (ANALYZE, BUFFERS) - Consolidate indexes:
(a),(b),(a,b,c)→ one(a,b,c) - Use
HypoPGfor testing hypothetical indexes - Foreign Keys — almost always index them
Summary for Senior
- Indexes are a write tax in exchange for read speedup
- Balance between SELECT speed and INSERT/UPDATE cost
- Monitor Bloat, Replica Lag, HOT-update ratio
EXPLAIN (ANALYZE, BUFFERS)is your best friend- Regularly remove unused indexes
🎯 Interview Cheat Sheet
Must know:
- Indexes speed up SELECT, slow down INSERT/UPDATE/DELETE
- B-Tree is the default type, supports
=,<,>, ranges - 5 types: B-Tree, Hash, GIN, GiST, BRIN
- 4 scan types: Index Scan, Index Only Scan, Bitmap, Seq Scan
CONCURRENTLYfor creation without blocking in productionINCLUDEfor Index Only Scan without bloating the tree
Frequent follow-up questions:
- “What happens if you create indexes on all fields?” → Write Amplification, Replica Lag, kills HOT
- “How to check if an index is used?” →
EXPLAIN (ANALYZE, BUFFERS),pg_stat_user_indexes - “Why doesn’t
WHERE LOWER(email) = ...use an index?” → Need a functional index - “How to find unused indexes?” →
idx_scan = 0inpg_stat_user_indexes - “What’s the difference between VACUUM and REINDEX?” → VACUUM cleans dead rows, REINDEX rebuilds the index
Red flags (DON’T say):
- ❌ “Indexes are always beneficial” (no — it’s a write tax)
- ❌ “VACUUM shrinks index files” (no, need REINDEX)
- ❌ “Hash index is better than B-Tree” (Hash only for
=) - ❌ “I’ll create indexes just in case” (analyze the workload)
Related topics:
- [[How does B-tree index work]] → internal structure
- [[What is a composite index]] → leftmost prefix rule
- [[How does MVCC work in PostgreSQL]] → MVCC and HOT updates
- [[What is VACUUM in PostgreSQL]] → index maintenance
- [[How to optimize slow queries]] → practical application