What are the disadvantages of indexes?

🟢 Junior Level

Indexes have a downside: they speed up reads but slow down writes.

Simple analogy: Imagine a book with a table of contents. When you add a new chapter, you need to not only write it into the book but also update the table of contents. If there are multiple tables of contents — you need to update each one.

Main disadvantages:

⚠️ Slower INSERT/UPDATE/DELETE — each index needs updating
⚠️ Take up disk space — sometimes more than the table itself
⚠️ Require maintenance — you need to monitor their state

Example:

-- Table with 5 indexes
-- When inserting 1 row:
-- 1. Write the row to the table
-- 2. Update index #1
-- 3. Update index #2
-- 4. Update index #3
-- 5. Update index #4
-- 6. Update index #5
-- Total: 6 operations instead of 1!

Rule: Create indexes only when they are truly needed.

🟡 Middle Level

Write Amplification

Each index increases the number of write operations:

1 INSERT into the table
= 1 write to the table
+ N writes to indexes
+ Write to WAL (Write Ahead Log)

Impact on performance:

Number of indexes	INSERT latency	WAL volume
1	1 ms	1x
5	5-10 ms	5x
10	20-50 ms	10x

Blocking HOT updates

HOT (Heap Only Tuple) — a PostgreSQL optimization that allows updating a row without updating indexes.

Heap — the main table storage where the actual data rows live. The index contains only keys + references (TID) to the Heap. With HOT, a new version of the row is written to the same Heap page with the same TID — indexes remain valid, no need to touch them.

-- Table with an index on status
CREATE INDEX idx_orders_status ON orders(status);

-- ❌ This UPDATE must update the index
UPDATE orders SET status = 'SHIPPED' WHERE id = 123;

-- ✅ This UPDATE can use HOT (status is not indexed)
UPDATE orders SET note = 'New note' WHERE id = 123;

Conditions for HOT:

Modified columns are not indexed
The Heap page has free space

Bloat (Index Inflation)

Indexes are prone to fragmentation more than tables:

-- Problem: Page Splits when inserting into the middle
-- Index on email: 'A...', 'B...', 'C...'
-- Insert 'B...' → page overflows → Split

-- Solution: fillfactor
CREATE INDEX idx_orders_status ON orders(status)
WITH (fillfactor = 70);
-- 30% of page is free → fewer splits

VACUUM does NOT return space to the OS:

VACUUM marks space as free for new writes
The physical file size does not shrink
REINDEX is needed for compaction

RAM contention

-- Check index sizes
SELECT
    indexrelname,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY pg_relation_size(indexrelid) DESC;

-- If total index size > available RAM
-- → Cache Misses → Random I/O → performance drop

Query planning time

The optimizer considers all available indexes:

5-10 indexes → planning < 1 ms
20-30 indexes → planning 10-50 ms
100+ indexes → planning > 100 ms!

When NOT to create an index

Table < 10,000 rows (Seq Scan is faster)
Column with low cardinality (e.g., gender: M/F)
Field rarely used in WHERE/JOIN
Table with very intensive writes

🔴 Senior Level

Write Amplification and WAL logs

Detailed I/O impact:

INSERT operation without indexes:
├── Heap page writes: 1
├── WAL writes: 1 page
└── I/O operations: 1-2

INSERT operation with 10 indexes:
├── Heap page writes: 1
├── Index page writes: 10
├── WAL writes: 11 pages
├── Possible Page Splits: +10-20 writes
└── I/O operations: 15-30

Impact on Replica Lag:

Each index generates WAL entries
The replica must replay ALL entries
With 10+ indexes: WAL volume increases 10-15x
Replica can’t keep up → Replica Lag grows
Critical for systems with Read Replicas

Blocking HOT updates: Deep analysis

HOT mechanism:

-- Heap page (fillfactor = 100, no space)
┌─────────────────────────────────┐
│ Row1 | Row2 | Row3 | ... | Full│
└─────────────────────────────────┘

-- UPDATE Row2 without HOT:
-- 1. Create a new version of Row2 on another page
-- 2. Update ALL indexes (new TID)
-- 3. Mark the old version as dead

-- UPDATE Row2 with HOT (space available on the page):
-- 1. Create a new version of Row2 nearby
-- 2. Update only Heap (don't touch indexes!)
-- 3. Mark the old version as dead
-- 4. Indexes are still valid (TID hasn't changed)

Monitoring HOT:

-- Check HOT efficiency
SELECT
    schemaname,
    relname,
    n_tup_hot_update,  -- HOT updates
    n_tup_upd,         -- Total updates
    ROUND(100.0 * n_tup_hot_update / NULLIF(n_tup_upd, 0), 2) as hot_ratio
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY hot_ratio DESC;

-- hot_ratio > 70% → excellent (typical value for a healthy OLTP workload)
-- hot_ratio < 20% → problem: most updates are forced to update indexes,
--   indicating over-indexing or too small a fillfactor

Index Bloat: Diagnostics and treatment

Causes of Bloat:

Page Splits — inserting into the middle of a sorted list
Deletions — pages remain half-empty
Updates of indexed columns — new versions, old ones dead

Diagnostics:

-- 1. Check via pgstattuple
CREATE EXTENSION pgstattuple;

SELECT
    index_name,
    avg_leaf_density,      -- Normal: 70-90%
    leaf_fragmentation     -- Normal: < 20%
FROM pgstatindex('idx_orders_email');

-- 2. Compare size with expected
SELECT
    schemaname,
    relname as table_name,
    indexrelname as index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) as actual_size,
    idx_scan,
    -- If size is suspiciously large → Bloat
    CASE
        WHEN pg_relation_size(indexrelid) > 1000000000  -- > 1 GB
             AND idx_scan < 100
        THEN 'POSSIBLE BLOAT'
        ELSE 'OK'
    END as status
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

Treatment:

-- 1. REINDEX (blocks writes!)
REINDEX INDEX idx_orders_email;

-- 2. REINDEX CONCURRENTLY (no blocking, PG 12+)
REINDEX INDEX CONCURRENTLY idx_orders_email;

-- 3. pg_repack (no locking, extension)
-- Install: CREATE EXTENSION pg_repack;
-- Run: pg_repack -t orders -d mydb

Shared Buffers contention

Math:

Given:
- RAM: 64 GB
- shared_buffers: 16 GB (25%)
- Table: 10 GB
- Indexes: 20 GB

Problem:
- Table + Indexes = 30 GB > 16 GB shared_buffers
- Part of indexes is evicted to disk
- Cache Miss rate > 40%
- Random I/O instead of reading from RAM

Solution:
- Drop unused indexes → 8 GB
- Increase shared_buffers to 32 GB
- Cache Hit rate > 95%

Cache Hit monitoring:

-- Check cache efficiency
SELECT
    sum(heap_blks_read) as heap_read,
    sum(heap_blks_hit) as heap_hit,
    ROUND(100.0 * sum(heap_blks_hit) /
          NULLIF(sum(heap_blks_read) + sum(heap_blks_hit), 0), 2)
          as cache_hit_pct
FROM pg_statio_user_tables;

-- > 99% → excellent
-- 95-99% → normal
-- < 95% → problem (not enough RAM or too many indexes)

Planning Time Overhead

-- Enable planning time logging
SET log_min_duration_statement = 0;
SET log_parser_stats = on;

-- In logs you'll see:
-- duration: 0.123 ms  parse
-- duration: 0.456 ms  plan
-- duration: 12.345 ms execute

-- If plan > execute → too many indexes/tables

geqo_threshold: When > 12 tables in a query, the genetic algorithm (GEQO) kicks in, which may choose a suboptimal plan.

Production Experience

Real scenario #1: Bloat kills performance

SaaS platform: 100M users
Index on email: 8 GB (should be 2 GB)
Symptom: Email search > 500ms, Cache Hit rate 60%
Cause: 2 years of updates without REINDEX → 75% Bloat
Solution: REINDEX INDEX CONCURRENTLY idx_users_email
Result: Index became 2.1 GB, search < 5ms, Cache Hit 98%

Real scenario #2: Replica Lag due to indexes

E-commerce: 15 indexes on orders (200M rows)
Problem: Replica Lag > 120 seconds
Analysis: WAL volume 1 GB/minute, replica can’t keep up
Solution:
- Dropped 10 unused indexes
- WAL volume dropped to 200 MB/minute
- Replica Lag < 5 seconds

Monitoring Dashboard

-- 1. Top indexes by size
SELECT
    indexrelname,
    pg_size_pretty(pg_relation_size(indexrelid)) as size,
    idx_scan
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;

-- 2. Unused indexes (candidates for removal)
SELECT
    relname as table_name,
    indexrelname as index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) as size,
    idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND NOT indisunique  -- not UNIQUE!
ORDER BY pg_relation_size(indexrelid) DESC;

-- 3. HOT ratio per table
SELECT
    relname,
    n_tup_hot_update,
    n_tup_upd,
    ROUND(100.0 * n_tup_hot_update / NULLIF(n_tup_upd, 0), 2) as hot_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 1000
ORDER BY hot_pct ASC
LIMIT 10;

-- 4. Fill factor check
SELECT
    c.relname as index_name,
    c.reloptions
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE c.reloptions::text LIKE '%fillfactor%'
  AND c.relkind = 'i';

Best Practices

Monitor HOT ratio — target > 70%
Check Bloat regularly via pgstattuple
REINDEX CONCURRENTLY quarterly for hot indexes
Drop unused — idx_scan = 0 for a month → candidate for removal
fillfactor = 70-80 for frequently updated indexes
Monitor Replica Lag after adding indexes
Limit the count — maximum 5-7 indexes per table
Cache Hit Rate > 95% — if lower, reduce indexes or add RAM

Summary for Senior

Indexes are a write tax (Write Amplification, WAL, Replica Lag)
HOT-update ratio < 20% → too many indexes
Bloat is inevitable — treat with REINDEX CONCURRENTLY
Shared Buffers contention — indexes evict data from cache
Planning Time grows with the number of indexes
Monitor: pgstattuple, pg_stat_user_indexes, HOT ratio, Cache Hit rate
Ruthlessly remove — if an index hasn’t been used for a month, it’s not needed

🎯 Interview Cheat Sheet

Must know:

Write Amplification: 1 INSERT = 1 table write + N index writes + WAL
HOT (Heap Only Tuple): UPDATE optimization without updating indexes (if the column is not indexed)
HOT ratio > 70% → excellent, < 20% → too many indexes
Index Bloat: Page Splits → fragmentation → REINDEX CONCURRENTLY
VACUUM does NOT shrink the file → REINDEX for compaction
Shared Buffers contention: indexes evict data → Cache Miss → Random I/O
Planning Time: 5-10 indexes < 1ms, 20-30 → 10-50ms, 100+ → > 100ms

Common follow-up questions:

“Why are 15 indexes bad?” → Replica Lag, Write Amplification, Cache Miss
“How to check index Bloat?” → pgstattuple, avg_leaf_density < 50% → REINDEX
“What is HOT and why is it important?” → UPDATE without updating indexes → faster
“Why does Cache Hit Rate drop as indexes grow?” → Indexes evict data from RAM

Red flags (DO NOT say):

❌ “More indexes are better” (Write Amplification!)
❌ “VACUUM shrinks the index file” (no, REINDEX does)
❌ “HOT is some kind of magic” (UPDATE without updating indexes!)
❌ “Planning Time doesn’t matter” (100+ indexes → > 100ms planning)

Related topics:

[[What are indexes for]] → when to create
[[How does B-tree index work]] → Bloat, Page Splits
[[What is a composite index]] → index consolidation
[[How to optimize slow queries]] → index balance