What is a composite index?
A composite index sorts data hierarchically:
π’ Junior Level
Composite (compound) index is an index created on multiple columns simultaneously.
Why: a single composite index can replace several individual ones, saving disk space and speeding up writes. Additionally, it allows optimizing queries that filter on multiple fields at once β which is common in most real-world applications.
Simple analogy: Imagine a phone book where contacts are sorted first by last name, then by first name. You first find the right last name, and within it β the right first name.
Main idea:
- One index can speed up searches on multiple fields
- The order of columns in the index is critically important
- The left prefix rule applies
Example:
-- Creating a composite index
CREATE INDEX idx_users_city_age ON users(city, age);
-- β
Will be used (by city)
SELECT * FROM users WHERE city = 'Moscow';
-- β
Will be used (by city and age)
SELECT * FROM users WHERE city = 'Moscow' AND age = 25;
-- β Will NOT be used (only age)
SELECT * FROM users WHERE age = 25;
When to use:
- When you frequently search by a combination of fields
- When queries use different combinations of the same fields
- For optimizing sorting on multiple fields
π‘ Middle Level
How it works
A composite index sorts data hierarchically:
- First by the first column
- Within equal values of the first β by the second
- And so on
Left prefix rule: An index (A, B, C) can be used for searches by:
- β
A - β
A, B - β
A, B, C - β
B(without A) - β
C(without A and B) - β
B, C(without A)
Golden rules of column ordering
1. Equality before range (= before >)
-- Query: WHERE city = 'Moscow' AND age > 25
-- β
Correct order
CREATE INDEX idx_city_age ON users(city, age);
-- β Incorrect order (age as range makes city useless)
CREATE INDEX idx_age_city ON users(age, city);
2. Sorting (ORDER BY)
-- The index covers ORDER BY in the same order
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);
-- β
Uses the index
SELECT * FROM orders WHERE user_id = 123 ORDER BY created_at DESC;
-- β Does not use the index for sorting
SELECT * FROM orders ORDER BY created_at DESC; -- no filter on user_id
Common mistakes
- Wrong column order
-- β If city is used with '=' and age with '>', city must come first CREATE INDEX idx_age_city ON users(age, city); -- Incorrect - Duplicate indexes
-- β Index (A, B) already covers (A) CREATE INDEX idx_a ON users(a); -- Redundant CREATE INDEX idx_a_b ON users(a, b); -- Covers both cases - Too many columns
- PostgreSQL supports up to 32 columns in an index
- In practice: more than 5 columns β a red flag
- Large index β needs more RAM to stay in cache β if RAM is insufficient β Cache Misses β Random I/O β performance drop
Comparison with separate indexes
| Query | Index (A) + (B) | Index (A, B) |
|---|---|---|
WHERE A = 1 |
β Uses (A) | β Uses (A,B) |
WHERE B = 2 |
β Uses (B) | β Does not use |
WHERE A = 1 AND B = 2 |
β οΈ Bitmap or one of them | β Fully uses |
Practical example
-- Typical e-commerce scenario
CREATE INDEX idx_orders_customer_date ON orders(customer_id, created_at DESC);
-- Query 1: All orders of a customer
SELECT * FROM orders WHERE customer_id = 123;
-- Plan: Index Scan (by customer_id)
-- Query 2: Recent orders of a customer
SELECT * FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 10;
-- Plan: Index Scan + Limit (very fast!)
π΄ Senior Level
Internal Implementation
Lexicographic sorting: A composite index key is essentially one large string. The B-tree stores a concatenation of column values.
Key structure in Leaf Node:
βββββββββββββββββββββββββββββββββββββββββββ
β City="Moscow" | Age=25 | TID=(123,456) β
βββββββββββββββββββββββββββββββββββββββββββ€
β City="Moscow" | Age=26 | TID=(123,457) β
βββββββββββββββββββββββββββββββββββββββββββ€
β City="SPb" | Age=20 | TID=(124,100) β
βββββββββββββββββββββββββββββββββββββββββββ
Prefix compression: PostgreSQL can compress keys in Internal nodes, storing only the part needed for navigation.
Architectural Trade-offs
Column ordering rule:
[Equi-columns] β [Range-column] β [Include-columns]
(WHERE =) (WHERE >,<) (SELECT only)
Example:
-- Query: WHERE status = 'ACTIVE' AND city = 'Moscow' AND created_at > '2024-01-01'
-- ORDER BY created_at DESC
-- β
Optimal index
CREATE INDEX idx_orders_optimal
ON orders(status, city, created_at DESC);
-- status (=) β city (=) β created_at (>) β ideal order
Mixed directions (PG 13+):
-- For ORDER BY a ASC, b DESC
CREATE INDEX idx_mixed ON table_name(a ASC, b DESC);
-- A regular index (ASC, ASC) will NOT cover such an ORDER BY
On PG 12 and below, mixed directions (ASC + DESC) in a single index are not optimized correctly β the index will be sorted in one direction. Solution: a separate index for the specific ORDER BY or additional sorting in the query.
Emulating Index Skip Scan
PostgreSQL does not have built-in Skip Scan (unlike Oracle/MySQL 8.0). But you can emulate it:
-- Problem: index (status, user_id), but searching only by user_id
-- status has few unique values (ACTIVE, INACTIVE, DELETED)
-- β
Recursive CTE emulating Skip Scan
WITH RECURSIVE skip_scan AS (
(SELECT MIN(status) AS status FROM orders)
UNION ALL
SELECT (SELECT MIN(status) FROM orders WHERE status > s.status)
FROM skip_scan s WHERE s.status IS NOT NULL
)
SELECT * FROM orders o
JOIN skip_scan s ON o.status = s.status
WHERE o.user_id = 123;
Performance: On a table with 100M records and 3 unique status values:
- Without Skip Scan: Seq Scan ~30 seconds
- With emulation: ~50 milliseconds (600x difference!)
Edge Cases
- NULL in a composite index
-- NULL sorts last by default (NULLS LAST) CREATE INDEX idx_a_b ON t(a, b); -- WHERE a IS NULL β search at the end of the index -- WHERE a = 1 AND b IS NULL β search at the end of the a=1 group - Operator Class (OpClass)
-- OpClass (Operator Class) β a set of rules defining how column values -- are compared and sorted in the index. By default, text uses -- standard character-by-character comparison, but you can use, e.g., -- hash comparison or case-insensitive comparison CREATE INDEX idx_name ON users(name COLLATE "C"); -- For specific operations CREATE INDEX idx_tags ON posts USING gin(tags); - Index-Only Scan and Visibility Map
- Even if all columns are in the index, PG may still go to Heap
- Reason: Pages are not marked in the Visibility Map
- Solution:
VACUUMupdates the Visibility Map
Performance
Impact on writes:
- Each column in the index increases key size
- Large key β fewer entries per page β more tree levels
- For 100M records:
- Index (int): ~2 GB
- Index (int, text(50)): ~8 GB
- Index (int, text(50), text(100)): ~15 GB
Index consolidation:
-- β Before (3 indexes, 15 GB total)
CREATE INDEX idx_a ON t(a); -- 2 GB
CREATE INDEX idx_a_b ON t(a, b); -- 5 GB
CREATE INDEX idx_a_b_c ON t(a, b, c); -- 8 GB
-- β
After (1 index, 8 GB)
CREATE INDEX idx_a_b_c ON t(a, b, c); -- Covers all prefixes!
Production Experience
Real scenario:
- SaaS platform: 200M order records
- Problem: 12 separate indexes, 80 GB total
- Symptom: INSERT > 500ms, RAM insufficient, Cache Hit rate < 60%
- Analysis via
pg_stat_user_indexes:- 8 indexes have
idx_scan < 10per month - 4 indexes duplicate prefixes
- 8 indexes have
- Solution:
- Consolidate to 3 composite indexes
- Added
INCLUDEfor Index-Only Scan on hot queries - Result: 15 GB of indexes, INSERT < 50ms, Cache Hit > 95%
Monitoring
-- Check composite index usage
SELECT
indexrelname,
idx_scan,
idx_tup_read,
pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan DESC;
-- Check column order
SELECT
i.relname as table_name,
ix.indexrelid::regclass as index_name,
array_agg(a.attname ORDER BY k.n) as columns
FROM pg_index ix
JOIN pg_class i ON i.oid = ix.indrelid
JOIN pg_class idx ON idx.oid = ix.indexrelid
JOIN generate_series(0, array_length(ix.indkey, 1)-1) as k(n) ON true
JOIN pg_attribute a ON a.attrelid = i.oid AND a.attnum = ix.indkey[k.n]
WHERE i.relname = 'orders'
GROUP BY i.relname, ix.indexrelid;
-- Check prefix duplicates
-- (requires pg_stat_statements extension or a custom pg_index query)
Best Practices
- Column order:
=β ranges βINCLUDE - Consolidation: One index
(a,b,c)replaces(a)and(a,b) - Limit size: Maximum 3-5 columns per index
- INCLUDE: For columns only in SELECT, use
INCLUDE - Mixed directions: Create indexes for specific ORDER BY clauses
- Monitoring: Regularly check
pg_stat_user_indexesfor unused indexes
Summary for Senior
- A composite index is a lexicographically sorted string-key
- Column order is critical: Equi β Range β Include
- Emulate Skip Scan via Recursive CTE for low-cardinality first columns
- Consolidate indexes to save RAM and speed up writes
- Visibility Map affects Index-Only Scan β keep up with VACUUM
- Always check the plan via
EXPLAIN (ANALYZE, BUFFERS)
π― Interview Cheat Sheet
Must know:
- A composite index sorts data hierarchically: 1st column β 2nd β β¦
- Left prefix rule:
(A, B, C)works forA,A,B,A,B,Cbut NOT forB,C - Column order:
=(equality) β ranges (>,<) βINCLUDE(SELECT only) - Consolidation:
(a),(a,b),(a,b,c)β one(a,b,c)saves space - PostgreSQL does not have Skip Scan β emulate via Recursive CTE
- Maximum 3-5 columns in practice (PG supports up to 32)
- Mixed directions:
(a ASC, b DESC)β PG 13+
Common follow-up questions:
- βWhy doesnβt
WHERE age = 25use the index(city, age)?β β Left prefix: city is first - βHow to replace 3 separate indexes with one?β β A composite
(a,b,c)covers all prefixes - βWhat is Skip Scan and why doesnβt PG have it?β β Skipping the first column; PG doesnβt have it, we emulate with CTE
- βHow does ORDER BY affect column order?β β The index must match ORDER BY
Red flags (DO NOT say):
- β βColumn order doesnβt matterβ (itβs critically important!)
- β βIβll create a separate index on each fieldβ (consolidate!)
- β β32 columns in an index is normal practiceβ (max 3-5 in practice)
Related topics:
- [[What are indexes for]] β when to create indexes
- [[How does B-tree index work]] β internal structure
- [[What is index cardinality]] β column order by cardinality
- [[What are the disadvantages of indexes]] β write cost