What is an EXPLAIN plan?
🟢 Junior Level
EXPLAIN — a PostgreSQL command that shows how the database plans to execute a query, without actually running it.
Simple analogy: Before a trip, GPS shows the route: which roads, how many turns, estimated time. EXPLAIN is a GPS for a query: shows which indexes, which tables, how long it will take.
Example:
-- Show query plan
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
-- Result:
-- Index Scan using idx_users_email on users (cost=0.42..8.44 rows=1 width=100)
-- Index Cond: (email = 'test@example.com'::text)
What the numbers mean:
- cost=0.42..8.44 — cost: startup..total (to get all rows)
- rows=1 — how many rows the database expects
- width=100 — average row size in bytes
When to use:
- Query runs slowly
- You want to check if an index is being used
- Before creating an index — will it help?
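One low-risk way to answer "will the index help?" is to create it inside a transaction, re-run EXPLAIN, and roll back — PostgreSQL DDL is transactional. A sketch (the users table and idx_users_email are assumed names; note that a plain CREATE INDEX blocks writes while it builds, so do this on a copy or off-peak):

```sql
BEGIN;
-- Candidate index exists only inside this transaction
CREATE INDEX idx_users_email ON users(email);
-- Re-check the plan: did the planner pick the new index?
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
-- Undo everything — the index is gone
ROLLBACK;
```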
🟡 Middle Level
EXPLAIN vs EXPLAIN ANALYZE
-- EXPLAIN: plan only, does NOT execute the query
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
-- EXPLAIN ANALYZE: executes the query + shows actual time
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
-- Result:
-- Index Scan using idx_users_email on users (cost=0.42..8.44 rows=1 width=100)
-- (actual time=0.035..0.037 rows=1 loops=1)
-- ↑ actual time in ms
-- Planning Time: 0.123 ms
-- Execution Time: 0.058 ms
Key flags
-- Full set for analysis
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;
-- Flags:
-- ANALYZE → execute the query + actual time
-- BUFFERS → show cache/disk reads
-- COSTS → show cost (on by default)
-- ROWS → show estimated row count
-- WIDTH → show row size
-- FORMAT → TEXT (default), JSON, YAML, XML
Reading a plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.name, COUNT(o.id) as orders
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.city = 'Moscow'
GROUP BY u.name;
-- Plan:
-- HashAggregate (cost=... rows=... actual time=15.2..15.3 rows=500 loops=1)
-- Buffers: shared hit=250 read=50
-- → Hash Left Join (cost=... actual time=10.1..12.5 rows=2000 loops=1)
-- Buffers: shared hit=200 read=40
-- Hash Cond: (o.user_id = u.id)
-- → Seq Scan on orders o (actual time=0.01..5.2 rows=10000 loops=1)
-- Buffers: shared hit=150
-- → Hash (cost=... actual time=3.1..3.1 rows=500 loops=1)
-- Buffers: shared hit=50 read=40
-- → Seq Scan on users u (actual time=0.02..2.8 rows=500 loops=1)
-- Filter: (city = 'Moscow')
-- Rows Removed by Filter: 999500
-- Planning Time: 0.5 ms
-- Execution Time: 16.0 ms
Read bottom-up:
1. Seq Scan on users → filtered 500 out of 1M
2. Hash → built the hash table for the JOIN
3. Seq Scan on orders → read 10,000 orders
4. Hash Left Join → joined
5. HashAggregate → grouped
Scan types
| Type | When | Speed |
|---|---|---|
| Seq Scan | Reading entire table | Slow for large, fast for small |
| Index Scan | Search by index → read row | Fast for small selections |
| Index Only Scan | All data in the index | Very fast |
| Bitmap Index Scan | Index → bitmap → rows | Medium selections (100-10,000 rows) |
Bitmap Index Scan: collects ALL matching TIDs from the index into a bitmap, then reads rows. More efficient than Index Scan for 100-10,000 rows from different locations — fewer random I/O operations.
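In a plan, a bitmap scan always appears as a two-node pair. A sketch (orders and idx_orders_status are assumed names, the numbers are illustrative):

```sql
EXPLAIN SELECT * FROM orders WHERE status = 'PENDING';
-- Bitmap Heap Scan on orders (cost=... rows=5000 width=...)
--   Recheck Cond: (status = 'PENDING'::text)
--   → Bitmap Index Scan on idx_orders_status (cost=... rows=5000)
--       Index Cond: (status = 'PENDING'::text)
-- Bottom node: the index builds a bitmap of matching pages;
-- top node: the heap is then read page by page in physical order.
```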
Common problems
- Seq Scan on a large table

EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
-- Seq Scan on users ← bad!
-- Filter: (email = ...)
-- Rows Removed by Filter: 999999
-- Solution: create an index
CREATE INDEX idx_users_email ON users(email);

- Estimate doesn't match reality

-- rows=1000, actual rows=1
-- The planner was off by 1000x!
-- Cause: stale statistics
-- Solution:
ANALYZE users;

- Spill to disk

EXPLAIN (ANALYZE) SELECT ... GROUP BY ...;
-- HashAggregate (actual time=... rows=...)
-- Disk: 5GB ← bad! work_mem wasn't enough
-- Solution:
SET work_mem = '256MB';
Practical examples
-- Check index usage
EXPLAIN SELECT * FROM orders WHERE user_id = 123;
-- Index Scan using idx_orders_user_id → great!
-- Check JOIN
EXPLAIN SELECT * FROM users u JOIN orders o ON u.id = o.user_id;
-- Hash Join → normal for large tables
-- Nested Loop → great if there's an index
-- Check sorting
EXPLAIN SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;
-- Limit → Index Scan Backward → great!
-- Limit → Sort → can be optimized with an index
🔴 Senior Level
Anatomy of node costs
Sort (cost=150.10..155.10 rows=2000 width=40)
↑ ↑
Startup Total
Cost Cost
Startup Cost: cost to get the first row
- Critical for LIMIT 1, WHERE EXISTS
- The planner may choose a plan with a huge Total Cost but a low Startup Cost
Total Cost: cost to get all rows
- Critical for full retrieval
Arbitrary units:
- 1 unit ≈ reading one 8KB page from disk
- cost=100 ≈ reading 100 pages
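A Seq Scan cost can be reproduced by hand from the planner's cost constants. A sketch with default settings (the 100-page / 10,000-row table is made up for illustration):

```sql
SHOW seq_page_cost;  -- 1 (cost of one sequential 8KB page read)
SHOW cpu_tuple_cost; -- 0.01 (cost of processing one row)
-- Seq Scan cost ≈ pages * seq_page_cost + rows * cpu_tuple_cost
-- For a table of 100 pages and 10,000 rows:
--   100 * 1.0 + 10000 * 0.01 = 200
-- which is the Total Cost EXPLAIN would show for that scan
```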
BUFFERS: I/O analysis
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
-- Buffers: shared hit=250 read=50 written=10
-- ↑ ↑ ↑
-- from cache from disk written
-- shared hit: 250 / (250+50) = 83% Cache Hit Rate
-- read=50: 50 × 8KB = 400KB read from disk
-- written=10: 10 × 8KB = 80KB written (temp files?)
Interpretation:
- shared hit » shared read → great (everything is in cache)
- shared read » shared hit → problem (lots of disk I/O)
- written > 0 → spill to disk or temp-file writes
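Beyond a single plan, the table-level cache hit rate can be checked from the statistics views — a sketch using pg_statio_user_tables:

```sql
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 2) AS hit_rate
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;
-- hit_rate close to 1.00 → the table is served from shared buffers
```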
Generic vs Custom Plans
The Prepared Statements problem:
-- Custom Plan (knows the specific value)
EXPLAIN SELECT * FROM users WHERE id = 5;
-- Plan: Index Scan (perfect for id=5)
-- Generic Plan (universal for any $1)
PREPARE stmt(int) AS SELECT * FROM users WHERE id = $1;
EXPLAIN EXECUTE stmt(5);
-- Plan: Seq Scan (universal, "average" for all ids)
Why Generic Plan may be worse:
Skewed data:
id=5 → 1 row (Index Scan is ideal)
id=100 → 500,000 rows (Seq Scan is ideal)
Generic Plan: "average" → Seq Scan
→ For id=5, catastrophically slow!
The planner "averages" the cost: for id=5, Index Scan is needed (1 row),
for id=100 — Seq Scan (500,000 rows). Seq Scan is "safer" on average,
but for specific values with small selections — disaster.
Solution:
-- 1. plan_cache_mode
SET plan_cache_mode = 'force_custom_plan'; -- Always Custom
SET plan_cache_mode = 'force_generic_plan'; -- Always Generic
SET plan_cache_mode = 'auto'; -- Default (5 times Custom, then Generic)
-- 2. In JDBC/Hibernate:
-- Check: prepareThreshold=0 (disable Prepared Statements)
JIT Compilation
JIT (Just-In-Time) — PostgreSQL compiles parts of the query into machine code via LLVM. As if the database “rewrote” part of the query in C.
Speeds up:
- Heavy expressions (a * b + c / d) on millions of rows
- Aggregates (SUM, AVG) with many groupings
- Tuple evaluation during Hash Join
Slows down: simple queries (SELECT * FROM t WHERE id = 1) — compilation takes longer than execution!
-- In EXPLAIN ANALYZE, a JIT block may appear
-- JIT:
-- Functions: 5
-- Options: Inlining, Optimization
-- Timing: Min 2.1 ms, Max 2.5 ms
-- Pros: speeds up complex expressions and aggregates
-- Cons: for short queries, compilation > execution!
-- Disable for OLTP
SET jit = off;
-- Check
SHOW jit;
Parallel Query in plans
EXPLAIN SELECT COUNT(*) FROM large_table;
-- Gather
-- Workers Planned: 4
-- → Parallel Seq Scan on large_table
-- Workers Launched: 4
-- Important metrics:
-- Workers Planned: how many the optimizer planned
-- Workers Launched: how many actually started
-- If Launched < Planned → check:
-- max_parallel_workers_per_gather
-- max_parallel_workers
-- Server load
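The settings above can be inspected and adjusted per session; the defaults shown are PostgreSQL's shipped values:

```sql
SHOW max_parallel_workers_per_gather; -- default 2: workers per Gather node
SHOW max_parallel_workers;            -- default 8: parallel workers in total
SHOW max_worker_processes;            -- default 8: hard cap for all workers
-- Allow more workers for a heavy analytical session:
SET max_parallel_workers_per_gather = 4;
```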
Incremental Sort (PG 13+)
EXPLAIN SELECT * FROM orders ORDER BY user_id, created_at;
-- Incremental Sort ← PG 13+!
-- Sort Key: user_id, created_at
-- Presorted Key: user_id ← data already sorted by user_id
-- → Sorts only by created_at within each user_id
-- → 5-10x faster than a full sort!
On PG 12 and below — full sort of all columns, even if part is already sorted.
Edge Cases
- EXPLAIN ANALYZE executes the query!
-- ⚠️ A DML query will be executed!
EXPLAIN ANALYZE DELETE FROM users WHERE status = 'INACTIVE';
-- → All inactive users are DELETED!
-- ✅ Safe way:
BEGIN;
EXPLAIN ANALYZE DELETE FROM users WHERE status = 'INACTIVE';
ROLLBACK; -- Undo the changes

- First EXPLAIN ANALYZE run may be misleading

Data may not be in cache → the first run is slower. Run the query 2-3 times and look at the second/third result. (DISCARD ALL resets session state such as cached plans, but it does not clear shared buffers.)

- WAL generation (PG 13+)

EXPLAIN (ANALYZE, WAL) INSERT INTO orders SELECT ...;
-- WAL:
--   Records: 50000
--   Bytes: 10MB
--   FPI: 100
-- FPI (Full Page Images) — full copies of 8KB pages written to WAL on the
-- first modification of a page after a checkpoint. A large FPI count =
-- high load on WAL and replication.
-- Critical for estimating replication load

On PG 12 and below, the WAL option in EXPLAIN is not available. Estimate replication load from logs or WAL volume (e.g. pg_current_wal_lsn() deltas).

- Timing overhead

-- EXPLAIN ANALYZE adds overhead for per-node time measurements
-- For micro-optimizations, disable TIMING:
EXPLAIN (ANALYZE, TIMING off) SELECT ...;
-- Faster, but without detailed per-node timings
Production Experience
Real scenario #1: Generic Plan kills API
- Application: JDBC Prepared Statement for email search
- In console: 5ms (Custom Plan, Index Scan)
- In application: 500ms (Generic Plan, Seq Scan)
- Cause: email skewed (some users have 1000 records)
- Solution: plan_cache_mode = 'force_custom_plan'
- Result: stable 5ms
Real scenario #2: JIT slows down OLTP
- REST API: simple queries < 1ms
- EXPLAIN ANALYZE: JIT Timing: 8ms
- Cause: JIT enabled by default, compilation > execution
- Solution: SET jit = off for the session
- Result: 0.5ms instead of 8.5ms (17x faster)
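Instead of disabling JIT globally, its cost thresholds can be raised so compilation only fires for genuinely heavy queries; the defaults shown are PostgreSQL's shipped values:

```sql
SHOW jit_above_cost;          -- default 100000: JIT kicks in above this cost
SHOW jit_inline_above_cost;   -- default 500000: inlining threshold
SHOW jit_optimize_above_cost; -- default 500000: LLVM optimization threshold
-- Raise the entry threshold so cheap OLTP queries are never compiled:
SET jit_above_cost = 1000000;
```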
Monitoring
-- 1. Find slow queries
SELECT query, mean_exec_time, calls, shared_blks_read
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- 2. Check a specific query
EXPLAIN (ANALYZE, BUFFERS, WAL)
SELECT ...;
-- 3. Find queries with Generic Plan problems
-- (in application logs: slow, in console: fast)
-- 4. Check JIT
EXPLAIN (ANALYZE) SELECT ...;
-- Look for the "JIT" block and Timing
-- 5. Parallel Query effectiveness
EXPLAIN (ANALYZE) SELECT ...;
-- Workers Launched vs Planned
Best Practices
- Always use EXPLAIN (ANALYZE, BUFFERS)
- Compare Estimated vs Actual rows → a discrepancy = statistics problem
- Check Cache Hit Rate via Buffers
- Generic vs Custom — test Prepared Statements
- JIT — disable for OLTP, enable for OLAP
- DML + EXPLAIN ANALYZE → only in a transaction with ROLLBACK
- WAL (PG 13+) for estimating replication load
- SETTINGS (PG 12+) — EXPLAIN (ANALYZE, SETTINGS) shows modified session parameters. On PG 11 and below, check parameters manually via SHOW.
Summary for Senior
- Startup Cost is critical for LIMIT, Total Cost for full retrieval
- BUFFERS = the main I/O metric (hit vs read)
- Generic vs Custom Plans — a common cause of prod/dev discrepancies
- plan_cache_mode controls Prepared Statements behavior
- JIT speeds up OLAP, slows down OLTP → disable where not needed
- WAL (PG 13+) shows replication load
- Estimated ≠ Actual rows → ANALYZE or ALTER TABLE ... SET STATISTICS
- DML + EXPLAIN ANALYZE → always in BEGIN...ROLLBACK
🎯 Interview Cheat Sheet
Must know:
- EXPLAIN: shows the plan WITHOUT execution; EXPLAIN ANALYZE: executes + actual time
- Cost: startup..total in arbitrary units (1 unit ≈ reading an 8KB page)
- BUFFERS: shared hit (from cache), shared read (from disk), written (writes)
- Generic Plan (Prepared Statements): universal, may be worse for skewed data
- Custom Plan: knows the specific value, better for skewed data
- plan_cache_mode: auto (5 times Custom → Generic), force_custom_plan, force_generic_plan
- JIT: speeds up complex expressions/aggregates, slows down OLTP (< 1ms queries)
- Parallel Query: Workers Planned vs Launched → if Launched < Planned, check settings
- Incremental Sort (PG 13+): data partially sorted → sort the rest
- WAL (PG 13+): Records, Bytes, FPI → estimate replication load
- DML + EXPLAIN ANALYZE → EXECUTES the query! Only in BEGIN…ROLLBACK
Frequent follow-up questions:
- “Why is it slower in the application than in the console?” → Generic Plan in Prepared Statements
- “How to check Cache Hit Rate?” → BUFFERS: hit / (hit + read)
- “Why does JIT slow things down?” → Compilation 8ms > execution 1ms → disable for OLTP
- “What does Estimated rows ≠ Actual mean?” → Stale statistics → ANALYZE
Red flags (do NOT say):
- ❌ “EXPLAIN ANALYZE is safe for DML” (it EXECUTES the query!)
- ❌ “Generic Plan is always good” (for skewed data — disaster)
- ❌ “JIT always speeds up” (it slows down OLTP!)
- ❌ “Cost is milliseconds” (no, arbitrary units!)
Related topics:
- [[How to optimize slow queries]] → practical application of EXPLAIN
- [[Why is ANALYZE needed]] → Estimated ≠ Actual → ANALYZE
- [[What is a correlated subquery]] → finding SubPlan in the plan