Question 20 · Section 1

What is an EXPLAIN plan?



🟢 Junior Level

EXPLAIN — a PostgreSQL command that shows how the database plans to execute a query, without actually running it.

Simple analogy: Before a trip, GPS shows the route: which roads, how many turns, estimated time. EXPLAIN is a GPS for a query: shows which indexes, which tables, how long it will take.

Example:

-- Show query plan
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

-- Result:
-- Index Scan using idx_users_email on users  (cost=0.42..8.44 rows=1 width=100)
--   Index Cond: (email = 'test@example.com'::text)

What the numbers mean:

  • cost=0.42..8.44 — cost: startup..total (to get all rows)
  • rows=1 — how many rows the database expects
  • width=100 — average row size in bytes

When to use:

  • Query runs slowly
  • You want to check if an index is being used
  • Before creating an index — will it help?
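One way to answer the last question before committing to an index: build it inside a transaction, look at the plan, and roll back. A minimal sketch (the index name is illustrative; note that plain CREATE INDEX blocks writes to the table while it runs, so do this on a copy or during a quiet window):

```sql
BEGIN;
CREATE INDEX idx_users_email_test ON users(email);
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
-- If the plan now shows "Index Scan using idx_users_email_test",
-- the index would help
ROLLBACK;  -- the index is discarded, nothing persists
```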

🟡 Middle Level

EXPLAIN vs EXPLAIN ANALYZE

-- EXPLAIN: plan only, does NOT execute the query
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';

-- EXPLAIN ANALYZE: executes the query + shows actual time
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';

-- Result:
-- Index Scan using idx_users_email on users  (cost=0.42..8.44 rows=1 width=100)
--   (actual time=0.035..0.037 rows=1 loops=1)
--                     ↑ actual time in ms
-- Planning Time: 0.123 ms
-- Execution Time: 0.058 ms

Key flags

-- Full set for analysis
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;

-- Flags:
-- ANALYZE  → execute the query + actual time
-- BUFFERS  → show cache/disk reads
-- COSTS    → show cost (on by default)
-- ROWS     → show estimated row count
-- WIDTH    → show row size
-- FORMAT   → TEXT (default), JSON, YAML, XML
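FORMAT matters when the plan is consumed by tools rather than read by eye; plan visualizers generally expect JSON. A sketch:

```sql
-- Machine-readable plan for plan visualizers
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM users WHERE email = 'test@example.com';
-- Returns one JSON document with the same data as the text form:
-- "Node Type", "Actual Total Time", "Shared Hit Blocks", ...
```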

Reading a plan

EXPLAIN (ANALYZE, BUFFERS)
SELECT u.name, COUNT(o.id) as orders
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.city = 'Moscow'
GROUP BY u.name;

-- Plan:
-- HashAggregate  (cost=... rows=... actual time=15.2..15.3 rows=500 loops=1)
--   Buffers: shared hit=250 read=50
--   → Hash Left Join  (cost=... actual time=10.1..12.5 rows=2000 loops=1)
--        Buffers: shared hit=200 read=40
--        Hash Cond: (o.user_id = u.id)
--        → Seq Scan on orders o  (actual time=0.01..5.2 rows=10000 loops=1)
--             Buffers: shared hit=150
--        → Hash  (cost=... actual time=3.1..3.1 rows=500 loops=1)
--             Buffers: shared hit=50 read=40
--             → Seq Scan on users u  (actual time=0.02..2.8 rows=500 loops=1)
--                  Filter: (city = 'Moscow')
--                  Rows Removed by Filter: 999500
-- Planning Time: 0.5 ms
-- Execution Time: 16.0 ms

Read bottom-up:

  1. Seq Scan on users → filtered 500 out of 1M
  2. Hash → built hash table for JOIN
  3. Seq Scan on orders → read 10,000 orders
  4. Hash Left Join → joined
  5. HashAggregate → grouped

Scan types

  • Seq Scan: reads the entire table. Slow for large tables, fast for small ones
  • Index Scan: index lookup, then fetch the row. Fast for small selections
  • Index Only Scan: all needed data is in the index. Very fast
  • Bitmap Index Scan: index → bitmap → rows. Medium selections (100-10,000 rows)

Bitmap Index Scan: collects ALL matching TIDs from the index into a bitmap, then reads rows. More efficient than Index Scan for 100-10,000 rows from different locations — fewer random I/O operations.
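A sketch of what a bitmap plan looks like (table, index, and numbers are illustrative):

```sql
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'pending';

-- Bitmap Heap Scan on orders  (actual time=... rows=5000 loops=1)
--   Recheck Cond: (status = 'pending')
--   → Bitmap Index Scan on idx_orders_status  (actual time=... rows=5000 loops=1)
--        Index Cond: (status = 'pending')
--
-- The index pass collects TIDs into the bitmap; the heap pass then
-- reads the matching pages in physical order (fewer random reads).
```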

Common problems

  1. Seq Scan on a large table
    EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
    -- Seq Scan on users  ← bad!
    --   Filter: (email = ...)
    --   Rows Removed by Filter: 999999
    
    -- Solution: create an index
    CREATE INDEX idx_users_email ON users(email);
    
  2. Estimate doesn’t match reality
    -- rows=1000 actual rows=1
    -- Planner was off by 1000x!
    
    -- Cause: stale statistics
    -- Solution: ANALYZE users;
    
  3. Spill to disk
    EXPLAIN (ANALYZE) SELECT ... GROUP BY ...;
    -- HashAggregate (actual time=... rows=...)
    --   Disk: 5GB  ← bad! work_mem wasn't enough
    
    -- Solution: SET work_mem = '256MB';
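For problem #2, you can check how stale the statistics actually are before re-running ANALYZE. A sketch using the standard statistics view:

```sql
-- When were statistics last refreshed, and how many rows changed since?
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'users';

-- Old last_analyze + large n_mod_since_analyze → refresh manually:
ANALYZE users;
```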
    

Practical examples

-- Check index usage
EXPLAIN SELECT * FROM orders WHERE user_id = 123;
-- Index Scan using idx_orders_user_id → great!

-- Check JOIN
EXPLAIN SELECT * FROM users u JOIN orders o ON u.id = o.user_id;
-- Hash Join → normal for large tables
-- Nested Loop → great if there's an index

-- Check sorting
EXPLAIN SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;
-- Limit → Index Scan Backward → great!
-- Limit → Sort → can be optimized with an index
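When the plan shows Limit → Sort, an index matching the ORDER BY usually removes the sort step. A hedged sketch (the index name is illustrative):

```sql
-- With this index the planner can walk the index backward
-- instead of sorting the whole table
CREATE INDEX idx_orders_created_at ON orders(created_at);

EXPLAIN SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;
-- Expected shape: Limit → Index Scan Backward using idx_orders_created_at
```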

🔴 Senior Level

Anatomy of node Costs

Sort  (cost=150.10..155.10 rows=2000 width=40)
             ↑        ↑
          Startup   Total
           Cost      Cost

Startup Cost: Cost to get the first row

  • Critical for LIMIT 1, WHERE EXISTS
  • Planner may choose a plan with huge Total Cost but low Startup Cost

Total Cost: Cost to get all rows

  • Critical for full retrieval

Arbitrary units:

  • 1 unit ≈ reading one 8KB page from disk
  • cost=100 ≈ reading 100 pages
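These units are anchored by planner cost parameters, which can be inspected and tuned per server. A sketch (the defaults in the comments are PostgreSQL's shipped values; verify on your instance):

```sql
SHOW seq_page_cost;        -- default 1.0: sequential read of one 8KB page
SHOW random_page_cost;     -- default 4.0: random page read;
                           -- often lowered to ~1.1 on SSDs
SHOW cpu_tuple_cost;       -- default 0.01: processing one row
SHOW cpu_operator_cost;    -- default 0.0025: evaluating one operator/function
```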

BUFFERS: I/O analysis

EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- Buffers: shared hit=250 read=50 written=10
--               ↑       ↑         ↑
--            from cache  from disk  written

-- shared hit: 250 / (250+50) = 83% Cache Hit Rate
-- read=50: 50 × 8KB = 400KB read from disk
-- written=10: 10 × 8KB = 80KB written (temp files?)

Interpretation:

  • shared hit » shared read → great (everything in cache)
  • shared read » shared hit → problem (lots of Disk I/O)
  • written > 0 → spill to disk or temp file writing
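The same hit-vs-read ratio can be checked for the whole database, not just one query. A sketch using pg_stat_database:

```sql
-- Database-wide cache hit rate (healthy OLTP is typically > 99%)
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2)
         AS cache_hit_pct
FROM pg_stat_database
WHERE datname = current_database();
```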

Generic vs Custom Plans

The Prepared Statements problem:

-- Custom Plan (knows the specific value)
EXPLAIN SELECT * FROM users WHERE id = 5;
-- Plan: Index Scan (perfect for id=5)

-- Generic Plan (universal for any $1)
PREPARE stmt(int) AS SELECT * FROM users WHERE id = $1;
EXPLAIN EXECUTE stmt(5);
-- Plan: Seq Scan (universal, "average" for all ids)

Why Generic Plan may be worse:

Skewed data:
  id=5 → 1 row (Index Scan is ideal)
  id=100 → 500,000 rows (Seq Scan is ideal)

Generic Plan: "average" → Seq Scan
→ For id=5, catastrophically slow!

The planner "averages" the cost: for id=5, Index Scan is needed (1 row),
for id=100 — Seq Scan (500,000 rows). Seq Scan is "safer" on average,
but for specific values with small selections — disaster.

Solution:

-- 1. plan_cache_mode
SET plan_cache_mode = 'force_custom_plan';  -- Always Custom
SET plan_cache_mode = 'force_generic_plan'; -- Always Generic
SET plan_cache_mode = 'auto';               -- Default (5 times Custom, then Generic)

-- 2. In JDBC/Hibernate:
-- Check: prepareThreshold=0 (disable Prepared Statements)
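The auto-mode switch can be observed directly: run a prepared statement repeatedly and watch the plan. A sketch (under plan_cache_mode = 'auto'; the exact switch point depends on the planner's cost comparison):

```sql
PREPARE stmt(int) AS SELECT * FROM users WHERE id = $1;

-- Executions 1-5: custom plans, the literal value appears in the plan
EXPLAIN EXECUTE stmt(5);
-- ...run EXECUTE stmt(5) a few more times...

-- From the sixth execution the planner may cache a generic plan:
-- the condition then shows (id = $1) instead of (id = 5)
EXPLAIN EXECUTE stmt(5);

DEALLOCATE stmt;  -- clean up the prepared statement
```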

JIT Compilation

JIT (Just-In-Time) — PostgreSQL compiles parts of the query into machine code via LLVM. As if the database “rewrote” part of the query in C.

Speeds up:

  1. Heavy expressions (a * b + c / d) on millions of rows
  2. Aggregates (SUM, AVG) with many groupings
  3. Tuple evaluation during Hash Join

Slows down: simple queries (SELECT * FROM t WHERE id = 1) — compilation takes longer than execution!

-- In EXPLAIN ANALYZE, a JIT block may appear
-- JIT:
--   Functions: 5
--   Options: Inlining, Optimization
--   Timing: Min 2.1 ms, Max 2.5 ms

-- Pros: speeds up complex expressions and aggregates
-- Cons: for short queries, compilation > execution!

-- Disable for OLTP
SET jit = off;

-- Check
SHOW jit;
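Instead of disabling JIT outright, you can raise its cost thresholds so it only fires on genuinely expensive queries. A sketch (the defaults in the comments are the shipped values):

```sql
SHOW jit_above_cost;           -- default 100000: below this, no JIT
SHOW jit_inline_above_cost;    -- default 500000: function inlining step
SHOW jit_optimize_above_cost;  -- default 500000: LLVM optimization step

-- Keep JIT for OLAP but make sure OLTP queries never compile:
SET jit_above_cost = 1000000;
```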

Parallel Query in plans

EXPLAIN SELECT COUNT(*) FROM large_table;

-- Gather
--   Workers Planned: 4
--   → Parallel Seq Scan on large_table
--        Workers Launched: 4

-- Important metrics:
-- Workers Planned: how many the optimizer planned
-- Workers Launched: how many actually started
-- If Launched < Planned → check:
--   max_parallel_workers_per_gather
--   max_parallel_workers
--   Server load
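For testing, parallelism can be encouraged at the session level by zeroing the parallel cost penalties. A sketch (values are for experimentation, not production defaults):

```sql
-- Make the planner more willing to choose a parallel plan
SET max_parallel_workers_per_gather = 4;
SET parallel_setup_cost = 0;   -- default 1000
SET parallel_tuple_cost = 0;   -- default 0.1

EXPLAIN SELECT COUNT(*) FROM large_table;
-- Check Workers Planned in the Gather node
```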

Incremental Sort (PG 13+)

EXPLAIN SELECT * FROM orders ORDER BY user_id, created_at;

-- Incremental Sort  ← PG 13+!
--   Sort Key: user_id, created_at
--   Presorted Key: user_id  ← data already sorted by user_id
--   → Sorts only by created_at within each user_id
--   → 5-10x faster than a full sort!

On PG 12 and below — full sort of all columns, even if part is already sorted.
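The Presorted Key usually comes from an index. A hedged sketch of the prerequisite (the index name is illustrative):

```sql
-- An index on user_id delivers rows already ordered by user_id,
-- so only created_at needs sorting within each user_id group
CREATE INDEX idx_orders_user_id ON orders(user_id);

EXPLAIN SELECT * FROM orders ORDER BY user_id, created_at;
-- Expected on PG 13+: Incremental Sort with "Presorted Key: user_id"
```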

Edge Cases

  1. EXPLAIN ANALYZE executes the query!
    -- ⚠️ DML query will be executed!
    EXPLAIN ANALYZE DELETE FROM users WHERE status = 'INACTIVE';
    -- → All inactive users are DELETED!
    
    -- ✅ Safe way:
    BEGIN;
    EXPLAIN ANALYZE DELETE FROM users WHERE status = 'INACTIVE';
    ROLLBACK;  -- Undo the changes
    
  2. First EXPLAIN ANALYZE run may be misleading
    Data may not be in cache → the first run is slower.
    Run the query 2-3 times and look at the second/third result.
    Note: DISCARD ALL resets session state (cached plans, temp tables),
    but does not clear shared buffers or the OS page cache.
    
  3. WAL generation (PG 13+)
    EXPLAIN (ANALYZE, WAL) INSERT INTO orders SELECT ...;
    
    -- WAL:
    --   Records: 50000
    --   Bytes: 10MB
    --   FPI: 100 (Full Page Images — full copies of 8KB pages in WAL on
    --   the first page modification after a checkpoint. A large FPI count =
    --   high load on WAL and replication)
    
    -- Critical for estimating replication load
    

On PG 12 and below, the WAL flag in EXPLAIN is not available. Estimate replication load via pg_stat_wal or logs.

  4. Timing overhead
    -- EXPLAIN ANALYZE adds overhead for time measurements
    -- For micro-optimizations, disable TIMING
    
    EXPLAIN (ANALYZE, TIMING off) SELECT ...;
    -- Faster, but without detailed per-node measurements
    

Production Experience

Real scenario #1: Generic Plan kills API

  • Application: JDBC Prepared Statement for email search
  • In console: 5ms (Custom Plan, Index Scan)
  • In application: 500ms (Generic Plan, Seq Scan)
  • Cause: email skewed (some users have 1000 records)
  • Solution: plan_cache_mode = 'force_custom_plan'
  • Result: stable 5ms

Real scenario #2: JIT slows down OLTP

  • REST API: simple queries < 1ms
  • EXPLAIN ANALYZE: JIT Timing: 8ms
  • Cause: JIT enabled by default, compilation > execution
  • Solution: SET jit = off for the session
  • Result: 0.5ms instead of 8.5ms (17x faster)

Monitoring

-- 1. Find slow queries
SELECT query, mean_exec_time, calls, shared_blks_read
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- 2. Check a specific query
EXPLAIN (ANALYZE, BUFFERS, WAL)
SELECT ...;

-- 3. Find queries with Generic Plan problems
-- (in application logs: slow, in console: fast)

-- 4. Check JIT
EXPLAIN (ANALYZE) SELECT ...;
-- Look for the "JIT" block and Timing

-- 5. Parallel Query effectiveness
EXPLAIN (ANALYZE) SELECT ...;
-- Workers Launched vs Planned
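Step 1 relies on pg_stat_statements, which is not enabled by default. A sketch of the typical setup:

```sql
-- postgresql.conf (requires a server restart):
--   shared_preload_libraries = 'pg_stat_statements'

-- Then, once per database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Optional: reset counters before a measurement window
SELECT pg_stat_statements_reset();
```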

Best Practices

  1. Always use EXPLAIN (ANALYZE, BUFFERS)
  2. Compare Estimated vs Actual rows → discrepancy = statistics problem
  3. Check Cache Hit Rate via Buffers
  4. Generic vs Custom — test Prepared Statements
  5. JIT — disable for OLTP, enable for OLAP
  6. DML + EXPLAIN ANALYZE → only in a transaction with ROLLBACK
  7. WAL (PG 13+) for estimating replication load
  8. SETTINGS (PG 12+): EXPLAIN (ANALYZE, SETTINGS) shows session parameters changed from their defaults. On PG 11 and below, check parameters manually via SHOW.

Summary for Senior

  • Startup Cost is critical for LIMIT, Total Cost for full retrieval
  • BUFFERS = the main I/O metric (hit vs read)
  • Generic vs Custom Plans — a common cause of prod/dev discrepancies
  • plan_cache_mode controls Prepared Statements behavior
  • JIT speeds up OLAP, slows down OLTP → disable where not needed
  • WAL (PG 13+) shows replication load
  • Estimated ≠ Actual rows → run ANALYZE or tune SET STATISTICS
  • DML + EXPLAIN ANALYZE → always in BEGIN...ROLLBACK

🎯 Interview Cheat Sheet

Must know:

  • EXPLAIN: shows the plan WITHOUT execution; EXPLAIN ANALYZE: executes + actual time
  • Cost: startup..total in arbitrary units (1 unit ≈ reading an 8KB page)
  • BUFFERS: shared hit (from cache), shared read (from disk), written (write)
  • Generic Plan (Prepared Statements): universal, may be worse for skewed data
  • Custom Plan: knows the specific value, better for skewed data
  • plan_cache_mode: auto (5 times Custom → Generic), force_custom_plan, force_generic_plan
  • JIT: speeds up complex expressions/aggregates, slows down OLTP (< 1ms queries)
  • Parallel Query: Workers Planned vs Launched → if Launched < Planned, check settings
  • Incremental Sort (PG 13+): data partially sorted → sort the rest
  • WAL (PG 13+): Records, Bytes, FPI → estimate replication load
  • DML + EXPLAIN ANALYZE → EXECUTES the query! Only in BEGIN…ROLLBACK

Frequent follow-up questions:

  • “Why is it slower in the application than in the console?” → Generic Plan in Prepared Statements
  • “How to check Cache Hit Rate?” → BUFFERS: hit / (hit + read)
  • “Why does JIT slow things down?” → Compilation 8ms > execution 1ms → disable for OLTP
  • “What does Estimated rows ≠ Actual mean?” → Stale statistics → ANALYZE

Red flags (do NOT say):

  • ❌ “EXPLAIN ANALYZE is safe for DML” (it EXECUTES the query!)
  • ❌ “Generic Plan is always good” (for skewed data — disaster)
  • ❌ “JIT always speeds up” (it slows down OLTP!)
  • ❌ “Cost is milliseconds” (no, arbitrary units!)

Related topics:

  • [[How to optimize slow queries]] → practical application of EXPLAIN
  • [[Why is ANALYZE needed]] → Estimated ≠ Actual → ANALYZE
  • [[What is a correlated subquery]] → finding SubPlan in the plan