Question 21 · Section 1

How to optimize slow queries?

🟢 Junior Level

Query optimization is the process of speeding up slow SQL queries.

Simple algorithm:

  1. Find the slow query
  2. Look at its plan via EXPLAIN
  3. Understand what’s slowing it down
  4. Fix it (most often — add an index)

Example:

-- 1. Find the slow query
-- (application logs, pg_stat_statements)

-- 2. Check the plan
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';

-- 3. We see the problem
-- Seq Scan on users  ← reads the entire table!
--   Rows Removed by Filter: 999999

-- 4. Fix it
CREATE INDEX idx_users_email ON users(email);

-- Now:
-- Index Scan using idx_users_email  ← fast!
-- Execution Time: 0.05 ms (was 500 ms)

Most common causes of slow queries:

  • ❌ No index on the field in WHERE
  • ❌ Index exists but isn’t used (function in WHERE)
  • ❌ Result set is too large (no LIMIT)
  • ❌ JOIN without indexes on join fields
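
The "no LIMIT" case from the list above, as a minimal sketch (table and index names are illustrative):

```sql
-- ❌ Sorts and returns the whole table just to show one page
SELECT * FROM orders ORDER BY created_at DESC;

-- ✅ With a LIMIT (plus an index on the sort key) the planner
-- can stop after the first 20 rows instead of sorting everything
CREATE INDEX idx_orders_created_at ON orders(created_at);
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20;
```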

🟡 Middle Level

Optimization algorithm

Step 1: Find problematic queries

-- pg_stat_statements — the main tool
-- (requires shared_preload_libraries = 'pg_stat_statements'
--  in postgresql.conf and a server restart)
CREATE EXTENSION pg_stat_statements;

-- Top 10 slowest queries
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Top 10 queries with the most disk reads
SELECT query, shared_blks_read, mean_exec_time
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;

Step 2: Analyze the plan

EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;

-- Look for:
-- Seq Scan on large tables → need an index
-- Sort → Disk Spill → increase work_mem
-- Nested Loop on large tables → bad plan
-- rows ≠ actual rows → stale statistics
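
When estimated rows diverge from actual rows, refreshing statistics is the usual first fix (table and column names here are illustrative):

```sql
-- Recompute planner statistics for the table
ANALYZE users;

-- For a heavily skewed column, a larger sample can help
-- (default_statistics_target defaults to 100)
ALTER TABLE users ALTER COLUMN email SET STATISTICS 1000;
ANALYZE users;
```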

Step 3: Fix

Common problems and solutions

1. No index

-- ❌ Seq Scan on 1M rows
SELECT * FROM users WHERE email = 'test@example.com';

-- ✅ Create an index
CREATE INDEX idx_users_email ON users(email);

2. Function in WHERE kills the index

-- ❌ Index is not used
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

-- ✅ Functional index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

3. Wrong data type

-- ❌ Mismatched types (phone is text, the literal is an integer)
SELECT * FROM users WHERE phone = 89001234567;
-- PostgreSQL rejects this outright ("operator does not exist: text = bigint");
-- where an implicit cast does exist (e.g. integer column vs numeric literal),
-- it lands on the COLUMN and blocks the index

-- ✅ Correct type
SELECT * FROM users WHERE phone = '89001234567';

4. Not enough work_mem

-- ❌ Sort on disk
EXPLAIN ANALYZE SELECT * FROM orders ORDER BY created_at DESC;
-- Sort Method: external merge  Disk: 5000kB

-- ✅ Increase work_mem
SET work_mem = '256MB';
-- Now: Sort Method: quicksort  Memory: 50000kB

5. JOIN without indexes

-- ❌ Hash Join without indexes
SELECT * FROM users u JOIN orders o ON u.id = o.user_id;

-- ✅ Index on the join field
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Now: Nested Loop + Index Scan → faster

Common mistakes

  1. Too many indexes
    • Every index slows down INSERT/UPDATE
    • Remove unused ones: idx_scan = 0
  2. SELECT *
    • Select only the columns you need
    • Less data → faster transfer
  3. N+1 problem
    -- ❌ 1000 queries in a loop
    for user in users:
        SELECT * FROM orders WHERE user_id = user.id
    
    -- ✅ 1 query with JOIN
    SELECT u.*, o.* FROM users u LEFT JOIN orders o ON u.id = o.user_id
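
The unused-index check from mistake #1 can be written against the standard statistics views (a sketch; counters accumulate since the last stats reset):

```sql
-- Indexes never used for scans, largest first
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```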
    

🔴 Senior Level

Systematic approach to optimization

Optimization levels:

1. SQL Level (query)
   ├── Indexes
   ├── Query refactoring
   └── Statistics

2. Configuration
   ├── work_mem
   ├── shared_buffers
   ├── effective_cache_size
   └── random_page_cost

3. Architecture
   ├── Denormalization
   ├── Partitioning
   ├── Materialized Views
   └── Read Replicas

4. Infrastructure
   ├── SSD vs HDD
   ├── RAM
   ├── Connection Pooling
   └── Huge Pages

Plan Cache problems

Generic vs Custom Plans:

-- Problem: query is fast in psql, slow in the application
-- Cause: JDBC uses Prepared Statements → Generic Plan

-- Diagnostics: reproduce the generic plan with a prepared statement
-- (plain EXPLAIN cannot take $1; PG 16+ also offers EXPLAIN (GENERIC_PLAN))
PREPARE by_email AS SELECT * FROM users WHERE email = $1;
EXPLAIN (ANALYZE) EXECUTE by_email('test@example.com');
-- After ~5 executions the cached Generic Plan may kick in: Seq Scan

-- Solution 1: plan_cache_mode
SET plan_cache_mode = 'force_custom_plan';

-- Solution 2: disable Prepared Statements
-- JDBC URL: jdbc:postgresql://...?prepareThreshold=0
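
On PostgreSQL 14+ you can also watch the generic/custom split directly; the statement name below is illustrative:

```sql
PREPARE q_email AS SELECT * FROM users WHERE email = $1;
EXECUTE q_email('test@example.com');

-- pg_prepared_statements shows how each cached statement was planned
SELECT name, generic_plans, custom_plans
FROM pg_prepared_statements;
```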

Implicit Casts

-- When types don't match, any implicit cast lands on the COLUMN, not the literal
EXPLAIN (ANALYZE) SELECT * FROM orders WHERE user_id = 100.0;
-- Filter: ((user_id)::numeric = 100.0)  ← the column is cast!
-- → The btree index on user_id is not used

-- (A text phone column compared to an integer doesn't even get that far:
--  "operator does not exist: text = bigint")

-- Solution: matching types
-- 1. Change the column type
ALTER TABLE users ALTER COLUMN phone TYPE bigint USING phone::bigint;

-- 2. Or pass the correct type
SELECT * FROM users WHERE phone = '89001234567';  -- text literal for a text column

Function volatility

-- Functions are marked as:
-- IMMUTABLE  → always the same result for the same arguments
-- STABLE     → same result within a query
-- VOLATILE   → can change (default)

-- ❌ VOLATILE function prevents optimization
CREATE OR REPLACE FUNCTION get_status(id int) RETURNS text AS $$
  SELECT status FROM users WHERE id = $1;
$$ LANGUAGE sql;  -- VOLATILE by default

-- ✅ STABLE allows caching
CREATE OR REPLACE FUNCTION get_status(id int) RETURNS text AS $$
  SELECT status FROM users WHERE id = $1;
$$ LANGUAGE sql STABLE;

-- STABLE: within a SINGLE query, the function returns the same result
-- for the same arguments. The planner calls it once and substitutes.
-- Cache lives until the end of the query.

Architectural decisions

Denormalization:

-- ❌ JOIN 10 tables for every query
SELECT u.name, o.amount, p.title, c.name as category
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN products p ON o.product_id = p.id
JOIN categories c ON p.category_id = c.id
WHERE ...

-- ✅ Denormalized columns
ALTER TABLE orders ADD COLUMN product_title text;
ALTER TABLE orders ADD COLUMN category_name text;

-- Update via trigger
-- Query: without JOIN → 10x faster
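
A sketch of that trigger, assuming the orders/products/categories schema from the JOIN example above:

```sql
CREATE OR REPLACE FUNCTION orders_fill_denorm() RETURNS trigger AS $$
BEGIN
    -- Copy the hot fields into the order row at write time
    SELECT p.title, c.name
      INTO NEW.product_title, NEW.category_name
      FROM products p
      JOIN categories c ON c.id = p.category_id
     WHERE p.id = NEW.product_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_orders_denorm
    BEFORE INSERT OR UPDATE OF product_id ON orders
    FOR EACH ROW EXECUTE FUNCTION orders_fill_denorm();
```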

Materialized Views:

-- For complex analytical queries
CREATE MATERIALIZED VIEW mv_sales_summary AS
SELECT
    department,
    COUNT(*) as orders_count,
    SUM(amount) as total_sales,
    AVG(amount) as avg_order
FROM orders
GROUP BY department;

-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_sales_summary;

-- Query: instant
SELECT * FROM mv_sales_summary;
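
One caveat: REFRESH ... CONCURRENTLY requires a unique index on the materialized view (the index name below is illustrative):

```sql
-- Without this, the CONCURRENTLY refresh fails with
-- "cannot refresh materialized view ... concurrently"
CREATE UNIQUE INDEX idx_mv_sales_summary_dept
    ON mv_sales_summary (department);
```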

Partitioning:

-- For tables > 100M rows
CREATE TABLE orders (
    id serial,
    created_at date,
    amount numeric
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_y2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE orders_y2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- Partition Pruning: the planner excludes unnecessary
-- partitions from the plan. Data from them is not even read from disk.
-- WHERE created_at > '2024-01-01' → scans only orders_y2024
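
Pruning can be verified in the plan (a sketch, assuming the partitions created above):

```sql
EXPLAIN SELECT * FROM orders WHERE created_at > '2024-01-01';
-- The plan should mention only orders_y2024;
-- orders_y2023 is pruned and never touched

-- Pruning is on by default; confirm with:
SHOW enable_partition_pruning;
```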

System tuning

Parameters for SSD:

-- Tell the planner that random I/O is cheap on SSD (session- or conf-level)
SET random_page_cost = 1.1;        -- default 4.0, tuned for HDD seeks
SET effective_cache_size = '24GB'; -- ~75% RAM; a planner hint, allocates nothing
-- shared_buffers cannot be SET at runtime; in postgresql.conf (restart needed):
-- shared_buffers = 8GB            -- ~25% RAM

Connection Pooling:

pgBouncer in transaction mode:
  - 1000 app connections → 50 connections to DB
  - 30% CPU saved on context switching
  - Less memory for sessions

Huge Pages:

OS + PostgreSQL:
  - Huge Pages = 2MB pages instead of 4KB
  - Fewer TLB misses
    (TLB — Translation Lookaside Buffer, the CPU cache for address translations.
    8GB of shared memory in 4KB pages → ~2M pages, far more than the TLB can hold.
    In 2MB pages → ~4000 pages, which fits)
  - +10-20% performance under high load

Senior developer algorithm

1. pg_stat_statements → find queries with high shared_blks_read
2. EXPLAIN (ANALYZE, BUFFERS) → find the bottleneck node
3. Check locks: pg_blocking_pids()
4. Check statistics: rows ≠ actual → ANALYZE
5. Check work_mem: Disk Spill → increase
6. Check plan_cache_mode: Generic Plan problems
7. Refactoring: denormalization, partitioning, caching
8. If SQL is optimal → sharding (Citus)
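
For step 4, a quick way to spot tables with stale statistics (a sketch using the standard stats view):

```sql
-- Tables with the most modifications since their last ANALYZE
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC
LIMIT 10;
```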

When NOT to use architectural decisions

  • Denormalization: Do NOT use with frequent UPDATEs of denormalized fields (trigger conflicts, desynchronization)
  • Materialized Views: Do NOT use for data that must be up-to-date in real time (lag between REFRESHes)
  • Partitioning: Do NOT use for tables < 10M rows — partition management overhead outweighs the benefit

Production Experience

Real scenario #1: Comprehensive optimization

  • SaaS: API > 2 seconds (p95)
  • Analysis:
    1. N+1 queries (100 queries per page)
    2. Seq Scan on a 50M row table
    3. work_mem = 4MB → spill to disk
    4. Generic Plan in Hibernate
  • Solution:
    1. DataLoader → 5 queries instead of 100
    2. Index on hot fields
    3. work_mem = 256MB
    4. plan_cache_mode = force_custom_plan
  • Result: p95 < 200ms (10x faster)

Real scenario #2: Architectural changes

  • Analytics: report 5 minutes (JOIN 12 tables)
  • Solution:
    • Materialized View with REFRESH every 5 minutes
    • Denormalized columns for hot fields
    • Partitioning by date
  • Result: report < 1 second

Monitoring Dashboard

-- 1. Top slow queries
SELECT
    query,
    calls,
    mean_exec_time,
    total_exec_time,
    shared_blks_read,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- 2. Locks: who is blocking whom (PostgreSQL 9.6+)
SELECT
    blocked.pid AS blocked_pid,
    blocked.query AS blocked_query,
    blocking.pid AS blocking_pid,
    blocking.query AS blocking_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
    ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));

-- 3. Bloat
SELECT
    relname,
    n_dead_tup,
    n_live_tup,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) as dead_pct
FROM pg_stat_user_tables
WHERE n_live_tup > 10000
ORDER BY dead_pct DESC;

-- 4. Cache Hit Rate
SELECT
    sum(heap_blks_read) as heap_read,
    sum(heap_blks_hit) as heap_hit,
    ROUND(100.0 * sum(heap_blks_hit) / NULLIF(sum(heap_blks_read) + sum(heap_blks_hit), 0), 2) as cache_hit_pct
FROM pg_statio_user_tables;
-- > 99% → great

Best Practices

  1. Monitor via pg_stat_statements constantly
  2. EXPLAIN (ANALYZE, BUFFERS) for every problematic query
  3. Indexes — the first solution, but not the only one
  4. Control work_mem during Sort/Hash
  5. Check plan_cache_mode when using ORM
  6. Denormalization for hot-path queries
  7. Materialized Views for analytics
  8. pgBouncer recommended at > 50 connections. For small projects (< 10 users) you can do without it.
  9. Partitioning for tables > 100M rows
  10. Cache Hit Ratio > 99% — the goal

Summary for Senior

  • Optimization is not just indexes, it’s a systematic process
  • Plan Cache (Generic vs Custom) — a common cause of prod/dev discrepancies
  • Implicit Casts block indexes → use matching types
  • Denormalization for hot-path, MV for analytics
  • Control work_mem → Disk Spill = disaster
  • pgBouncer in transaction mode → 30% CPU saved
  • Cache Hit Ratio > 99% → the main DB health metric
  • Partitioning + Partition Pruning for large tables
  • Write Amplification: every INSERT/UPDATE writes not only to the table but also to ALL indexes. 10 indexes = 10x writes. Physically 10x more data is written than the user changed.

🎯 Interview Cheat Sheet

Must know:

  • Algorithm: pg_stat_statements → EXPLAIN (ANALYZE, BUFFERS) → fix → verify
  • Common problems: no index, function in WHERE, implicit cast, disk spill, N+1
  • Generic vs Custom Plans: Prepared Statements → plan_cache_mode = force_custom_plan
  • Implicit Casts: mismatched literal/column types → the cast lands on the column → index not used
  • Function volatility: IMMUTABLE, STABLE, VOLATILE → affects optimization
  • Denormalization: hot columns → trigger for updates
  • Materialized Views: complex analytical queries → REFRESH CONCURRENTLY
  • Partitioning: tables > 100M rows → Partition Pruning
  • Connection Pooling: pgBouncer in transaction mode → 1000 connections → 50 to DB
  • System tuning: random_page_cost = 1.1 (SSD), shared_buffers = 25% RAM
  • Cache Hit Ratio > 99% — the goal; < 95% → problem

Frequent follow-up questions:

  • “How to find slow queries?” → pg_stat_statements ORDER BY total_exec_time DESC
  • “Why is a query fast in console but slow in the application?” → Generic Plan
  • “What to do on Disk Spill?” → SET work_mem = '256MB' for the session
  • “When is denormalization justified?” → Hot-path queries, JOIN 10+ tables

Red flags (do NOT say):

  • ❌ “Optimization = only indexes” (systematic process: SQL, config, architecture, infrastructure)
  • ❌ “plan_cache_mode doesn’t matter” (Generic Plan — a common cause of discrepancies)
  • ❌ “Materialized Views for OLTP” (only for analytics!)
  • ❌ “pgBouncer isn’t needed” (1000 connections = 30% CPU on context switching)

Related topics:

  • [[Why are indexes needed]] → the first solution for optimization
  • [[What is explain plan]] → EXPLAIN (ANALYZE, BUFFERS)
  • [[Why is ANALYZE needed]] → statistics for the planner
  • [[How does MVCC work in PostgreSQL]] → Bloat and long transactions