Question 8 · Section 1

Difference between INNER JOIN and LEFT JOIN?

This gives maximum freedom for choosing the optimal plan.

Language versions: English Russian Ukrainian

🟢 Junior Level

The main difference: INNER JOIN returns only rows with matches in both tables, while LEFT JOIN returns all rows from the left table, even if there are no matches.

Simple example:

-- Table users: Ivanov, Petrov, Sidorov
-- Table orders: Ivanov (order), Petrov (order)

-- INNER JOIN: only users with orders
SELECT u.name, o.amount
FROM users u
INNER JOIN orders o ON u.id = o.user_id;
-- Result: Ivanov, Petrov

-- LEFT JOIN: all users, even without orders
SELECT u.name, o.amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
-- Result: Ivanov (100), Petrov (200), Sidorov (NULL)

When to use:

  • INNER JOIN — when you only need related data
  • LEFT JOIN — when you need all records from the main table + possible related ones

🟡 Middle Level

Join Reordering freedom

INNER JOIN — associative and commutative:

-- The planner can join tables in any order:
(A JOIN B) JOIN C = A JOIN (B JOIN C) = (A JOIN C) JOIN B

This gives maximum freedom for choosing the optimal plan.

LEFT JOIN — NOT associative:

-- Order is strictly constrained:
(A LEFT JOIN B) LEFT JOIN C  A LEFT JOIN (B LEFT JOIN C)

The planner is restricted in reordering → may choose a suboptimal plan.

Physical strategies (how PostgreSQL performs JOINs)

PostgreSQL uses three algorithms for physical JOIN execution:

  • Nested Loop — for each row of the outer table, looks for matches in the inner table (fast if there’s an index)
  • Hash Join — builds a hash table on the smaller table and looks for matches (fast for large tables without an index)
  • Merge Join — both tables are sorted and merged like two sorted arrays (fast if data is already sorted)

Physical strategies comparison

Metric INNER JOIN LEFT JOIN
Hash Join Hash built on smaller table Hash, typically, on the right table
Nested Loop Either table can be outer Left table is typically outer
Merge Join Both tables can be sorted Left table determines the order

The WHERE trap (Outer-to-Inner Transformation)

PostgreSQL can convert LEFT JOIN to INNER JOIN “on the fly”:

-- ❌ LEFT JOIN turned into INNER JOIN!
SELECT u.name, o.amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100;
-- NULL > 100 = UNKNOWN → all users without orders are filtered out!

-- ✅ Correct: filter in ON
SELECT u.name, o.amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id AND o.amount > 100;
-- All users are preserved, orders are filtered

-- ✅ Or filter on IS NULL
SELECT u.name
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.id IS NULL;  -- Users WITHOUT orders (Anti-Join)

Common mistakes

  1. Confusion with NULL
    -- LEFT JOIN returns NULL for missing data
    SELECT u.name, SUM(o.amount)
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    GROUP BY u.name;
    
    -- NULL + number = NULL!
    -- Solution: COALESCE
    SELECT u.name, COALESCE(SUM(o.amount), 0) as total
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    GROUP BY u.name;
    
  2. Row duplication
    -- ❌ If a user has 5 orders, they appear 5 times!
    SELECT u.name, u.email
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id;
    
    -- ✅ Group or use EXISTS
    SELECT DISTINCT u.name, u.email  -- or GROUP BY
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id;
    
  3. LEFT JOIN where INNER is needed
    -- ❌ LEFT JOIN is slower (less freedom for the planner)
    SELECT u.name, o.amount
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    WHERE o.amount IS NOT NULL;  -- Essentially an INNER JOIN
    
    -- ✅ Use INNER JOIN
    SELECT u.name, o.amount
    FROM users u
    INNER JOIN orders o ON u.id = o.user_id;
    

Practical comparison

-- INNER JOIN: only orders with customers
SELECT u.name, COUNT(o.id) as orders_count
FROM users u
INNER JOIN orders o ON u.id = o.user_id
GROUP BY u.name;
-- Only customers with orders

-- LEFT JOIN: all customers + statistics
SELECT u.name, COUNT(o.id) as orders_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.name;
-- All customers, some with orders_count = 0

🔴 Senior Level

When NOT to use LEFT JOIN

  1. When data is guaranteed to exist in both tables — use INNER JOIN, it gives the optimizer more freedom
  2. When ORM generates LEFT JOINs “just in case” — check if you actually need data from the right table; if not, Outer Join Removal will remove the JOIN, but it’s better not to generate it in the first place
  3. Cascades of 5+ LEFT JOINs — the planner is restricted in reordering, may choose a catastrophically bad plan

Join Reordering: Deep analysis

INNER JOIN gives the optimizer full freedom:

Query: A JOIN B JOIN C JOIN D (4 tables)

Possible orders: 4! = 24 variants
The planner picks the cheapest one.

LEFT JOIN restricts:

Query: A LEFT JOIN B LEFT JOIN C LEFT JOIN D

Possible orders: only 1 (A → B → C → D)
The planner can only choose physical strategies.

Performance impact:

  • On complex schemas (10+ tables), chains of LEFT JOINs can lead to a catastrophically bad plan
  • Solution: Use INNER JOIN where possible, LEFT JOIN only where semantically necessary

Outer Join Removal (PG 10+)

Powerful optimization — PostgreSQL removes unnecessary LEFT JOINs:

Removal conditions:

  1. You don’t select any columns from the right table
  2. The key in the right table is UNIQUE or PRIMARY KEY
-- LEFT JOIN will be removed from the plan!
SELECT u.name FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
-- orders is not used in SELECT + user_id is UNIQUE → removal

-- Check via EXPLAIN
EXPLAIN SELECT u.name FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
-- Seq Scan on users  ← orders is NOT in the plan!

Why this happens:

  • Often ORMs or generic Views generate LEFT JOINs “just in case”
  • The optimizer removes unnecessary work

Outer-to-Inner Transformation

The optimizer converts LEFT JOIN to INNER JOIN if WHERE makes NULLs impossible:

-- Original query
SELECT u.name, o.amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100;

-- After transformation (internally)
SELECT u.name, o.amount
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100;

-- Why? NULL > 100 = UNKNOWN → rows with NULL would be filtered anyway
-- So LEFT JOIN is semantically equivalent to INNER JOIN

Check:

-- See transformations
EXPLAIN (VERBOSE)
SELECT u.name, o.amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100;
-- In the output you'll see: "Join Type: Inner"

Cardinality Estimation

INNER JOIN:

rows = size(A) × size(B) × selectivity(join_condition)

Example:
- users: 1,000,000 rows
- orders: 10,000,000 rows
- selectivity: 0.001 (0.1%)

rows = 1,000,000 × 10,000,000 × 0.001 = 10,000,000,000 × 0.001 = 10,000,000

⚠️ A statistics error can lead to estimates 10x higher/lower!

LEFT JOIN:

rows >= size(left table)

The planner knows the result will be at least as large as the left table.
This makes plans more predictable but less flexible.

LEFT JOIN LATERAL

A powerful tool for complex sampling:

-- "Find top 3 orders for each user"
SELECT u.name, top_orders.amount, top_orders.created_at
FROM users u
LEFT JOIN LATERAL (
    SELECT amount, created_at
    FROM orders o
    WHERE o.user_id = u.id
    ORDER BY created_at DESC
    LIMIT 3
) top_orders ON true;

-- Physical plan: Nested Loop
-- For each user → 3 orders from the index
-- Great for users < 10,000
-- Slow for users > 1,000,000

Alternative for large data:

-- Window Function instead of LATERAL
SELECT name, amount, created_at
FROM (
    SELECT u.name, o.amount, o.created_at,
           ROW_NUMBER() OVER(PARTITION BY u.id ORDER BY o.created_at DESC) as rn
    FROM users u
    INNER JOIN orders o ON u.id = o.user_id
) sub
WHERE rn <= 3;

-- Plan: Hash Join + WindowAgg
-- Much faster for large tables!

Edge Cases

  1. Mixing INNER and LEFT JOIN
    -- Order matters!
    SELECT *
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    INNER JOIN products p ON o.product_id = p.id;
    
    -- This is NOT the same as:
    SELECT *
    FROM users u
    INNER JOIN orders o ON u.id = o.user_id
    LEFT JOIN products p ON o.product_id = p.id;
    
    -- First: all users, orders (if any), products (if there's an order)
    -- Second: only users with orders, products (if any)
    
  2. Multiple LEFT JOINs
    -- A cascade of LEFT JOINs can give unexpected results
    SELECT u.name, o.id as order_id, p.id as product_id
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    LEFT JOIN products p ON o.product_id = p.id;
    
    -- If user has no orders → order_id = NULL, product_id = NULL
    -- If there's an order but no product → order_id = 123, product_id = NULL
    
  3. Coalescing with NULL
    -- Aggregations with LEFT JOIN
    SELECT u.name,
           COUNT(o.id) as orders_count,        -- 0 if no orders
           SUM(o.amount) as total_amount,       -- NULL if no orders!
           AVG(o.amount) as avg_amount          -- NULL if no orders!
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    GROUP BY u.name;
    
    -- Solution: COALESCE
    SELECT u.name,
           COUNT(o.id) as orders_count,
           COALESCE(SUM(o.amount), 0) as total_amount,
           COALESCE(AVG(o.amount), 0) as avg_amount
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    GROUP BY u.name;
    

Performance Comparison

Scenario INNER JOIN LEFT JOIN
Planning Faster (more options) Slower (restrictions)
Nested Loop Either table can be driving Left table is always driving
Hash Join Flexible building Right table is always built
Merge Join Both tables sorted Left table determines order
Index Usage Flexible Depends on direction

Production Experience

Real scenario #1: LEFT JOIN kills performance

  • Report: 15 tables, 12 of them via LEFT JOIN “just in case”
  • Problem: Query 45 seconds instead of expected 2
  • EXPLAIN: Nested Loop with left table always driving → catastrophe
  • Solution:
    • Replaced 8 LEFT JOINs with INNER JOINs (data is always there)
    • Split into 2 CTEs
  • Result: Query < 3 seconds

Real scenario #2: Outer Join Removal didn’t work

  • API: LEFT JOIN on a View with a UNIQUE constraint
  • Expected: JOIN removal by the optimizer
  • Reality: constraint was DEFERRABLE → optimizer not sure about UNIQUE
  • Solution: Replaced DEFERRABLE with NOT DEFERRABLE → removal worked
  • Result: Query sped up 5x

Monitoring

-- 1. Check plans with LEFT JOIN
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

-- 2. Check for Outer Join Removal
EXPLAIN (VERBOSE)
SELECT u.name FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
-- Look for "Join Type: Inner" or absence of orders in the plan

-- 3. Check Outer-to-Inner transformation
EXPLAIN (VERBOSE)
SELECT u.name, o.amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.amount > 100;
-- "Join Type: Inner" → transformation occurred

-- 4. JOIN statistics in pg_stat_statements
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
WHERE query LIKE '%LEFT JOIN%'
ORDER BY mean_exec_time DESC
LIMIT 10;

Best Practices

  1. Use INNER JOIN where semantically possible
  2. LEFT JOIN only where needed to preserve all records from the left table
  3. Filters on the right table go in ON, not in WHERE
  4. Check the plan via EXPLAIN (VERBOSE) for JOIN removal
  5. Avoid LEFT JOIN cascades > 5 — the planner is restricted
  6. LATERAL JOIN only for small outer sets
  7. COALESCE for aggregations with LEFT JOIN
  8. DISTINCT or GROUP BY with LEFT JOIN to avoid duplication

Summary for Senior

  • INNER JOIN = maximum freedom for the optimizer
  • LEFT JOIN = restrictions on Join Reordering
  • Outer Join Removal (PG 10+) can remove unnecessary LEFT JOINs
  • Outer-to-Inner Transformation happens when filtering on NULL
  • LEFT JOIN LATERAL ≈ Nested Loop → only for small outer sets
  • Filters on the right table → in ON, not in WHERE
  • COALESCE for SUM/AVG with LEFT JOIN (NULL → 0)
  • Always check the plan via EXPLAIN (VERBOSE, ANALYZE)

🎯 Interview Cheat Sheet

Must know:

  • INNER JOIN: associative and commutative → optimizer can reorder tables
  • LEFT JOIN: NOT associative → order is strictly constrained
  • Outer Join Removal (PG 10+): removes LEFT JOIN if the right table is not used in SELECT
  • Outer-to-Inner Transformation: WHERE on the right table turns LEFT JOIN into INNER
  • Hash Join: for LEFT JOIN the hash is ALWAYS on the right table (for INNER — flexible)
  • Nested Loop: for LEFT JOIN the left table is ALWAYS driving
  • Filters on the right table → in ON (preserves NULLs), not in WHERE (filters NULLs)
  • COALESCE(SUM(…), 0) — SUM with LEFT JOIN returns NULL when no data

Common follow-up questions:

  • “Why is LEFT JOIN slower than INNER JOIN?” → Less freedom for Join Reordering
  • “When won’t Outer Join Removal work?” → DEFERRABLE constraint, VIEW
  • “What happens if a filter is in WHERE instead of ON for LEFT JOIN?” → Filters out NULL rows
  • “When is LATERAL JOIN a catastrophe?” → Outer table > 10,000 rows

Red flags (DO NOT say):

  • ❌ “LEFT JOIN is always safer” (it’s slower and restricted for the optimizer)
  • ❌ “A filter on the right table can go in WHERE” (turns LEFT JOIN into INNER!)
  • ❌ “SUM with LEFT JOIN returns 0” (returns NULL! Need COALESCE)

Related topics:

  • [[What JOIN types exist]] → physical execution strategies
  • [[What is better JOIN or subquery]] → when EXISTS is better than JOIN
  • [[What is a correlated subquery]] → EXISTS as an alternative