Question 11 · Section 1

What is the Difference Between WHERE and HAVING?

The query is executed NOT in the order it's written:

Language versions: English Russian Ukrainian

🟢 Junior Level

WHERE and HAVING — both are used for filtering data, but at different stages:

  • WHERE — filters rows before grouping
  • HAVING — filters groups after grouping

Simple analogy: Imagine sorting coins:

  1. WHERE — select only coins of a certain denomination
  2. Group by year
  3. HAVING — keep only groups where there are more than 5 coins

Example:

-- WHERE: filter orders by status
-- HAVING: keep only users with > 5 orders
SELECT user_id, COUNT(*) as orders_count
FROM orders
WHERE status = 'COMPLETED'  -- Filter BEFORE grouping
GROUP BY user_id
HAVING COUNT(*) > 5;        -- Filter AFTER grouping

Rule:

  • No GROUP BY → use WHERE
  • Has GROUP BY and need to filter by aggregate (COUNT, SUM) → HAVING

🟡 Middle Level

Logical Query Processing Order

The query is executed NOT in the order it’s written:

SELECT user_id, COUNT(*) as cnt     -- 5. SELECT
FROM orders                         -- 1. FROM
WHERE status = 'COMPLETED'          -- 2. WHERE (indexes work!)
GROUP BY user_id                    -- 3. GROUP BY
HAVING COUNT(*) > 5                 -- 4. HAVING
ORDER BY cnt DESC;                  -- 6. ORDER BY

Key point: WHERE executes BEFORE GROUP BY, and HAVINGAFTER.

Why This Matters for Performance

-- ❌ BAD: filtering in HAVING instead of WHERE
SELECT city, COUNT(*)
FROM users
GROUP BY city
HAVING city = 'Moscow';  -- Groups ALL cities, then filters!

-- ✅ GOOD: filtering in WHERE
SELECT city, COUNT(*)
FROM users
WHERE city = 'Moscow'  -- Filters BEFORE grouping → less data
GROUP BY city;

Difference: The first query groups millions of rows, the second — only Moscow users.

Comparison

Characteristic WHERE HAVING
What it filters Rows Groups
When BEFORE GROUP BY AFTER GROUP BY
Aggregates ❌ Not allowed ✅ Allowed
Indexes ✅ Work ❌ Don’t work
Performance Fast Slower (after aggregation)

Common Mistakes

  1. Aggregate in WHERE
    -- ❌ ERROR: aggregates cannot be used in WHERE
    SELECT user_id FROM orders WHERE COUNT(*) > 5;
    
    -- ✅ Correct: via HAVING
    SELECT user_id FROM orders GROUP BY user_id HAVING COUNT(*) > 5;
    
  2. Filtering non-aggregate field in HAVING
    -- ❌ Bad: city is not an aggregate
    SELECT city, COUNT(*) FROM users GROUP BY city HAVING city = 'Moscow';
    
    -- ✅ Good: in WHERE
    SELECT city, COUNT(*) FROM users WHERE city = 'Moscow' GROUP BY city;
    
  3. Alias from SELECT in HAVING
    -- ⚠️ Works in PostgreSQL, but NOT in SQL Server/Oracle
    SELECT city, COUNT(*) as cnt FROM users
    GROUP BY city
    HAVING cnt > 10;  -- cnt is an alias from SELECT
    
    -- ✅ Portable:
    SELECT city, COUNT(*) FROM users
    GROUP BY city
    HAVING COUNT(*) > 10;
    

HAVING without GROUP BY

-- Entire table = one group
SELECT 'Many orders'
FROM orders
HAVING COUNT(*) > 1000000;

-- Returns 1 row if orders > 1M, otherwise 0 rows

🔴 Senior Level

When NOT to Use

  1. HAVING for non-aggregate filtersHAVING city = 'Moscow' groups ALL cities then filters. Use WHERE
  2. WHERE with aggregatesWHERE COUNT(*) > 5 is not allowed. Use HAVING
  3. HAVING for filtering by window functions — window functions are computed AFTER HAVING. Use a subquery/CTE

Filter Push-down Optimization

PostgreSQL 12+ tries to automatically move conditions from HAVING to WHERE:

Before PG 12 Filter Push-down was absent — conditions in HAVING always stayed in HAVING. This meant HAVING city = 'Moscow' grouped all cities, then filtered. On PG 12+ the optimizer will move city = 'Moscow' to WHERE automatically, but you shouldn’t rely on this.

-- Query
SELECT city, COUNT(*) FROM users
GROUP BY city
HAVING city = 'Moscow' AND COUNT(*) > 10;

-- Optimizer transforms into:
SELECT city, COUNT(*) FROM users
WHERE city = 'Moscow'          -- Pushed down!
GROUP BY city
HAVING COUNT(*) > 10;

But don’t rely on this!

When Push-down does NOT work:

  • Complex expressions with subqueries
  • Inside Views
  • When using CTEs with MATERIALIZED

Golden rule: Always write non-aggregate filters in WHERE explicitly.

work_mem — a PostgreSQL parameter limiting memory for one operation (sort, hash aggregation). Default is 4MB. If the operation doesn’t fit, data spills to disk (spill to disk) into temporary files (temp_files), slowing execution 10-100x.

Impact on work_mem and temp_files

-- Problem: 100M rows without filter → GROUP BY
SELECT status, COUNT(*) FROM orders GROUP BY status;
-- work_mem = 4MB → Hash Aggregate doesn't fit → spills to disk → temp_files!

-- Solution: filter in WHERE
SELECT status, COUNT(*) FROM orders
WHERE created_at > '2024-01-01'  -- Reduces input 10x
GROUP BY status;
-- Now fits in work_mem → 100x faster
-- Operations in memory (RAM) happen in nanoseconds, disk operations —
-- in milliseconds. A difference of 6 orders of magnitude. Even considering that spill
-- doesn't happen for every row, overall slowdown is 10-100x.

Monitoring spill:

EXPLAIN (ANALYZE, BUFFERS)
SELECT status, COUNT(*) FROM orders GROUP BY status;
-- Look for: "HashAggregate" + "Disk: X kB" → bad!

WHERE vs HAVING vs Window Functions

Filtering by window function is impossible in HAVING:

-- ❌ ERROR: window functions are computed AFTER HAVING
SELECT user_id, amount,
       ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM orders
GROUP BY user_id, amount, created_at
HAVING rn = 1;  -- Error: rn doesn't exist at HAVING stage

-- ✅ Solution: subquery or CTE
WITH ranked AS (
    SELECT user_id, amount,
           ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
    FROM orders
)
SELECT user_id, amount FROM ranked WHERE rn = 1;

Architectural Pattern: WHERE + HAVING

-- Proper separation of concerns
SELECT
    department,
    COUNT(*) as emp_count,
    AVG(salary) as avg_salary
FROM employees
WHERE status = 'ACTIVE'              -- 1. Filter out fired (index!)
  AND hire_date > '2020-01-01'       -- 2. Reduce input
GROUP BY department
HAVING COUNT(*) > 10                 -- 3. Business logic: only large departments
   AND AVG(salary) > 100000;         -- 4. Filter by aggregate

Edge Cases

  1. NULL in HAVING
    SELECT status, COUNT(*)
    FROM orders
    GROUP BY status
    HAVING COUNT(*) > 10;
    
    -- NULL is a separate group!
    -- If there are rows with status = NULL, they group together
    
  2. HAVING with multiple aggregates
    SELECT product_id
    FROM orders
    GROUP BY product_id
    HAVING COUNT(*) > 100               -- Minimum 100 orders
       AND AVG(amount) > 1000            -- Average check > 1000
       AND MAX(amount) - MIN(amount) < 5000;  -- Price spread < 5000
    
  3. FILTER in HAVING
    SELECT project_id
    FROM tasks
    GROUP BY project_id
    HAVING COUNT(*) > 5
       AND COUNT(*) FILTER (WHERE status = 'blocked') > 2;
    -- Projects with > 5 tasks, of which > 2 are blocked
    

Performance Impact

Scenario WHERE HAVING
Indexes ✅ Index Scan ❌ None
Data volume Reduces BEFORE GROUP BY Filters AFTER GROUP BY
work_mem Less → faster More → slower
temp_files Less risk More risk

Production Experience

Real scenario:

  • Analytics: sales report (100M rows)
  • Problem: query 2 minutes, spill to disk 5 GB
  • EXPLAIN: HashAggregate Disk: 5GB
  • Cause: ORM generated HAVING status = 'ACTIVE' instead of WHERE
  • Solution: moved status = 'ACTIVE' to WHERE
  • Result: query 3 seconds (40x speedup), no spill

Monitoring

-- 1. Check for spill in plans
EXPLAIN (ANALYZE, BUFFERS)
SELECT city, COUNT(*) FROM users GROUP BY city;
-- Look for: "Disk: X kB" → increase work_mem or add WHERE

-- 2. Find queries with HAVING in pg_stat_statements
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
WHERE query LIKE '%HAVING%'
ORDER BY mean_exec_time DESC
LIMIT 10;

-- 3. Check Filter Push-down
EXPLAIN (VERBOSE)
SELECT city, COUNT(*) FROM users GROUP BY city HAVING city = 'Moscow';
-- In the plan you'll see if the filter was pushed to WHERE

Best Practices

  1. WHERE for row filtering (before GROUP BY)
  2. HAVING only for aggregate filtering
  3. Never use HAVING for non-aggregate conditions
  4. Check the plan for spill (Disk usage)
  5. Filter as early as possible — save CPU and RAM
  6. FILTER inside HAVING for complex conditions
  7. Subquery/CTE for window function filtering

Summary for Senior

  • WHERE filters input, HAVING filters output of grouping
  • Filter Push-down (PG 12+) doesn’t always work — don’t rely on it
  • work_mem pressure: HAVING doesn’t save from overflow during GROUP BY
  • Window functions are computed AFTER HAVING → need subquery
  • temp_files — indicator of a problem: EXPLAIN (ANALYZE, BUFFERS)
  • Golden rule: Everything that can go → into WHERE

🎯 Interview Cheat Sheet

Must know:

  • WHERE filters rows BEFORE GROUP BY, HAVING — groups AFTER GROUP BY
  • WHERE uses indexes, HAVING — doesn’t
  • Aggregates (COUNT, SUM) cannot go in WHERE, can go in HAVING
  • Filter Push-down (PG 12+): optimizer moves non-aggregate conditions from HAVING to WHERE
  • FILTER inside HAVING: COUNT(*) FILTER (WHERE status = 'ACTIVE') > 5
  • work_mem pressure: HAVING doesn’t save — all groups are already created
  • Window functions are computed AFTER HAVING → subquery/CTE
  • Aliases from SELECT in HAVING: works in PG, does NOT work in SQL Server/Oracle

Frequent follow-up questions:

  • “Why is HAVING slower than WHERE?” → Groups ALL data, then filters
  • “When doesn’t Filter Push-down work?” → CTE MATERIALIZED, Views, subqueries
  • “Can HAVING be used without GROUP BY?” → Yes, entire table = one group
  • “What’s better: FILTER or CASE WHEN in aggregate?” → FILTER (clean SQL)

Red flags (DO NOT say):

  • ❌ “HAVING optimizes GROUP BY” (no, it filters already grouped data)
  • ❌ “Can rely on Filter Push-down” (doesn’t always work!)
  • ❌ “Window functions can go in HAVING” (computed AFTER!)

Related topics:

  • [[What does GROUP BY do]] → aggregation strategies
  • [[When to use HAVING]] → usage scenarios
  • [[What are Window Functions]] → why they can’t go in HAVING