What is the Difference Between WHERE and HAVING?

🟢 Junior Level

WHERE and HAVING — both are used for filtering data, but at different stages:

WHERE — filters rows before grouping
HAVING — filters groups after grouping

Simple analogy: Imagine sorting coins:

WHERE — select only coins of a certain denomination
Group by year
HAVING — keep only groups where there are more than 5 coins

Example:

-- WHERE: filter orders by status
-- HAVING: keep only users with > 5 orders
SELECT user_id, COUNT(*) as orders_count
FROM orders
WHERE status = 'COMPLETED'  -- Filter BEFORE grouping
GROUP BY user_id
HAVING COUNT(*) > 5;        -- Filter AFTER grouping

Rule:

No GROUP BY → use WHERE
Has GROUP BY and need to filter by aggregate (COUNT, SUM) → HAVING

🟡 Middle Level

Logical Query Processing Order

The query is executed NOT in the order it’s written:

SELECT user_id, COUNT(*) as cnt     -- 5. SELECT
FROM orders                         -- 1. FROM
WHERE status = 'COMPLETED'          -- 2. WHERE (indexes work!)
GROUP BY user_id                    -- 3. GROUP BY
HAVING COUNT(*) > 5                 -- 4. HAVING
ORDER BY cnt DESC;                  -- 6. ORDER BY

Key point: WHERE executes BEFORE GROUP BY, and HAVING — AFTER.

Why This Matters for Performance

-- ❌ BAD: filtering in HAVING instead of WHERE
SELECT city, COUNT(*)
FROM users
GROUP BY city
HAVING city = 'Moscow';  -- Groups ALL cities, then filters!

-- ✅ GOOD: filtering in WHERE
SELECT city, COUNT(*)
FROM users
WHERE city = 'Moscow'  -- Filters BEFORE grouping → less data
GROUP BY city;

Difference: The first query groups millions of rows, the second — only Moscow users.

Comparison

Characteristic	WHERE	HAVING
What it filters	Rows	Groups
When	BEFORE GROUP BY	AFTER GROUP BY
Aggregates	❌ Not allowed	✅ Allowed
Indexes	✅ Work	❌ Don’t work
Performance	Fast	Slower (after aggregation)

Common Mistakes

Aggregate in WHERE

-- ❌ ERROR: aggregates cannot be used in WHERE
SELECT user_id FROM orders WHERE COUNT(*) > 5;

-- ✅ Correct: via HAVING
SELECT user_id FROM orders GROUP BY user_id HAVING COUNT(*) > 5;

Filtering non-aggregate field in HAVING

-- ❌ Bad: city is not an aggregate
SELECT city, COUNT(*) FROM users GROUP BY city HAVING city = 'Moscow';

-- ✅ Good: in WHERE
SELECT city, COUNT(*) FROM users WHERE city = 'Moscow' GROUP BY city;

Alias from SELECT in HAVING

-- ⚠️ Works in PostgreSQL, but NOT in SQL Server/Oracle
SELECT city, COUNT(*) as cnt FROM users
GROUP BY city
HAVING cnt > 10;  -- cnt is an alias from SELECT

-- ✅ Portable:
SELECT city, COUNT(*) FROM users
GROUP BY city
HAVING COUNT(*) > 10;

HAVING without GROUP BY

-- Entire table = one group
SELECT 'Many orders'
FROM orders
HAVING COUNT(*) > 1000000;

-- Returns 1 row if orders > 1M, otherwise 0 rows

🔴 Senior Level

When NOT to Use

HAVING for non-aggregate filters — HAVING city = 'Moscow' groups ALL cities then filters. Use WHERE
WHERE with aggregates — WHERE COUNT(*) > 5 is not allowed. Use HAVING
HAVING for filtering by window functions — window functions are computed AFTER HAVING. Use a subquery/CTE

Filter Push-down Optimization

PostgreSQL 12+ tries to automatically move conditions from HAVING to WHERE:

Before PG 12 Filter Push-down was absent — conditions in HAVING always stayed in HAVING. This meant HAVING city = 'Moscow' grouped all cities, then filtered. On PG 12+ the optimizer will move city = 'Moscow' to WHERE automatically, but you shouldn’t rely on this.

-- Query
SELECT city, COUNT(*) FROM users
GROUP BY city
HAVING city = 'Moscow' AND COUNT(*) > 10;

-- Optimizer transforms into:
SELECT city, COUNT(*) FROM users
WHERE city = 'Moscow'          -- Pushed down!
GROUP BY city
HAVING COUNT(*) > 10;

But don’t rely on this!

When Push-down does NOT work:

Complex expressions with subqueries
Inside Views
When using CTEs with MATERIALIZED

Golden rule: Always write non-aggregate filters in WHERE explicitly.

work_mem — a PostgreSQL parameter limiting memory for one operation (sort, hash aggregation). Default is 4MB. If the operation doesn’t fit, data spills to disk (spill to disk) into temporary files (temp_files), slowing execution 10-100x.

Impact on work_mem and temp_files

-- Problem: 100M rows without filter → GROUP BY
SELECT status, COUNT(*) FROM orders GROUP BY status;
-- work_mem = 4MB → Hash Aggregate doesn't fit → spills to disk → temp_files!

-- Solution: filter in WHERE
SELECT status, COUNT(*) FROM orders
WHERE created_at > '2024-01-01'  -- Reduces input 10x
GROUP BY status;
-- Now fits in work_mem → 100x faster
-- Operations in memory (RAM) happen in nanoseconds, disk operations —
-- in milliseconds. A difference of 6 orders of magnitude. Even considering that spill
-- doesn't happen for every row, overall slowdown is 10-100x.

Monitoring spill:

EXPLAIN (ANALYZE, BUFFERS)
SELECT status, COUNT(*) FROM orders GROUP BY status;
-- Look for: "HashAggregate" + "Disk: X kB" → bad!

WHERE vs HAVING vs Window Functions

Filtering by window function is impossible in HAVING:

-- ❌ ERROR: window functions are computed AFTER HAVING
SELECT user_id, amount,
       ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM orders
GROUP BY user_id, amount, created_at
HAVING rn = 1;  -- Error: rn doesn't exist at HAVING stage

-- ✅ Solution: subquery or CTE
WITH ranked AS (
    SELECT user_id, amount,
           ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
    FROM orders
)
SELECT user_id, amount FROM ranked WHERE rn = 1;

Architectural Pattern: WHERE + HAVING

-- Proper separation of concerns
SELECT
    department,
    COUNT(*) as emp_count,
    AVG(salary) as avg_salary
FROM employees
WHERE status = 'ACTIVE'              -- 1. Filter out fired (index!)
  AND hire_date > '2020-01-01'       -- 2. Reduce input
GROUP BY department
HAVING COUNT(*) > 10                 -- 3. Business logic: only large departments
   AND AVG(salary) > 100000;         -- 4. Filter by aggregate

Edge Cases

NULL in HAVING

SELECT status, COUNT(*)
FROM orders
GROUP BY status
HAVING COUNT(*) > 10;

-- NULL is a separate group!
-- If there are rows with status = NULL, they group together

HAVING with multiple aggregates

SELECT product_id
FROM orders
GROUP BY product_id
HAVING COUNT(*) > 100               -- Minimum 100 orders
   AND AVG(amount) > 1000            -- Average check > 1000
   AND MAX(amount) - MIN(amount) < 5000;  -- Price spread < 5000

FILTER in HAVING

SELECT project_id
FROM tasks
GROUP BY project_id
HAVING COUNT(*) > 5
   AND COUNT(*) FILTER (WHERE status = 'blocked') > 2;
-- Projects with > 5 tasks, of which > 2 are blocked

Performance Impact

Scenario	WHERE	HAVING
Indexes	✅ Index Scan	❌ None
Data volume	Reduces BEFORE GROUP BY	Filters AFTER GROUP BY
work_mem	Less → faster	More → slower
temp_files	Less risk	More risk

Production Experience

Real scenario:

Analytics: sales report (100M rows)
Problem: query 2 minutes, spill to disk 5 GB
EXPLAIN: HashAggregate Disk: 5GB
Cause: ORM generated HAVING status = 'ACTIVE' instead of WHERE
Solution: moved status = 'ACTIVE' to WHERE
Result: query 3 seconds (40x speedup), no spill

Monitoring

-- 1. Check for spill in plans
EXPLAIN (ANALYZE, BUFFERS)
SELECT city, COUNT(*) FROM users GROUP BY city;
-- Look for: "Disk: X kB" → increase work_mem or add WHERE

-- 2. Find queries with HAVING in pg_stat_statements
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
WHERE query LIKE '%HAVING%'
ORDER BY mean_exec_time DESC
LIMIT 10;

-- 3. Check Filter Push-down
EXPLAIN (VERBOSE)
SELECT city, COUNT(*) FROM users GROUP BY city HAVING city = 'Moscow';
-- In the plan you'll see if the filter was pushed to WHERE

Best Practices

WHERE for row filtering (before GROUP BY)
HAVING only for aggregate filtering
Never use HAVING for non-aggregate conditions
Check the plan for spill (Disk usage)
Filter as early as possible — save CPU and RAM
FILTER inside HAVING for complex conditions
Subquery/CTE for window function filtering

Summary for Senior

WHERE filters input, HAVING filters output of grouping
Filter Push-down (PG 12+) doesn’t always work — don’t rely on it
work_mem pressure: HAVING doesn’t save from overflow during GROUP BY
Window functions are computed AFTER HAVING → need subquery
temp_files — indicator of a problem: EXPLAIN (ANALYZE, BUFFERS)
Golden rule: Everything that can go → into WHERE

🎯 Interview Cheat Sheet

Must know:

WHERE filters rows BEFORE GROUP BY, HAVING — groups AFTER GROUP BY
WHERE uses indexes, HAVING — doesn’t
Aggregates (COUNT, SUM) cannot go in WHERE, can go in HAVING
Filter Push-down (PG 12+): optimizer moves non-aggregate conditions from HAVING to WHERE
FILTER inside HAVING: COUNT(*) FILTER (WHERE status = 'ACTIVE') > 5
work_mem pressure: HAVING doesn’t save — all groups are already created
Window functions are computed AFTER HAVING → subquery/CTE
Aliases from SELECT in HAVING: works in PG, does NOT work in SQL Server/Oracle

Frequent follow-up questions:

“Why is HAVING slower than WHERE?” → Groups ALL data, then filters
“When doesn’t Filter Push-down work?” → CTE MATERIALIZED, Views, subqueries
“Can HAVING be used without GROUP BY?” → Yes, entire table = one group
“What’s better: FILTER or CASE WHEN in aggregate?” → FILTER (clean SQL)

Red flags (DO NOT say):

❌ “HAVING optimizes GROUP BY” (no, it filters already grouped data)
❌ “Can rely on Filter Push-down” (doesn’t always work!)
❌ “Window functions can go in HAVING” (computed AFTER!)

Related topics:

[[What does GROUP BY do]] → aggregation strategies
[[When to use HAVING]] → usage scenarios
[[What are Window Functions]] → why they can’t go in HAVING