What is the Difference Between WHERE and HAVING?
The query is executed NOT in the order it's written:
🟢 Junior Level
WHERE and HAVING — both are used for filtering data, but at different stages:
- WHERE — filters rows before grouping
- HAVING — filters groups after grouping
Simple analogy: Imagine sorting coins:
- WHERE — select only coins of a certain denomination
- Group by year
- HAVING — keep only groups where there are more than 5 coins
Example:
-- WHERE: filter orders by status
-- HAVING: keep only users with > 5 orders
SELECT user_id, COUNT(*) as orders_count
FROM orders
WHERE status = 'COMPLETED' -- Filter BEFORE grouping
GROUP BY user_id
HAVING COUNT(*) > 5; -- Filter AFTER grouping
Rule:
- No
GROUP BY→ useWHERE - Has
GROUP BYand need to filter by aggregate (COUNT,SUM) →HAVING
🟡 Middle Level
Logical Query Processing Order
The query is executed NOT in the order it’s written:
SELECT user_id, COUNT(*) as cnt -- 5. SELECT
FROM orders -- 1. FROM
WHERE status = 'COMPLETED' -- 2. WHERE (indexes work!)
GROUP BY user_id -- 3. GROUP BY
HAVING COUNT(*) > 5 -- 4. HAVING
ORDER BY cnt DESC; -- 6. ORDER BY
Key point: WHERE executes BEFORE GROUP BY, and HAVING — AFTER.
Why This Matters for Performance
-- ❌ BAD: filtering in HAVING instead of WHERE
SELECT city, COUNT(*)
FROM users
GROUP BY city
HAVING city = 'Moscow'; -- Groups ALL cities, then filters!
-- ✅ GOOD: filtering in WHERE
SELECT city, COUNT(*)
FROM users
WHERE city = 'Moscow' -- Filters BEFORE grouping → less data
GROUP BY city;
Difference: The first query groups millions of rows, the second — only Moscow users.
Comparison
| Characteristic | WHERE | HAVING |
|---|---|---|
| What it filters | Rows | Groups |
| When | BEFORE GROUP BY | AFTER GROUP BY |
| Aggregates | ❌ Not allowed | ✅ Allowed |
| Indexes | ✅ Work | ❌ Don’t work |
| Performance | Fast | Slower (after aggregation) |
Common Mistakes
- Aggregate in WHERE
-- ❌ ERROR: aggregates cannot be used in WHERE SELECT user_id FROM orders WHERE COUNT(*) > 5; -- ✅ Correct: via HAVING SELECT user_id FROM orders GROUP BY user_id HAVING COUNT(*) > 5; - Filtering non-aggregate field in HAVING
-- ❌ Bad: city is not an aggregate SELECT city, COUNT(*) FROM users GROUP BY city HAVING city = 'Moscow'; -- ✅ Good: in WHERE SELECT city, COUNT(*) FROM users WHERE city = 'Moscow' GROUP BY city; - Alias from SELECT in HAVING
-- ⚠️ Works in PostgreSQL, but NOT in SQL Server/Oracle SELECT city, COUNT(*) as cnt FROM users GROUP BY city HAVING cnt > 10; -- cnt is an alias from SELECT -- ✅ Portable: SELECT city, COUNT(*) FROM users GROUP BY city HAVING COUNT(*) > 10;
HAVING without GROUP BY
-- Entire table = one group
SELECT 'Many orders'
FROM orders
HAVING COUNT(*) > 1000000;
-- Returns 1 row if orders > 1M, otherwise 0 rows
🔴 Senior Level
When NOT to Use
- HAVING for non-aggregate filters —
HAVING city = 'Moscow'groups ALL cities then filters. Use WHERE - WHERE with aggregates —
WHERE COUNT(*) > 5is not allowed. Use HAVING - HAVING for filtering by window functions — window functions are computed AFTER HAVING. Use a subquery/CTE
Filter Push-down Optimization
PostgreSQL 12+ tries to automatically move conditions from HAVING to WHERE:
Before PG 12 Filter Push-down was absent — conditions in HAVING always stayed in HAVING. This meant HAVING city = 'Moscow' grouped all cities, then filtered. On PG 12+ the optimizer will move city = 'Moscow' to WHERE automatically, but you shouldn’t rely on this.
-- Query
SELECT city, COUNT(*) FROM users
GROUP BY city
HAVING city = 'Moscow' AND COUNT(*) > 10;
-- Optimizer transforms into:
SELECT city, COUNT(*) FROM users
WHERE city = 'Moscow' -- Pushed down!
GROUP BY city
HAVING COUNT(*) > 10;
But don’t rely on this!
When Push-down does NOT work:
- Complex expressions with subqueries
- Inside Views
- When using CTEs with
MATERIALIZED
Golden rule: Always write non-aggregate filters in WHERE explicitly.
work_mem — a PostgreSQL parameter limiting memory for one operation (sort, hash aggregation). Default is 4MB. If the operation doesn’t fit, data spills to disk (spill to disk) into temporary files (temp_files), slowing execution 10-100x.
Impact on work_mem and temp_files
-- Problem: 100M rows without filter → GROUP BY
SELECT status, COUNT(*) FROM orders GROUP BY status;
-- work_mem = 4MB → Hash Aggregate doesn't fit → spills to disk → temp_files!
-- Solution: filter in WHERE
SELECT status, COUNT(*) FROM orders
WHERE created_at > '2024-01-01' -- Reduces input 10x
GROUP BY status;
-- Now fits in work_mem → 100x faster
-- Operations in memory (RAM) happen in nanoseconds, disk operations —
-- in milliseconds. A difference of 6 orders of magnitude. Even considering that spill
-- doesn't happen for every row, overall slowdown is 10-100x.
Monitoring spill:
EXPLAIN (ANALYZE, BUFFERS)
SELECT status, COUNT(*) FROM orders GROUP BY status;
-- Look for: "HashAggregate" + "Disk: X kB" → bad!
WHERE vs HAVING vs Window Functions
Filtering by window function is impossible in HAVING:
-- ❌ ERROR: window functions are computed AFTER HAVING
SELECT user_id, amount,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM orders
GROUP BY user_id, amount, created_at
HAVING rn = 1; -- Error: rn doesn't exist at HAVING stage
-- ✅ Solution: subquery or CTE
WITH ranked AS (
SELECT user_id, amount,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM orders
)
SELECT user_id, amount FROM ranked WHERE rn = 1;
Architectural Pattern: WHERE + HAVING
-- Proper separation of concerns
SELECT
department,
COUNT(*) as emp_count,
AVG(salary) as avg_salary
FROM employees
WHERE status = 'ACTIVE' -- 1. Filter out fired (index!)
AND hire_date > '2020-01-01' -- 2. Reduce input
GROUP BY department
HAVING COUNT(*) > 10 -- 3. Business logic: only large departments
AND AVG(salary) > 100000; -- 4. Filter by aggregate
Edge Cases
- NULL in HAVING
SELECT status, COUNT(*) FROM orders GROUP BY status HAVING COUNT(*) > 10; -- NULL is a separate group! -- If there are rows with status = NULL, they group together - HAVING with multiple aggregates
SELECT product_id FROM orders GROUP BY product_id HAVING COUNT(*) > 100 -- Minimum 100 orders AND AVG(amount) > 1000 -- Average check > 1000 AND MAX(amount) - MIN(amount) < 5000; -- Price spread < 5000 - FILTER in HAVING
SELECT project_id FROM tasks GROUP BY project_id HAVING COUNT(*) > 5 AND COUNT(*) FILTER (WHERE status = 'blocked') > 2; -- Projects with > 5 tasks, of which > 2 are blocked
Performance Impact
| Scenario | WHERE | HAVING |
|---|---|---|
| Indexes | ✅ Index Scan | ❌ None |
| Data volume | Reduces BEFORE GROUP BY | Filters AFTER GROUP BY |
| work_mem | Less → faster | More → slower |
| temp_files | Less risk | More risk |
Production Experience
Real scenario:
- Analytics: sales report (100M rows)
- Problem: query 2 minutes, spill to disk 5 GB
- EXPLAIN:
HashAggregate Disk: 5GB - Cause: ORM generated
HAVING status = 'ACTIVE'instead ofWHERE - Solution: moved
status = 'ACTIVE'to WHERE - Result: query 3 seconds (40x speedup), no spill
Monitoring
-- 1. Check for spill in plans
EXPLAIN (ANALYZE, BUFFERS)
SELECT city, COUNT(*) FROM users GROUP BY city;
-- Look for: "Disk: X kB" → increase work_mem or add WHERE
-- 2. Find queries with HAVING in pg_stat_statements
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
WHERE query LIKE '%HAVING%'
ORDER BY mean_exec_time DESC
LIMIT 10;
-- 3. Check Filter Push-down
EXPLAIN (VERBOSE)
SELECT city, COUNT(*) FROM users GROUP BY city HAVING city = 'Moscow';
-- In the plan you'll see if the filter was pushed to WHERE
Best Practices
- WHERE for row filtering (before GROUP BY)
- HAVING only for aggregate filtering
- Never use HAVING for non-aggregate conditions
- Check the plan for spill (Disk usage)
- Filter as early as possible — save CPU and RAM
- FILTER inside HAVING for complex conditions
- Subquery/CTE for window function filtering
Summary for Senior
- WHERE filters input, HAVING filters output of grouping
- Filter Push-down (PG 12+) doesn’t always work — don’t rely on it
- work_mem pressure: HAVING doesn’t save from overflow during GROUP BY
- Window functions are computed AFTER HAVING → need subquery
- temp_files — indicator of a problem:
EXPLAIN (ANALYZE, BUFFERS) - Golden rule: Everything that can go → into
WHERE
🎯 Interview Cheat Sheet
Must know:
- WHERE filters rows BEFORE GROUP BY, HAVING — groups AFTER GROUP BY
- WHERE uses indexes, HAVING — doesn’t
- Aggregates (COUNT, SUM) cannot go in WHERE, can go in HAVING
- Filter Push-down (PG 12+): optimizer moves non-aggregate conditions from HAVING to WHERE
- FILTER inside HAVING:
COUNT(*) FILTER (WHERE status = 'ACTIVE') > 5 - work_mem pressure: HAVING doesn’t save — all groups are already created
- Window functions are computed AFTER HAVING → subquery/CTE
- Aliases from SELECT in HAVING: works in PG, does NOT work in SQL Server/Oracle
Frequent follow-up questions:
- “Why is HAVING slower than WHERE?” → Groups ALL data, then filters
- “When doesn’t Filter Push-down work?” → CTE MATERIALIZED, Views, subqueries
- “Can HAVING be used without GROUP BY?” → Yes, entire table = one group
- “What’s better: FILTER or CASE WHEN in aggregate?” → FILTER (clean SQL)
Red flags (DO NOT say):
- ❌ “HAVING optimizes GROUP BY” (no, it filters already grouped data)
- ❌ “Can rely on Filter Push-down” (doesn’t always work!)
- ❌ “Window functions can go in HAVING” (computed AFTER!)
Related topics:
- [[What does GROUP BY do]] → aggregation strategies
- [[When to use HAVING]] → usage scenarios
- [[What are Window Functions]] → why they can’t go in HAVING