When to Use HAVING?

🟢 Junior Level

HAVING is used when you need to filter results after grouping by aggregate functions (COUNT, SUM, AVG, etc.).

Simple rule:

Filter by regular fields → WHERE
Filter by aggregation results → HAVING

Example:

-- Find users with more than 5 orders
SELECT user_id, COUNT(*) as orders_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5;  -- Filter AFTER grouping

-- Find departments with average salary > 100,000
SELECT department, AVG(salary) as avg_sal
FROM employees
GROUP BY department
HAVING AVG(salary) > 100000;

Common scenarios:

Finding duplicates (HAVING COUNT(*) > 1)
Finding groups exceeding a threshold
Filtering by average/total values

🟡 Middle Level

Architectural Pattern: WHERE + HAVING

Proper separation of filters is a sign of good SQL:

SELECT customer_id, AVG(amount) as avg_amount
FROM orders
WHERE status = 'delivered'           -- 1. Filter BEFORE grouping (index!)
  AND created_at > '2024-01-01'      -- 2. Reduce input data
GROUP BY customer_id
HAVING COUNT(*) > 10                 -- 3. Filter AFTER grouping
   AND AVG(amount) > 5000;           -- 4. Business logic

Why this matters:

WHERE reduces the number of rows BEFORE GROUP BY → less work
HAVING filters already grouped data

Main Scenarios

1. Aggregated thresholds:

-- Sellers with revenue > 1M
SELECT seller_id, SUM(price) as revenue
FROM sales
GROUP BY seller_id
HAVING SUM(price) > 1000000;

2. Finding duplicates:

-- Duplicate emails
SELECT email, COUNT(*) as cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

3. Checking group properties:

-- Projects with issues (more than 2 blocked tasks)
SELECT project_id
FROM tasks
GROUP BY project_id
HAVING COUNT(*) > 5
   AND COUNT(*) FILTER (WHERE status = 'blocked') > 2;

4. HAVING without GROUP BY:

-- Check condition on the entire table
SELECT 'Warning: High avg salary'
FROM employees
HAVING AVG(salary) > 200000;
-- Returns 1 row if condition is true, otherwise 0 rows

Common Mistakes

Filtering non-aggregate field in HAVING

-- ❌ BAD: city is not an aggregate
SELECT city, COUNT(*) FROM users
GROUP BY city
HAVING city = 'Moscow';

-- ✅ GOOD: in WHERE
SELECT city, COUNT(*) FROM users
WHERE city = 'Moscow'
GROUP BY city;

Redundancy in ORM

-- ORM often generates:
SELECT city, COUNT(*) FROM users GROUP BY city HAVING city = 'Moscow';
-- This is an anti-pattern! City should be in WHERE
-- Allows using an index and reducing data for aggregation

WHERE vs HAVING Comparison

Scenario	WHERE	HAVING
`status = 'ACTIVE'`	✅	❌ (if no GROUP BY)
`COUNT(*) > 10`	❌ Cannot	✅
`SUM(amount) > 1000`	❌ Cannot	✅
`city = 'Moscow'`	✅	⚠️ Possible, but bad

🔴 Senior Level

When NOT to Use HAVING

For non-aggregate filters — HAVING city = 'Moscow' groups ALL cities, then filters. Move to WHERE
For filtering by window functions — window functions are computed AFTER HAVING. Use a subquery/CTE
When there are too many groups — millions of groups → spill to disk. Pre-filter in WHERE or use Materialized Views

The Huge Number of Groups Problem

If after WHERE there are 100M rows and 10M groups:

-- Problem: database must allocate memory for ALL 10M groups
SELECT user_id, SUM(amount)
FROM orders
WHERE created_at > '2023-01-01'  -- 100M rows
GROUP BY user_id                  -- 10M groups
HAVING SUM(amount) > 10000;       -- Filter after

What happens:

HashAggregate tries to place 10M groups in work_mem (work_mem — memory for one operation, default 4MB)
If it doesn’t fit → spill to disk (temp_files — temporary files on disk)
Performance drops 10-100x (disk operations are millions of times slower than RAM)

Solutions:

-- 1. Pre-filtering via CTE
WITH filtered_orders AS (
    SELECT user_id, amount
    FROM orders
    WHERE created_at > '2023-01-01'
      AND amount > 100  -- Additional filter
)
SELECT user_id, SUM(amount)
FROM filtered_orders
GROUP BY user_id
HAVING SUM(amount) > 10000;

-- 2. Materialized view for frequent queries
CREATE MATERIALIZED VIEW mv_user_stats AS
SELECT user_id, SUM(amount) as total, COUNT(*) as cnt
FROM orders
GROUP BY user_id;

-- Then a fast query
SELECT user_id FROM mv_user_stats WHERE total > 10000;

FILTER in HAVING for Declarativeness

-- Elegant checking of complex conditions
SELECT project_id
FROM tasks
GROUP BY project_id
HAVING
    COUNT(*) > 10                                      -- Minimum 10 tasks
    AND COUNT(*) FILTER (WHERE status = 'done') >= 8   -- 80% completed
    AND COUNT(*) FILTER (WHERE status = 'blocked') = 0 -- No blockers
    AND MAX(due_date) > CURRENT_DATE;                  -- Deadline hasn't passed

Redundancy and the Optimizer

PostgreSQL tries to do Filter Push-down — move conditions from HAVING to WHERE:

-- Query
SELECT city, COUNT(*) FROM users GROUP BY city HAVING city = 'Moscow';

-- Optimizer transforms into:
SELECT city, COUNT(*) FROM users WHERE city = 'Moscow' GROUP BY city;

BUT: This doesn’t always work!

Complex expressions with subqueries
Inside Views
CTEs with MATERIALIZED

Golden rule: Always write non-aggregate filters in WHERE explicitly.

SQL Standards and Portability

-- ⚠️ Works in PostgreSQL, does NOT work in SQL Server/Oracle
SELECT city, COUNT(*) as cnt FROM users
GROUP BY city
HAVING cnt > 10;  -- cnt is an alias from SELECT

-- ✅ Portable across all DBMS
SELECT city, COUNT(*) FROM users
GROUP BY city
HAVING COUNT(*) > 10;

Edge Cases

HAVING with a subquery

-- Users whose order total is above average
SELECT user_id, SUM(amount) as total
FROM orders
GROUP BY user_id
HAVING SUM(amount) > (SELECT AVG(total) FROM (
    SELECT SUM(amount) as total FROM orders GROUP BY user_id
) sub);

HAVING with window functions — NOT ALLOWED

-- ❌ ERROR: window functions are computed AFTER HAVING
SELECT user_id, amount,
       ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM orders
GROUP BY user_id, amount, created_at
HAVING rn = 1;  -- Error!

-- ✅ Solution: CTE
WITH ranked AS (
    SELECT user_id, amount,
           ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
    FROM orders
)
SELECT user_id, amount FROM ranked WHERE rn = 1;

HAVING with multiple GROUP BY

-- ROLLUP + HAVING
SELECT department, city, COUNT(*)
FROM employees
GROUP BY ROLLUP(department, city)
HAVING COUNT(*) > 5;
-- Filters ALL levels (detailed and total rows)

Performance Impact

-- Monitoring work_mem pressure
EXPLAIN (ANALYZE, BUFFERS)
SELECT status, COUNT(*) FROM orders GROUP BY status HAVING COUNT(*) > 100;

-- Look for:
-- HashAggregate (memory usage: X kB)  → if X is close to work_mem → spill risk
-- HashAggregate (disk: Y kB)          → spill happened! Increase work_mem

Production Experience

Real scenario:

E-commerce: sales report (500M rows)
ORM generated HAVING status = 'COMPLETED' instead of WHERE
HashAggregate: 500M rows → spill 30 GB to disk
Time: 15 minutes
Solution: WHERE status = 'COMPLETED' before GROUP BY
Result: 10 seconds (90x speedup)
Why: ORM generated HAVING status = 'COMPLETED' — the database first grouped all 500M rows (including cancelled, returned, etc.), creating millions of groups, and only then filtered. Moving to WHERE reduced input data for GROUP BY 10x + allowed using an index on status.

Monitoring

-- 1. Find queries with HAVING in pg_stat_statements
SELECT query, mean_exec_time, calls, shared_blks_read
FROM pg_stat_statements
WHERE query LIKE '%HAVING%'
ORDER BY mean_exec_time DESC
LIMIT 10;

-- 2. Check for spill
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... GROUP BY ... HAVING ...;
-- "Disk: X kB" → problem!

-- 3. Check Filter Push-down
EXPLAIN (VERBOSE)
SELECT city, COUNT(*) FROM users GROUP BY city HAVING city = 'Moscow';
-- In the plan you'll see if the filter was pushed

Best Practices

WHERE for non-aggregate filters, HAVING for aggregate filters
FILTER (WHERE …) inside HAVING for complex conditions
Avoid aliases from SELECT in HAVING (not portable)
Monitor work_mem for large GROUP BY
CTE/Materialized Views for pre-filtering
Check the plan for spill (Disk usage)
Remember: HAVING doesn’t save from memory overflow during GROUP BY

Summary for Senior

HAVING is the last line of filtering after GROUP BY
Doesn’t save resources during aggregation — all groups are already created
Filter Push-down (PG 12+) doesn’t always work — write WHERE explicitly
work_mem pressure: HAVING filters OUTPUT, not INPUT
Window functions are not available in HAVING → CTE/subquery
FILTER for declarativeness of complex conditions
Portability: don’t use aliases from SELECT in HAVING

🎯 Interview Cheat Sheet

Must know:

HAVING filters the RESULT of GROUP BY, not input data
WHERE reduces data BEFORE GROUP BY → less work_mem, faster
Filter Push-down (PG 12+): automatic move from HAVING to WHERE, but NOT always
Huge number of groups: 10M groups → spill to disk → CTE/Materialized View
FILTER in HAVING: COUNT(*) FILTER (WHERE status = 'blocked') > 2
Window functions are computed AFTER HAVING → subquery/CTE
Aliases from SELECT in HAVING: NOT portable (SQL Server/Oracle don’t support)
HAVING with subquery: possible, but slow

Frequent follow-up questions:

“Why does ORM generate HAVING instead of WHERE?” → ORM anti-pattern, doesn’t optimize
“How to avoid spill with 10M groups?” → Filter in WHERE, work_mem ↑, Materialized View
“Can FILTER be used in HAVING?” → Yes, elegant for complex conditions
“When is HAVING with subquery justified?” → Comparing group to overall average

Red flags (DO NOT say):

❌ “HAVING reduces data for GROUP BY” (no, it filters OUTPUT)
❌ “Can rely on Filter Push-down” (doesn’t always work!)
❌ “Window functions can go in HAVING” (computed AFTER!)

Related topics:

[[What is the difference between WHERE and HAVING]] → execution order
[[What does GROUP BY do]] → huge number of groups
[[What are Window Functions]] → why they can’t go in HAVING