When to Use HAVING?
Proper separation of filters is a sign of good SQL:
🟢 Junior Level
HAVING is used when you need to filter results after grouping by aggregate functions (COUNT, SUM, AVG, etc.).
Simple rule:
- Filter by regular fields →
WHERE - Filter by aggregation results →
HAVING
Example:
-- Find users with more than 5 orders
SELECT user_id, COUNT(*) as orders_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5; -- Filter AFTER grouping
-- Find departments with average salary > 100,000
SELECT department, AVG(salary) as avg_sal
FROM employees
GROUP BY department
HAVING AVG(salary) > 100000;
Common scenarios:
- Finding duplicates (
HAVING COUNT(*) > 1) - Finding groups exceeding a threshold
- Filtering by average/total values
🟡 Middle Level
Architectural Pattern: WHERE + HAVING
Proper separation of filters is a sign of good SQL:
SELECT customer_id, AVG(amount) as avg_amount
FROM orders
WHERE status = 'delivered' -- 1. Filter BEFORE grouping (index!)
AND created_at > '2024-01-01' -- 2. Reduce input data
GROUP BY customer_id
HAVING COUNT(*) > 10 -- 3. Filter AFTER grouping
AND AVG(amount) > 5000; -- 4. Business logic
Why this matters:
WHEREreduces the number of rows BEFORE GROUP BY → less workHAVINGfilters already grouped data
Main Scenarios
1. Aggregated thresholds:
-- Sellers with revenue > 1M
SELECT seller_id, SUM(price) as revenue
FROM sales
GROUP BY seller_id
HAVING SUM(price) > 1000000;
2. Finding duplicates:
-- Duplicate emails
SELECT email, COUNT(*) as cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
3. Checking group properties:
-- Projects with issues (more than 2 blocked tasks)
SELECT project_id
FROM tasks
GROUP BY project_id
HAVING COUNT(*) > 5
AND COUNT(*) FILTER (WHERE status = 'blocked') > 2;
4. HAVING without GROUP BY:
-- Check condition on the entire table
SELECT 'Warning: High avg salary'
FROM employees
HAVING AVG(salary) > 200000;
-- Returns 1 row if condition is true, otherwise 0 rows
Common Mistakes
- Filtering non-aggregate field in HAVING
-- ❌ BAD: city is not an aggregate SELECT city, COUNT(*) FROM users GROUP BY city HAVING city = 'Moscow'; -- ✅ GOOD: in WHERE SELECT city, COUNT(*) FROM users WHERE city = 'Moscow' GROUP BY city; - Redundancy in ORM
-- ORM often generates: SELECT city, COUNT(*) FROM users GROUP BY city HAVING city = 'Moscow'; -- This is an anti-pattern! City should be in WHERE -- Allows using an index and reducing data for aggregation
WHERE vs HAVING Comparison
| Scenario | WHERE | HAVING |
|---|---|---|
status = 'ACTIVE' |
✅ | ❌ (if no GROUP BY) |
COUNT(*) > 10 |
❌ Cannot | ✅ |
SUM(amount) > 1000 |
❌ Cannot | ✅ |
city = 'Moscow' |
✅ | ⚠️ Possible, but bad |
🔴 Senior Level
When NOT to Use HAVING
- For non-aggregate filters —
HAVING city = 'Moscow'groups ALL cities, then filters. Move to WHERE - For filtering by window functions — window functions are computed AFTER HAVING. Use a subquery/CTE
- When there are too many groups — millions of groups → spill to disk. Pre-filter in WHERE or use Materialized Views
The Huge Number of Groups Problem
If after WHERE there are 100M rows and 10M groups:
-- Problem: database must allocate memory for ALL 10M groups
SELECT user_id, SUM(amount)
FROM orders
WHERE created_at > '2023-01-01' -- 100M rows
GROUP BY user_id -- 10M groups
HAVING SUM(amount) > 10000; -- Filter after
What happens:
- HashAggregate tries to place 10M groups in
work_mem(work_mem — memory for one operation, default 4MB) - If it doesn’t fit → spill to disk (temp_files — temporary files on disk)
- Performance drops 10-100x (disk operations are millions of times slower than RAM)
Solutions:
-- 1. Pre-filtering via CTE
WITH filtered_orders AS (
SELECT user_id, amount
FROM orders
WHERE created_at > '2023-01-01'
AND amount > 100 -- Additional filter
)
SELECT user_id, SUM(amount)
FROM filtered_orders
GROUP BY user_id
HAVING SUM(amount) > 10000;
-- 2. Materialized view for frequent queries
CREATE MATERIALIZED VIEW mv_user_stats AS
SELECT user_id, SUM(amount) as total, COUNT(*) as cnt
FROM orders
GROUP BY user_id;
-- Then a fast query
SELECT user_id FROM mv_user_stats WHERE total > 10000;
FILTER in HAVING for Declarativeness
-- Elegant checking of complex conditions
SELECT project_id
FROM tasks
GROUP BY project_id
HAVING
COUNT(*) > 10 -- Minimum 10 tasks
AND COUNT(*) FILTER (WHERE status = 'done') >= 8 -- 80% completed
AND COUNT(*) FILTER (WHERE status = 'blocked') = 0 -- No blockers
AND MAX(due_date) > CURRENT_DATE; -- Deadline hasn't passed
Redundancy and the Optimizer
PostgreSQL tries to do Filter Push-down — move conditions from HAVING to WHERE:
-- Query
SELECT city, COUNT(*) FROM users GROUP BY city HAVING city = 'Moscow';
-- Optimizer transforms into:
SELECT city, COUNT(*) FROM users WHERE city = 'Moscow' GROUP BY city;
BUT: This doesn’t always work!
- Complex expressions with subqueries
- Inside Views
- CTEs with
MATERIALIZED
Golden rule: Always write non-aggregate filters in WHERE explicitly.
SQL Standards and Portability
-- ⚠️ Works in PostgreSQL, does NOT work in SQL Server/Oracle
SELECT city, COUNT(*) as cnt FROM users
GROUP BY city
HAVING cnt > 10; -- cnt is an alias from SELECT
-- ✅ Portable across all DBMS
SELECT city, COUNT(*) FROM users
GROUP BY city
HAVING COUNT(*) > 10;
Edge Cases
- HAVING with a subquery
-- Users whose order total is above average SELECT user_id, SUM(amount) as total FROM orders GROUP BY user_id HAVING SUM(amount) > (SELECT AVG(total) FROM ( SELECT SUM(amount) as total FROM orders GROUP BY user_id ) sub); - HAVING with window functions — NOT ALLOWED
-- ❌ ERROR: window functions are computed AFTER HAVING SELECT user_id, amount, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn FROM orders GROUP BY user_id, amount, created_at HAVING rn = 1; -- Error! -- ✅ Solution: CTE WITH ranked AS ( SELECT user_id, amount, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn FROM orders ) SELECT user_id, amount FROM ranked WHERE rn = 1; - HAVING with multiple GROUP BY
-- ROLLUP + HAVING SELECT department, city, COUNT(*) FROM employees GROUP BY ROLLUP(department, city) HAVING COUNT(*) > 5; -- Filters ALL levels (detailed and total rows)
Performance Impact
-- Monitoring work_mem pressure
EXPLAIN (ANALYZE, BUFFERS)
SELECT status, COUNT(*) FROM orders GROUP BY status HAVING COUNT(*) > 100;
-- Look for:
-- HashAggregate (memory usage: X kB) → if X is close to work_mem → spill risk
-- HashAggregate (disk: Y kB) → spill happened! Increase work_mem
Production Experience
Real scenario:
- E-commerce: sales report (500M rows)
- ORM generated
HAVING status = 'COMPLETED'instead ofWHERE - HashAggregate: 500M rows → spill 30 GB to disk
- Time: 15 minutes
- Solution:
WHERE status = 'COMPLETED'before GROUP BY - Result: 10 seconds (90x speedup)
- Why: ORM generated
HAVING status = 'COMPLETED'— the database first grouped all 500M rows (including cancelled, returned, etc.), creating millions of groups, and only then filtered. Moving to WHERE reduced input data for GROUP BY 10x + allowed using an index on status.
Monitoring
-- 1. Find queries with HAVING in pg_stat_statements
SELECT query, mean_exec_time, calls, shared_blks_read
FROM pg_stat_statements
WHERE query LIKE '%HAVING%'
ORDER BY mean_exec_time DESC
LIMIT 10;
-- 2. Check for spill
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... GROUP BY ... HAVING ...;
-- "Disk: X kB" → problem!
-- 3. Check Filter Push-down
EXPLAIN (VERBOSE)
SELECT city, COUNT(*) FROM users GROUP BY city HAVING city = 'Moscow';
-- In the plan you'll see if the filter was pushed
Best Practices
- WHERE for non-aggregate filters, HAVING for aggregate filters
- FILTER (WHERE …) inside HAVING for complex conditions
- Avoid aliases from SELECT in HAVING (not portable)
- Monitor work_mem for large GROUP BY
- CTE/Materialized Views for pre-filtering
- Check the plan for spill (Disk usage)
- Remember: HAVING doesn’t save from memory overflow during GROUP BY
Summary for Senior
- HAVING is the last line of filtering after GROUP BY
- Doesn’t save resources during aggregation — all groups are already created
- Filter Push-down (PG 12+) doesn’t always work — write WHERE explicitly
- work_mem pressure: HAVING filters OUTPUT, not INPUT
- Window functions are not available in HAVING → CTE/subquery
- FILTER for declarativeness of complex conditions
- Portability: don’t use aliases from SELECT in HAVING
🎯 Interview Cheat Sheet
Must know:
- HAVING filters the RESULT of GROUP BY, not input data
- WHERE reduces data BEFORE GROUP BY → less work_mem, faster
- Filter Push-down (PG 12+): automatic move from HAVING to WHERE, but NOT always
- Huge number of groups: 10M groups → spill to disk → CTE/Materialized View
- FILTER in HAVING:
COUNT(*) FILTER (WHERE status = 'blocked') > 2 - Window functions are computed AFTER HAVING → subquery/CTE
- Aliases from SELECT in HAVING: NOT portable (SQL Server/Oracle don’t support)
- HAVING with subquery: possible, but slow
Frequent follow-up questions:
- “Why does ORM generate HAVING instead of WHERE?” → ORM anti-pattern, doesn’t optimize
- “How to avoid spill with 10M groups?” → Filter in WHERE, work_mem ↑, Materialized View
- “Can FILTER be used in HAVING?” → Yes, elegant for complex conditions
- “When is HAVING with subquery justified?” → Comparing group to overall average
Red flags (DO NOT say):
- ❌ “HAVING reduces data for GROUP BY” (no, it filters OUTPUT)
- ❌ “Can rely on Filter Push-down” (doesn’t always work!)
- ❌ “Window functions can go in HAVING” (computed AFTER!)
Related topics:
- [[What is the difference between WHERE and HAVING]] → execution order
- [[What does GROUP BY do]] → huge number of groups
- [[What are Window Functions]] → why they can’t go in HAVING