What Does GROUP BY Do?
🟢 Junior Level
GROUP BY — combines rows with the same values into groups and allows applying aggregate functions to them.
Simple analogy: You have a basket of different types of fruits. GROUP BY is when you sort them into piles: apples separately, bananas separately, oranges separately. Then you can count how many fruits are in each pile.
Example:
-- orders table:
-- user_id | amount
-- 1 | 100
-- 1 | 200
-- 2 | 150
-- 2 | 50
-- Group by user_id and compute sum
SELECT user_id, SUM(amount) as total
FROM orders
GROUP BY user_id;
-- Result:
-- user_id | total
-- 1 | 300
-- 2 | 200
Main aggregate functions:
- COUNT(*) — number of rows
- SUM(col) — sum
- AVG(col) — average
- MIN(col) / MAX(col) — minimum/maximum
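The grouping from the example above can be sketched in plain Python, assuming a list of (user_id, amount) tuples mirroring the orders table:

```python
from collections import defaultdict

# Rows mirroring the orders table from the example above
orders = [(1, 100), (1, 200), (2, 150), (2, 50)]

# GROUP BY user_id with SUM(amount): one bucket per distinct key
totals = defaultdict(int)
for user_id, amount in orders:
    totals[user_id] += amount

print(dict(totals))  # {1: 300, 2: 200}
```

This is exactly the "sort into piles, then count each pile" analogy: one dictionary entry per pile.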
🟡 Middle Level
How It Works Internally
PostgreSQL uses two strategies for GROUP BY:
work_mem — a PostgreSQL parameter limiting memory for one operation. Default is 4MB.
Spill to disk — when an operation doesn't fit in work_mem, PostgreSQL writes intermediate data to disk in temporary files. This slows execution 10-100x.
1. HashAggregate
1. Create a hash table in memory (work_mem)
2. For each row:
- Compute hash of the grouping key
- If the key exists in the hash table → update the aggregate
- If not → create a new entry
- When: Data is not sorted
- Requires: work_mem to store all groups
2. GroupAggregate
1. Sort data by the grouping key
2. Iterate through sorted data and group consecutive identical keys
- When: Data is already sorted (by index)
- Advantage: Requires minimal memory
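The GroupAggregate idea as a minimal sketch (illustrative Python, not PostgreSQL source): because the input is sorted by the key, only the current group's running state needs to be held in memory, regardless of how many groups there are.

```python
def group_aggregate(sorted_rows):
    """Yield (key, sum) pairs for input sorted by key.

    Memory footprint: one running aggregate, independent of group count.
    """
    current_key, current_sum = None, 0
    for key, value in sorted_rows:
        if key != current_key:
            if current_key is not None:
                yield current_key, current_sum  # group finished: emit it
            current_key, current_sum = key, 0
        current_sum += value
    if current_key is not None:
        yield current_key, current_sum  # emit the last group

# The Sort step (skipped when an index already delivers sorted rows)
rows = sorted([(2, 150), (1, 100), (1, 200), (2, 50)])
print(list(group_aggregate(rows)))  # [(1, 300), (2, 200)]
```

Note that a group can be emitted as soon as the key changes — this is why the strategy also streams results early instead of materializing everything first.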
Strategy Comparison
| Strategy | When | Memory | Sorting |
|---|---|---|---|
| HashAggregate | Data is not sorted | work_mem | Not needed |
| GroupAggregate | Data is sorted | Minimal | Needed (or index) |
NULL Handling
-- All NULLs group TOGETHER
SELECT status, COUNT(*)
FROM orders
GROUP BY status;
-- Result:
-- status | count
-- ACTIVE | 100
-- PENDING | 50
-- NULL | 10 ← All NULLs in one group!
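Unlike ordinary comparisons (where NULL = NULL is not true), GROUP BY treats NULLs as not distinct, so they form a single group. The same behavior in miniature, with a made-up status list (Python sketch):

```python
from collections import Counter

# None plays the role of SQL NULL
statuses = ["ACTIVE", "PENDING", None, "ACTIVE", None]

# None is an ordinary dict key: all NULL-like rows share one bucket
counts = Counter(statuses)
print(counts[None])      # 2 — all NULLs counted together
print(counts["ACTIVE"])  # 2
```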
Common Mistakes
- Forgot column in GROUP BY
-- ❌ ERROR: name not in GROUP BY
SELECT user_id, name, SUM(amount) FROM orders GROUP BY user_id;
-- ✅ Either add name to GROUP BY
SELECT user_id, name, SUM(amount) FROM orders GROUP BY user_id, name;
-- ✅ Or use an aggregate
SELECT user_id, MAX(name), SUM(amount) FROM orders GROUP BY user_id;
- Exception: grouping by PRIMARY KEY
-- ✅ Allowed! PK functionally determines all columns
SELECT u.id, u.name, u.email, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id; -- name and email can be omitted
- Filtering after GROUP BY
-- ❌ WHERE cannot be used with aggregates
SELECT user_id, COUNT(*) FROM orders WHERE COUNT(*) > 5 GROUP BY user_id;
-- ✅ Use HAVING
SELECT user_id, COUNT(*) FROM orders GROUP BY user_id HAVING COUNT(*) > 5;
Practical Examples
-- Finding duplicates
SELECT email, COUNT(*) as cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- Aggregation with conditions (FILTER)
SELECT
department,
COUNT(*) FILTER (WHERE status = 'ACTIVE') as active_count,
COUNT(*) FILTER (WHERE status = 'INACTIVE') as inactive_count
FROM employees
GROUP BY department;
🔴 Senior Level
When NOT to Use GROUP BY
- When you need details of each row → use window functions instead of GROUP BY (they don't collapse rows)
- When there are too many groups (millions) → use pre-aggregation via CTEs or Materialized Views
- When you only need to check existence → use EXISTS (Semi-Join), not GROUP BY + HAVING COUNT(*) > 0
Physical Strategies: Deep Dive
HashAggregate in detail:
Phase 1: Build
For each row:
hash_key = hash(group_by_columns)
if hash_key in hash_table:
update_aggregate(hash_table[hash_key], row)
else:
create_new_group(hash_key, row)
Phase 2: Output
For each group in hash_table:
emit(group)
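The two phases above, written as runnable Python (a sketch of the algorithm, not the actual C implementation):

```python
def hash_aggregate(rows):
    # Phase 1: Build — one hash table entry per distinct grouping key
    hash_table = {}
    for key, value in rows:
        if key in hash_table:
            hash_table[key] += value   # update_aggregate
        else:
            hash_table[key] = value    # create_new_group
    # Phase 2: Output — emit every finished group
    for group in hash_table.items():
        yield group

rows = [(2, 150), (1, 100), (1, 200), (2, 50)]  # No sort required
print(dict(hash_aggregate(rows)))
```

Note that nothing can be emitted until all input is consumed (Phase 1 must finish first), and the whole hash table must fit in memory — which is exactly where work_mem comes in.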
Spill to Disk (PG 13+):
⚠️ IMPORTANT for migration: Before PG 13, a query with HashAggregate exceeding work_mem would error with
out of memory. On PG 13+ the same query will execute, but silently go to disk. When migrating from PG 12 to PG 13, queries that used to crash will start working — but extremely slowly. Monitor "Disk: X kB" in EXPLAIN!
Before PG 13: HashAggregate doesn't fit in work_mem β error
PG 13+: HashAggregate spills parts of hash to disk
→ Processes batch by batch
→ Slower than in memory, but works!
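A toy model of the batching idea (real PostgreSQL partitions the hash table by hash value and writes partitions to temp files; this sketch, with a hypothetical max_groups cap, just defers overflow rows to a later pass):

```python
def hash_aggregate_with_spill(rows, max_groups=2):
    """SUM per key, keeping at most max_groups groups in memory at once."""
    in_memory, spilled = {}, []
    for key, value in rows:
        if key in in_memory:
            in_memory[key] += value          # existing group: free to update
        elif len(in_memory) < max_groups:
            in_memory[key] = value           # room for a new group
        else:
            spilled.append((key, value))     # "write to disk" for a later batch
    if spilled:
        # Process the spilled rows as their own batch (recursively)
        yield from hash_aggregate_with_spill(spilled, max_groups)
    yield from in_memory.items()

result = dict(hash_aggregate_with_spill(
    [("a", 1), ("b", 1), ("c", 1), ("a", 1), ("c", 1)], max_groups=2))
print(result)  # {'c': 2, 'a': 2, 'b': 1}
```

Every batch produces correct totals because a key is either fully aggregated in memory or fully deferred — the extra passes over spilled data are what makes the disk path so much slower.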
GroupAggregate + Index:
-- If there's an index on the group_by column
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Query uses Index Scan β data is already sorted
SELECT user_id, SUM(amount) FROM orders GROUP BY user_id;
-- Plan: GroupAggregate → Index Scan (no Sort!)
-- ✅ Minimal memory, maximum speed
Parallel Aggregation
-- PostgreSQL splits work across workers
EXPLAIN SELECT department, COUNT(*) FROM employees GROUP BY department;
-- Plan:
-- Finalize GroupAggregate ← Main process combines results from workers
-- Gather ← Data collection node from parallel workers
-- Partial GroupAggregate ← Worker 1: groups ITS portion of data
-- Parallel Seq Scan ← Worker 1 reads table in parallel
-- Partial GroupAggregate ← Worker 2: groups ITS portion of data
-- Parallel Seq Scan ← Worker 2 reads table in parallel
-- Each worker reads its part of the table, groups it (Partial),
-- then the main process combines intermediate results (Finalize).
How it works:
- Each worker groups its portion of data (Partial Aggregate)
- Main process combines intermediate results (Combine)
- Finalizes aggregates (Finalize)
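The Partial → Finalize split can be sketched for AVG, whose partial state must be a (sum, count) pair — simply averaging the workers' averages would give the wrong answer (illustrative Python, not PostgreSQL source):

```python
def partial_avg(chunk):
    # Each worker's Partial Aggregate: a transition state, not a final value
    return (sum(chunk), len(chunk))

def finalize_avg(states):
    # Combine + Finalize in the main process
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count

chunk1, chunk2 = [10, 20, 30], [40, 50]            # two workers' portions
states = [partial_avg(chunk1), partial_avg(chunk2)]
print(finalize_avg(states))   # 30.0 — correct global average
# Naive "average of averages": (20 + 45) / 2 = 32.5 — wrong!
```

This is why not every aggregate parallelizes: the function needs a combinable intermediate state, which is what proparallel in pg_proc declares.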
Limitations:
- Not all aggregate functions support parallelism
- COUNT, SUM, AVG — supported
- Custom aggregates — need to check proparallel in pg_proc
ROLLUP, CUBE, GROUPING SETS
ROLLUP — hierarchical totals:
SELECT department, city, COUNT(*)
FROM employees
GROUP BY ROLLUP(department, city);
-- Result:
-- dept | city | count
-- IT | Moscow | 10
-- IT | SPb | 5
-- IT | NULL | 15 ← Total for IT
-- NULL | NULL | 50 ← Grand total
CUBE — all combinations:
SELECT department, city, COUNT(*)
FROM employees
GROUP BY CUBE(department, city);
-- Result includes:
-- (dept, city), (dept, NULL), (NULL, city), (NULL, NULL)
-- 4 levels of detail in ONE pass!
GROUPING SETS — explicit list:
SELECT department, city, COUNT(*)
FROM employees
GROUP BY GROUPING SETS (
(department, city), -- Detail
(department), -- Total by department
() -- Grand total
);
Why this is better than UNION ALL:
-- ❌ 3 passes over the table
SELECT department, city, COUNT(*) FROM employees GROUP BY department, city
UNION ALL
SELECT department, NULL, COUNT(*) FROM employees GROUP BY department
UNION ALL
SELECT NULL, NULL, COUNT(*) FROM employees;
-- ✅ 1 pass (Hash or Sort)
SELECT department, city, COUNT(*) FROM employees
GROUP BY ROLLUP(department, city);
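Why one pass suffices: each input row can update the aggregates of all grouping sets at once. A sketch of ROLLUP(department, city) in Python (illustrative data):

```python
from collections import Counter

rows = [("IT", "Moscow"), ("IT", "Moscow"), ("IT", "SPb"), ("HR", "Moscow")]

counts = Counter()
for dept, city in rows:
    # One row feeds all three ROLLUP grouping sets in a single scan
    counts[(dept, city)] += 1     # (department, city) — detail
    counts[(dept, None)] += 1     # (department)       — subtotal
    counts[(None, None)] += 1     # ()                 — grand total

print(counts[("IT", "Moscow")])  # 2
print(counts[("IT", None)])      # 3 — total for IT
print(counts[(None, None)])      # 4 — grand total
```

UNION ALL instead would mean re-reading `rows` once per grouping set — the data stays the same, only the number of scans changes.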
FILTER (Clean SQL)
-- Elegant aggregation without CASE WHEN
SELECT
department,
COUNT(*) as total,
COUNT(*) FILTER (WHERE salary > 100000) as high_paid,
COUNT(*) FILTER (WHERE hire_date > '2023-01-01') as new_hires,
AVG(salary) FILTER (WHERE role = 'Senior') as senior_avg_salary
FROM employees
GROUP BY department;
Incremental Sort (PG 13+)
-- Index on (department, hire_date)
CREATE INDEX idx_emp_dept_date ON employees(department, hire_date);
-- Query with GROUP BY and ORDER BY
SELECT department, hire_date, COUNT(*)
FROM employees
GROUP BY department, hire_date
ORDER BY department, hire_date;
-- PG 13+: Incremental Sort
-- Data is already sorted by department (from index)
-- → Sort only by hire_date within each department
-- → 5-10x faster than full sort!
On PG 12 and below — full sort of all ORDER BY columns, even if part is already sorted by index.
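The Incremental Sort idea as a sketch: rows already arrive ordered by department (the index prefix), so only each department's small batch needs sorting by hire_date (illustrative Python):

```python
from itertools import groupby
from operator import itemgetter

# Already ordered by department (e.g. from the index); hire_date unordered
rows = [("HR", "2023-05"), ("HR", "2023-01"),
        ("IT", "2022-12"), ("IT", "2021-03"), ("IT", "2023-07")]

result = []
for _, batch in groupby(rows, key=itemgetter(0)):
    # Sort each per-department batch instead of the entire input
    result.extend(sorted(batch, key=itemgetter(1)))

print(result[:2])  # [('HR', '2023-01'), ('HR', '2023-05')]
```

Sorting many small batches is cheaper than one full sort, and results stream out as each batch completes — the source of the speedup claimed above.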
Edge Cases
- Empty GROUP BY result
-- If table is empty → 0 rows (not NULL!)
SELECT status, COUNT(*) FROM empty_table GROUP BY status;
-- Result: 0 rows
- GROUP BY with expressions
-- Grouping by expression
SELECT
EXTRACT(YEAR FROM created_at) as year,
EXTRACT(MONTH FROM created_at) as month,
COUNT(*)
FROM orders
GROUP BY year, month
ORDER BY year, month;
- GROUPING() function
-- Determine if a row is a total row
SELECT
department, city, COUNT(*),
GROUPING(department) as is_dept_total, -- 1 if NULL comes from ROLLUP
GROUPING(city) as is_city_total
FROM employees
GROUP BY ROLLUP(department, city);
Performance Tuning
work_mem for HashAggregate:
-- Check current value
SHOW work_mem; -- 4MB by default
-- If you see spill to disk → increase it
SET work_mem = '256MB';
SELECT department, COUNT(*) FROM large_table GROUP BY department;
Index for GroupAggregate:
-- Create index on GROUP BY columns
CREATE INDEX idx_orders_status_date ON orders(status, created_at);
-- Query without Sort
SELECT status, created_at, COUNT(*)
FROM orders
GROUP BY status, created_at;
-- Plan: GroupAggregate β Index Only Scan
Production Experience
Real scenario #1: HashAggregate Spill
- Analytics: 200M rows, GROUP BY 5 columns
- work_mem = 4MB → spill of 15 GB to disk
- Time: 8 minutes
- Solution: SET work_mem = '512MB' → everything in memory
- Result: 45 seconds (10x speedup)
Real scenario #2: CUBE instead of UNION ALL
- Report: 8 UNION ALL for different slices
- Time: 2 minutes (8 passes over the table)
- Solution: GROUP BY CUBE(region, product, month)
- Result: 15 seconds (1 pass)
Monitoring
-- 1. Check aggregation strategy
EXPLAIN (ANALYZE, BUFFERS)
SELECT department, COUNT(*) FROM employees GROUP BY department;
-- HashAggregate vs GroupAggregate
-- 2. Check spill
EXPLAIN (ANALYZE)
SELECT ... GROUP BY ...;
-- Look for: "Disk: X kB" → bad!
-- 3. Check parallelism
EXPLAIN SELECT ... GROUP BY ...;
-- Workers Planned: 4 → great
-- Workers Planned: 0 → check max_parallel_workers_per_gather
-- 4. Find slow GROUP BY queries
SELECT query, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%GROUP BY%'
ORDER BY mean_exec_time DESC
LIMIT 10;
Best Practices
- Index on GROUP BY columns → GroupAggregate without Sort
- Control work_mem for HashAggregate
- FILTER instead of CASE WHEN in aggregates
- ROLLUP/CUBE/GROUPING SETS instead of UNION ALL
- PK in GROUP BY allows omitting other columns of the table
- Check parallelism via EXPLAIN
- Use Incremental Sort (PG 13+) together with indexes
Summary for Senior
- 2 strategies: HashAggregate (requires work_mem) vs GroupAggregate (requires Sort/index)
- PG 13+: HashAggregate supports spill — doesn't crash, but runs slower
- Parallel aggregation: Partial → Gather → Finalize
- ROLLUP/CUBE/GROUPING SETS: one pass instead of UNION ALL
- FILTER: clean SQL for conditional aggregation
- Incremental Sort (PG 13+): savings on partial sorting
- Index on GROUP BY → GroupAggregate without Sort → minimal RAM
- Always check the plan: Hash vs Group, Workers, Spill
🎯 Interview Cheat Sheet
Must know:
- 2 strategies: HashAggregate (hash table in work_mem) vs GroupAggregate (sorted data)
- HashAggregate: O(N), requires work_mem, PG 13+ supports spill to disk
- GroupAggregate: O(N log N) for sorting, but minimal RAM; with an index → no Sort
- Parallel aggregation: Partial (workers) → Gather → Finalize (main process)
- ROLLUP: hierarchical totals, CUBE: all combinations, GROUPING SETS: explicit list
- ROLLUP/CUBE/GROUPING SETS = 1 pass vs UNION ALL = N passes
- FILTER: COUNT(*) FILTER (WHERE status = 'ACTIVE') — better than CASE WHEN
- Incremental Sort (PG 13+): data is partially sorted → sort only the rest
- Index on GROUP BY columns → GroupAggregate without Sort
- NULLs group together
Frequent follow-up questions:
- "When is HashAggregate better than GroupAggregate?" → Data is not sorted + fits in work_mem
- "What happens if HashAggregate doesn't fit in work_mem?" → PG 13+ spills to disk; before PG 13 — error
- "Why is ROLLUP better than UNION ALL?" → 1 pass instead of N
- "How to speed up GROUP BY on 200M rows?" → work_mem ↑, index on GROUP BY, parallelism
Red flags (DO NOT say):
- ❌ "GROUP BY always sorts" (HashAggregate doesn't sort)
- ❌ "Spill to disk is normal" (10-100x slower!)
- ❌ "UNION ALL for different slices is normal" (CUBE/GROUPING SETS is faster)
Related topics:
- [[What is the difference between WHERE and HAVING]] — filtering before/after GROUP BY
- [[When to use HAVING]] — filtering by aggregates
- [[What are Window Functions]] — alternative for aggregation without grouping