What Does GROUP BY Do?
🟢 Junior Level
GROUP BY — combines rows with the same values into groups and allows applying aggregate functions to them.
Simple analogy: You have a basket of different types of fruits. GROUP BY is when you sort them into piles: apples separately, bananas separately, oranges separately. Then you can count how many fruits are in each pile.
Example:
-- orders table:
-- user_id | amount
-- 1 | 100
-- 1 | 200
-- 2 | 150
-- 2 | 50
-- Group by user_id and compute sum
SELECT user_id, SUM(amount) as total
FROM orders
GROUP BY user_id;
-- Result:
-- user_id | total
-- 1 | 300
-- 2 | 200
Main aggregate functions:
- COUNT(*) — number of rows
- SUM(col) — sum
- AVG(col) — average
- MIN(col) / MAX(col) — minimum/maximum
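The grouping from the example above can be sketched in plain Python, assuming a list of (user_id, amount) tuples mirroring the orders table:

```python
from collections import defaultdict

# Rows mirroring the orders table from the example above
orders = [(1, 100), (1, 200), (2, 150), (2, 50)]

# GROUP BY user_id with SUM(amount): one bucket per distinct key
totals = defaultdict(int)
for user_id, amount in orders:
    totals[user_id] += amount

print(dict(totals))  # {1: 300, 2: 200}
```

This is exactly the "sort into piles, then count each pile" analogy: one dictionary entry per pile.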
🟡 Middle Level
How It Works Internally
PostgreSQL uses two strategies for GROUP BY:
work_mem — a PostgreSQL parameter limiting memory for one operation. Default is 4MB.
Spill to disk — when an operation doesn't fit in work_mem, PostgreSQL writes intermediate data to disk in temporary files. This slows execution 10-100x.
1. HashAggregate
1. Create a hash table in memory (work_mem)
2. For each row:
- Compute hash of the grouping key
- If the key exists in the hash table → update the aggregate
- If not → create a new entry
- When: Data is not sorted
- Requires: work_mem to store all groups
2. GroupAggregate
1. Sort data by the grouping key
2. Iterate through sorted data and group consecutive identical keys
- When: Data is already sorted (by index)
- Advantage: Requires minimal memory
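The GroupAggregate idea as a minimal sketch (illustrative Python, not PostgreSQL source): because the input is sorted by the key, only the current group's running state needs to be held in memory, regardless of how many groups there are.

```python
def group_aggregate(sorted_rows):
    """Yield (key, sum) pairs for input sorted by key.

    Memory footprint: one running aggregate, independent of group count.
    """
    current_key, current_sum = None, 0
    for key, value in sorted_rows:
        if key != current_key:
            if current_key is not None:
                yield current_key, current_sum  # group finished: emit it
            current_key, current_sum = key, 0
        current_sum += value
    if current_key is not None:
        yield current_key, current_sum  # emit the last group

# The Sort step (skipped when an index already delivers sorted rows)
rows = sorted([(2, 150), (1, 100), (1, 200), (2, 50)])
print(list(group_aggregate(rows)))  # [(1, 300), (2, 200)]
```

Note that a group can be emitted as soon as the key changes — this is why the strategy also streams results early instead of materializing everything first.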
Strategy Comparison
| Strategy | When | Memory | Sorting |
|---|---|---|---|
| HashAggregate | Data is not sorted | work_mem | Not needed |
| GroupAggregate | Data is sorted | Minimal | Needed (or index) |
NULL Handling
-- All NULLs group TOGETHER
SELECT status, COUNT(*)
FROM orders
GROUP BY status;
-- Result:
-- status | count
-- ACTIVE | 100
-- PENDING | 50
-- NULL | 10 ← All NULLs in one group!
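Unlike ordinary comparisons (where NULL = NULL is not true), GROUP BY treats NULLs as not distinct, so they form a single group. The same behavior in miniature, with a made-up status list (Python sketch):

```python
from collections import Counter

# None plays the role of SQL NULL
statuses = ["ACTIVE", "PENDING", None, "ACTIVE", None]

# None is an ordinary dict key: all NULL-like rows share one bucket
counts = Counter(statuses)
print(counts[None])      # 2 — all NULLs counted together
print(counts["ACTIVE"])  # 2
```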
Common Mistakes
- Forgot column in GROUP BY
-- ❌ ERROR: name not in GROUP BY
SELECT user_id, name, SUM(amount) FROM orders GROUP BY user_id;
-- ✅ Either add name to GROUP BY
SELECT user_id, name, SUM(amount) FROM orders GROUP BY user_id, name;
-- ✅ Or use an aggregate
SELECT user_id, MAX(name), SUM(amount) FROM orders GROUP BY user_id;
- Exception: grouping by PRIMARY KEY
-- ✅ Allowed! PK functionally determines all columns
SELECT u.id, u.name, u.email, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id; -- name and email can be omitted
- Filtering after GROUP BY
-- ❌ WHERE cannot be used with aggregates
SELECT user_id, COUNT(*) FROM orders WHERE COUNT(*) > 5 GROUP BY user_id;
-- ✅ Use HAVING
SELECT user_id, COUNT(*) FROM orders GROUP BY user_id HAVING COUNT(*) > 5;
Practical Examples
-- Finding duplicates
SELECT email, COUNT(*) as cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- Aggregation with conditions (FILTER)
SELECT
department,
COUNT(*) FILTER (WHERE status = 'ACTIVE') as active_count,
COUNT(*) FILTER (WHERE status = 'INACTIVE') as inactive_count
FROM employees
GROUP BY department;
🔴 Senior Level
When NOT to Use GROUP BY
- When you need details of each row → use window functions instead of GROUP BY (they don't collapse rows)
- When there are too many groups (millions) → use pre-aggregation via CTEs or Materialized Views
- When you only need to check existence → use EXISTS (Semi-Join), not GROUP BY + HAVING COUNT(*) > 0
Physical Strategies: Deep Dive
HashAggregate in detail:
Phase 1: Build
For each row:
hash_key = hash(group_by_columns)
if hash_key in hash_table:
update_aggregate(hash_table[hash_key], row)
else:
create_new_group(hash_key, row)
Phase 2: Output
For each group in hash_table:
emit(group)
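The two phases above, written as runnable Python (a sketch of the algorithm, not the actual C implementation):

```python
def hash_aggregate(rows):
    # Phase 1: Build — one hash table entry per distinct grouping key
    hash_table = {}
    for key, value in rows:
        if key in hash_table:
            hash_table[key] += value   # update_aggregate
        else:
            hash_table[key] = value    # create_new_group
    # Phase 2: Output — emit every finished group
    for group in hash_table.items():
        yield group

rows = [(2, 150), (1, 100), (1, 200), (2, 50)]  # No sort required
print(dict(hash_aggregate(rows)))
```

Note that nothing can be emitted until all input is consumed (Phase 1 must finish first), and the whole hash table must fit in memory — which is exactly where work_mem comes in.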
Spill to Disk (PG 13+):
⚠️ IMPORTANT for migration: Before PG 13, a query with HashAggregate exceeding work_mem would error with
out of memory. On PG 13+ the same query will execute, but silently go to disk. When migrating from PG 12 to PG 13, queries that used to crash will start working — but extremely slowly. Monitor "Disk: X kB" in EXPLAIN!
Before PG 13: HashAggregate doesn't fit in work_mem β error
PG 13+: HashAggregate spills parts of hash to disk
→ Processes batch by batch
→ Slower than in memory, but works!
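A toy model of the batching idea (real PostgreSQL partitions the hash table by hash value and writes partitions to temp files; this sketch, with a hypothetical max_groups cap, just defers overflow rows to a later pass):

```python
def hash_aggregate_with_spill(rows, max_groups=2):
    """SUM per key, keeping at most max_groups groups in memory at once."""
    in_memory, spilled = {}, []
    for key, value in rows:
        if key in in_memory:
            in_memory[key] += value          # existing group: free to update
        elif len(in_memory) < max_groups:
            in_memory[key] = value           # room for a new group
        else:
            spilled.append((key, value))     # "write to disk" for a later batch
    if spilled:
        # Process the spilled rows as their own batch (recursively)
        yield from hash_aggregate_with_spill(spilled, max_groups)
    yield from in_memory.items()

result = dict(hash_aggregate_with_spill(
    [("a", 1), ("b", 1), ("c", 1), ("a", 1), ("c", 1)], max_groups=2))
print(result)  # {'c': 2, 'a': 2, 'b': 1}
```

Every batch produces correct totals because a key is either fully aggregated in memory or fully deferred — the extra passes over spilled data are what makes the disk path so much slower.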
GroupAggregate + Index:
-- If there's an index on the group_by column
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Query uses Index Scan β data is already sorted
SELECT user_id, SUM(amount) FROM orders GROUP BY user_id;
-- Plan: GroupAggregate → Index Scan (no Sort!)
-- ✅ Minimal memory, maximum speed
Parallel Aggregation
-- PostgreSQL splits work across workers
EXPLAIN SELECT department, COUNT(*) FROM employees GROUP BY department;
-- Plan:
-- Finalize GroupAggregate ← Main process combines results from workers
-- Gather ← Data collection node from parallel workers
-- Partial GroupAggregate ← Worker 1: groups ITS portion of data
-- Parallel Seq Scan ← Worker 1 reads table in parallel
-- Partial GroupAggregate ← Worker 2: groups ITS portion of data
-- Parallel Seq Scan ← Worker 2 reads table in parallel
-- Each worker reads its part of the table, groups it (Partial),
-- then the main process combines intermediate results (Finalize).
How it works:
- Each worker groups its portion of data (Partial Aggregate)
- Main process combines intermediate results (Combine)
- Finalizes aggregates (Finalize)
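The Partial → Finalize split can be sketched for AVG, whose partial state must be a (sum, count) pair — simply averaging the workers' averages would give the wrong answer (illustrative Python, not PostgreSQL source):

```python
def partial_avg(chunk):
    # Each worker's Partial Aggregate: a transition state, not a final value
    return (sum(chunk), len(chunk))

def finalize_avg(states):
    # Combine + Finalize in the main process
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count

chunk1, chunk2 = [10, 20, 30], [40, 50]            # two workers' portions
states = [partial_avg(chunk1), partial_avg(chunk2)]
print(finalize_avg(states))   # 30.0 — correct global average
# Naive "average of averages": (20 + 45) / 2 = 32.5 — wrong!
```

This is why not every aggregate parallelizes: the function needs a combinable intermediate state, which is what proparallel in pg_proc declares.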
Limitations:
- Not all aggregate functions support parallelism
- COUNT, SUM, AVG — supported
- Custom aggregates — need to check proparallel in pg_proc
ROLLUP, CUBE, GROUPING SETS
ROLLUP — hierarchical totals:
SELECT department, city, COUNT(*)
FROM employees
GROUP BY ROLLUP(department, city);
-- Result:
-- dept | city | count
-- IT | Moscow | 10
-- IT | SPb | 5
-- IT | NULL | 15 ← Total for IT
-- NULL | NULL | 50 ← Grand total
CUBE — all combinations:
SELECT department, city, COUNT(*)
FROM employees
GROUP BY CUBE(department, city);
-- Result includes:
-- (dept, city), (dept, NULL), (NULL, city), (NULL, NULL)
-- 4 levels of detail in ONE pass!
GROUPING SETS — explicit list:
SELECT department, city, COUNT(*)
FROM employees
GROUP BY GROUPING SETS (
(department, city), -- Detail
(department), -- Total by department
() -- Grand total
);
Why this is better than UNION ALL:
-- ❌ 3 passes over the table
SELECT department, city, COUNT(*) FROM employees GROUP BY department, city
UNION ALL
SELECT department, NULL, COUNT(*) FROM employees GROUP BY department
UNION ALL
SELECT NULL, NULL, COUNT(*) FROM employees;
-- ✅ 1 pass (Hash or Sort)
SELECT department, city, COUNT(*) FROM employees
GROUP BY ROLLUP(department, city);
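Why one pass suffices: each input row can update the aggregates of all grouping sets at once. A sketch of ROLLUP(department, city) in Python (illustrative data):

```python
from collections import Counter

rows = [("IT", "Moscow"), ("IT", "Moscow"), ("IT", "SPb"), ("HR", "Moscow")]

counts = Counter()
for dept, city in rows:
    # One row feeds all three ROLLUP grouping sets in a single scan
    counts[(dept, city)] += 1     # (department, city) — detail
    counts[(dept, None)] += 1     # (department)       — subtotal
    counts[(None, None)] += 1     # ()                 — grand total

print(counts[("IT", "Moscow")])  # 2
print(counts[("IT", None)])      # 3 — total for IT
print(counts[(None, None)])      # 4 — grand total
```

UNION ALL instead would mean re-reading `rows` once per grouping set — the data stays the same, only the number of scans changes.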
FILTER (Clean SQL)
-- Elegant aggregation without CASE WHEN
SELECT
department,
COUNT(*) as total,
COUNT(*) FILTER (WHERE salary > 100000) as high_paid,
COUNT(*) FILTER (WHERE hire_date > '2023-01-01') as new_hires,
AVG(salary) FILTER (WHERE role = 'Senior') as senior_avg_salary
FROM employees
GROUP BY department;
Incremental Sort (PG 13+)
-- Index on (department, hire_date)
CREATE INDEX idx_emp_dept_date ON employees(department, hire_date);
-- Query with GROUP BY and ORDER BY
SELECT department, hire_date, COUNT(*)
FROM employees
GROUP BY department, hire_date
ORDER BY department, hire_date;
-- PG 13+: Incremental Sort
-- Data is already sorted by department (from index)
-- → Sort only by hire_date within each department
-- → 5-10x faster than full sort!
On PG 12 and below — full sort of all ORDER BY columns, even if part is already sorted by index.
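The Incremental Sort idea as a sketch: rows already arrive ordered by department (the index prefix), so only each department's small batch needs sorting by hire_date (illustrative Python):

```python
from itertools import groupby
from operator import itemgetter

# Already ordered by department (e.g. from the index); hire_date unordered
rows = [("HR", "2023-05"), ("HR", "2023-01"),
        ("IT", "2022-12"), ("IT", "2021-03"), ("IT", "2023-07")]

result = []
for _, batch in groupby(rows, key=itemgetter(0)):
    # Sort each per-department batch instead of the entire input
    result.extend(sorted(batch, key=itemgetter(1)))

print(result[:2])  # [('HR', '2023-01'), ('HR', '2023-05')]
```

Sorting many small batches is cheaper than one full sort, and results stream out as each batch completes — the source of the speedup claimed above.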
Edge Cases
- Empty GROUP BY result
-- If table is empty → 0 rows (not NULL!)
SELECT status, COUNT(*) FROM empty_table GROUP BY status;
-- Result: 0 rows
- GROUP BY with expressions
-- Grouping by expression
SELECT
EXTRACT(YEAR FROM created_at) as year,
EXTRACT(MONTH FROM created_at) as month,
COUNT(*)
FROM orders
GROUP BY year, month
ORDER BY year, month;
- GROUPING() function
-- Determine if a row is a total row
SELECT
department, city, COUNT(*),
GROUPING(department) as is_dept_total, -- 1 if NULL comes from ROLLUP
GROUPING(city) as is_city_total
FROM employees
GROUP BY ROLLUP(department, city);
Performance Tuning
work_mem for HashAggregate:
-- Check current value
SHOW work_mem; -- 4MB by default
-- If you see spill to disk → increase it
SET work_mem = '256MB';
SELECT department, COUNT(*) FROM large_table GROUP BY department;
Index for GroupAggregate:
-- Create index on GROUP BY columns
CREATE INDEX idx_orders_status_date ON orders(status, created_at);
-- Query without Sort
SELECT status, created_at, COUNT(*)
FROM orders
GROUP BY status, created_at;
-- Plan: GroupAggregate β Index Only Scan
Production Experience
Real scenario #1: HashAggregate Spill
- Analytics: 200M rows, GROUP BY 5 columns
- work_mem = 4MB → spill of 15 GB to disk
- Time: 8 minutes
- Solution: SET work_mem = '512MB' → everything in memory
- Result: 45 seconds (10x speedup)
Real scenario #2: CUBE instead of UNION ALL
- Report: 8 UNION ALL for different slices
- Time: 2 minutes (8 passes over the table)
- Solution: GROUP BY CUBE(region, product, month)
- Result: 15 seconds (1 pass)
Monitoring
-- 1. Check aggregation strategy
EXPLAIN (ANALYZE, BUFFERS)
SELECT department, COUNT(*) FROM employees GROUP BY department;
-- HashAggregate vs GroupAggregate
-- 2. Check spill
EXPLAIN (ANALYZE)
SELECT ... GROUP BY ...;
-- Look for: "Disk: X kB" → bad!
-- 3. Check parallelism
EXPLAIN SELECT ... GROUP BY ...;
-- Workers Planned: 4 → great
-- Workers Planned: 0 → check max_parallel_workers_per_gather
-- 4. Find slow GROUP BY queries
SELECT query, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%GROUP BY%'
ORDER BY mean_exec_time DESC
LIMIT 10;
Best Practices
- Index on GROUP BY columns → GroupAggregate without Sort
- Control work_mem for HashAggregate
- FILTER instead of CASE WHEN in aggregates
- ROLLUP/CUBE/GROUPING SETS instead of UNION ALL
- PK in GROUP BY allows omitting other columns of the table
- Check parallelism via EXPLAIN
- Use Incremental Sort (PG 13+) together with indexes
Summary for Senior
- 2 strategies: HashAggregate (requires work_mem) vs GroupAggregate (requires Sort/index)
- PG 13+: HashAggregate supports spill — doesn't crash, but runs slower
- Parallel aggregation: Partial → Gather → Finalize
- ROLLUP/CUBE/GROUPING SETS: one pass instead of UNION ALL
- FILTER: clean SQL for conditional aggregation
- Incremental Sort (PG 13+): savings on partial sorting
- Index on GROUP BY → GroupAggregate without Sort → minimal RAM
- Always check the plan: Hash vs Group, Workers, Spill
🎯 Interview Cheat Sheet
Must know:
- 2 strategies: HashAggregate (hash table in work_mem) vs GroupAggregate (sorted data)
- HashAggregate: O(N), requires work_mem, PG 13+ supports spill to disk
- GroupAggregate: O(N log N) for sorting, but minimal RAM; with an index → no Sort
- Parallel aggregation: Partial (workers) → Gather → Finalize (main process)
- ROLLUP: hierarchical totals, CUBE: all combinations, GROUPING SETS: explicit list
- ROLLUP/CUBE/GROUPING SETS = 1 pass vs UNION ALL = N passes
- FILTER: COUNT(*) FILTER (WHERE status = 'ACTIVE') — better than CASE WHEN
- Incremental Sort (PG 13+): data is partially sorted → sort only the rest
- Index on GROUP BY columns → GroupAggregate without Sort
- NULLs group together
Frequent follow-up questions:
- "When is HashAggregate better than GroupAggregate?" → Data is not sorted + fits in work_mem
- "What happens if HashAggregate doesn't fit in work_mem?" → PG 13+ spills to disk; before PG 13 — error
- "Why is ROLLUP better than UNION ALL?" → 1 pass instead of N
- "How to speed up GROUP BY on 200M rows?" → work_mem ↑, index on GROUP BY, parallelism
Red flags (DO NOT say):
- ❌ "GROUP BY always sorts" (HashAggregate doesn't sort)
- ❌ "Spill to disk is normal" (10-100x slower!)
- ❌ "UNION ALL for different slices is normal" (CUBE/GROUPING SETS is faster)
Related topics:
- [[What is the difference between WHERE and HAVING]] — filtering before/after GROUP BY
- [[When to use HAVING]] — filtering by aggregates
- [[What are Window Functions]] — alternative for aggregation without grouping