Question 12 · Section 1

What Does GROUP BY Do?



🟢 Junior Level

GROUP BY — combines rows with the same values into groups and allows applying aggregate functions to them.

Simple analogy: You have a basket of different types of fruits. GROUP BY is when you sort them into piles: apples separately, bananas separately, oranges separately. Then you can count how many fruits are in each pile.

Example:

-- orders table:
-- user_id | amount
-- 1       | 100
-- 1       | 200
-- 2       | 150
-- 2       | 50

-- Group by user_id and compute sum
SELECT user_id, SUM(amount) as total
FROM orders
GROUP BY user_id;

-- Result:
-- user_id | total
-- 1       | 300
-- 2       | 200

Main aggregate functions:

  • COUNT(*) — number of rows
  • SUM(col) — sum
  • AVG(col) — average
  • MIN(col) / MAX(col) — minimum/maximum
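
Using the sample orders table from the example above, all of these can appear in one query (results computed from that data; AVG shown rounded):

```sql
SELECT user_id,
       COUNT(*)    as orders_count,
       SUM(amount) as total,
       AVG(amount) as average,
       MIN(amount) as smallest,
       MAX(amount) as largest
FROM orders
GROUP BY user_id;

-- Result:
-- user_id | orders_count | total | average | smallest | largest
-- 1       | 2            | 300   | 150     | 100      | 200
-- 2       | 2            | 200   | 100     | 50       | 150
```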

🟡 Middle Level

How It Works Internally

PostgreSQL uses two strategies for GROUP BY:

  • work_mem — a PostgreSQL parameter limiting memory for one operation. Default is 4MB.
  • Spill to disk — when an operation doesn't fit in work_mem, PostgreSQL writes intermediate data to disk in temporary files. This slows execution 10-100x.

1. HashAggregate

1. Create a hash table in memory (bounded by work_mem)
2. For each row:
   - Compute the hash of the grouping key
   - If the key exists in the hash table → update its aggregate
   - If not → create a new entry
  • When: Data is not sorted
  • Requires: enough work_mem to store all groups

2. GroupAggregate

1. Sort data by the grouping key
2. Iterate through sorted data and group consecutive identical keys
  • When: Data is already sorted (by index)
  • Advantage: Requires minimal memory

Strategy Comparison

Strategy        When                 Memory     Sorting
HashAggregate   Data is not sorted   work_mem   Not needed
GroupAggregate  Data is sorted       Minimal    Needed (or index)
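
To see which strategy the planner picked, run EXPLAIN; for experimentation, the planner setting enable_hashagg can nudge it toward the sort-based strategy (a session-level toggle for exploring plans, not for production):

```sql
-- Default: the planner picks the cheaper strategy
EXPLAIN SELECT user_id, SUM(amount) FROM orders GROUP BY user_id;

-- Discourage hashing to see the sort-based plan
SET enable_hashagg = off;
EXPLAIN SELECT user_id, SUM(amount) FROM orders GROUP BY user_id;
-- Expect: GroupAggregate → Sort → Seq Scan (if no suitable index)

RESET enable_hashagg;
```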

NULL Handling

-- All NULLs group TOGETHER
SELECT status, COUNT(*)
FROM orders
GROUP BY status;

-- Result:
-- status   | count
-- ACTIVE   | 100
-- PENDING  | 50
-- NULL     | 10   ← All NULLs in one group!

Common Mistakes

  1. Forgot column in GROUP BY
    -- ❌ ERROR: name not in GROUP BY
    SELECT user_id, name, SUM(amount)
    FROM orders
    GROUP BY user_id;
    
    -- ✅ Either add name to GROUP BY
    SELECT user_id, name, SUM(amount)
    FROM orders
    GROUP BY user_id, name;
    
    -- ✅ Or use an aggregate
    SELECT user_id, MAX(name), SUM(amount)
    FROM orders
    GROUP BY user_id;
    
  2. Exception: grouping by PRIMARY KEY
    -- ✅ Allowed! PK functionally determines all columns
    SELECT u.id, u.name, u.email, COUNT(o.id)
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    GROUP BY u.id;  -- name and email can be omitted
    
  3. Filtering after GROUP BY
    -- ❌ WHERE cannot be used with aggregates
    SELECT user_id, COUNT(*) FROM orders WHERE COUNT(*) > 5 GROUP BY user_id;
    
    -- ✅ Use HAVING
    SELECT user_id, COUNT(*) FROM orders GROUP BY user_id HAVING COUNT(*) > 5;
    

Practical Examples

-- Finding duplicates
SELECT email, COUNT(*) as cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Aggregation with conditions (FILTER)
SELECT
    department,
    COUNT(*) FILTER (WHERE status = 'ACTIVE') as active_count,
    COUNT(*) FILTER (WHERE status = 'INACTIVE') as inactive_count
FROM employees
GROUP BY department;

🔴 Senior Level

When NOT to Use GROUP BY

  1. When you need details of each row — use window functions instead of GROUP BY (they don't collapse rows)
  2. When there are too many groups (millions) — use pre-aggregation via CTEs or Materialized Views
  3. When you only need to check existence — use EXISTS (Semi-Join), not GROUP BY + HAVING COUNT(*) > 0
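
Sketches of alternatives 1 and 3, using the orders and users tables from the earlier examples:

```sql
-- 1. Window function: per-row detail plus the group total, rows not collapsed
SELECT user_id,
       amount,
       SUM(amount) OVER (PARTITION BY user_id) as user_total
FROM orders;

-- 3. Existence check: EXISTS can stop at the first matching row
--    instead of counting an entire group
SELECT u.id
FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
```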

Physical Strategies: Deep Dive

HashAggregate in detail:

Phase 1: Build
  For each row:
    hash_key = hash(group_by_columns)
    if hash_key in hash_table:
      update_aggregate(hash_table[hash_key], row)
    else:
      create_new_group(hash_key, row)

Phase 2: Output
  For each group in hash_table:
    emit(group)

Spill to Disk (PG 13+):

⚠️ IMPORTANT for migration: Before PG 13, a query with a HashAggregate exceeding work_mem would fail with an "out of memory" error. On PG 13+ the same query executes, but silently spills to disk. When migrating from PG 12 to PG 13, queries that used to crash will start working, but extremely slowly. Monitor "Disk: X kB" in EXPLAIN!

Before PG 13: HashAggregate doesn't fit in work_mem → error
PG 13+:       HashAggregate spills parts of hash to disk
              → Processes batch by batch
              → Slower than in memory, but works!
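
A spill is visible in EXPLAIN (ANALYZE) output; a sketch of what to look for (the numbers are illustrative):

```sql
EXPLAIN (ANALYZE)
SELECT user_id, SUM(amount) FROM orders GROUP BY user_id;

-- HashAggregate (actual time=...)
--   Group Key: user_id
--   Batches: 5  Memory Usage: 4121kB  Disk Usage: 15360kB   ← spill!

-- PG 13+ also added hash_mem_multiplier: hash-based operations may use
-- work_mem * hash_mem_multiplier before spilling
SHOW hash_mem_multiplier;
```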

GroupAggregate + Index:

-- If there's an index on the group_by column
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Query uses Index Scan → data is already sorted
SELECT user_id, SUM(amount) FROM orders GROUP BY user_id;
-- Plan: GroupAggregate → Index Scan (no Sort!)
-- → Minimal memory, maximum speed

Parallel Aggregation

-- PostgreSQL splits work across workers
EXPLAIN SELECT department, COUNT(*) FROM employees GROUP BY department;

-- Plan:
-- Finalize GroupAggregate         ← Leader combines partial results from workers
--   Gather Merge                  ← Collects sorted streams from parallel workers
--     Partial GroupAggregate      ← Each worker groups ITS portion of data
--       Sort                      ← Each worker sorts its portion
--         Parallel Seq Scan       ← Workers read the table in parallel
-- (each node appears once in the plan; every worker executes the Partial branch)

-- Each worker reads its part of the table, groups it (Partial),
-- then the main process combines intermediate results (Finalize).

How it works:

  1. Each worker groups its portion of data (Partial Aggregate)
  2. Main process combines intermediate results (Combine)
  3. Finalizes aggregates (Finalize)
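
The number of workers is capped by max_parallel_workers_per_gather; a quick way to experiment (session-level setting):

```sql
SET max_parallel_workers_per_gather = 4;
EXPLAIN SELECT department, COUNT(*) FROM employees GROUP BY department;
-- Look for "Workers Planned: N" under the Gather node
```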

Limitations:

  • Not all aggregate functions support parallelism
  • COUNT, SUM, AVG — support it
  • Custom aggregates — need to check proparallel in pg_proc
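
Parallel safety of a function can be checked in the system catalog, for example:

```sql
-- proparallel: 's' = safe, 'r' = restricted, 'u' = unsafe
SELECT proname, proparallel
FROM pg_proc
WHERE proname IN ('sum', 'count', 'avg')
ORDER BY proname;
```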

ROLLUP, CUBE, GROUPING SETS

ROLLUP — hierarchical totals:

SELECT department, city, COUNT(*)
FROM employees
GROUP BY ROLLUP(department, city);

-- Result:
-- dept    | city     | count
-- IT      | Moscow   | 10
-- IT      | SPb      | 5
-- IT      | NULL     | 15   ← Total for IT
-- NULL    | NULL     | 50   ← Grand total

CUBE — all combinations:

SELECT department, city, COUNT(*)
FROM employees
GROUP BY CUBE(department, city);

-- Result includes:
-- (dept, city), (dept, NULL), (NULL, city), (NULL, NULL)
-- 4 levels of detail in ONE pass!

GROUPING SETS — explicit list:

SELECT department, city, COUNT(*)
FROM employees
GROUP BY GROUPING SETS (
    (department, city),  -- Detail
    (department),        -- Total by department
    ()                   -- Grand total
);

Why this is better than UNION ALL:

-- ❌ 3 passes over the table
SELECT department, city, COUNT(*) FROM employees GROUP BY department, city
UNION ALL
SELECT department, NULL, COUNT(*) FROM employees GROUP BY department
UNION ALL
SELECT NULL, NULL, COUNT(*) FROM employees;

-- ✅ 1 pass (Hash or Sort)
SELECT department, city, COUNT(*) FROM employees
GROUP BY ROLLUP(department, city);

FILTER (Clean SQL)

-- Elegant aggregation without CASE WHEN
SELECT
    department,
    COUNT(*) as total,
    COUNT(*) FILTER (WHERE salary > 100000) as high_paid,
    COUNT(*) FILTER (WHERE hire_date > '2023-01-01') as new_hires,
    AVG(salary) FILTER (WHERE role = 'Senior') as senior_avg_salary
FROM employees
GROUP BY department;

Incremental Sort (PG 13+)

-- Index on (department, hire_date)
CREATE INDEX idx_emp_dept_date ON employees(department, hire_date);

-- Query with GROUP BY and ORDER BY
SELECT department, hire_date, COUNT(*)
FROM employees
GROUP BY department, hire_date
ORDER BY department, hire_date;

-- PG 13+: Incremental Sort
-- Data is already sorted by department (from index)
-- → Sort only by hire_date within each department
-- → 5-10x faster than a full sort!

On PG 12 and below, PostgreSQL performs a full sort over all ORDER BY columns, even if a prefix is already sorted by an index.

Edge Cases

  1. Empty GROUP BY result
    -- If table is empty → 0 rows (not NULL!)
    SELECT status, COUNT(*) FROM empty_table GROUP BY status;
    -- Result: 0 rows
    
  2. GROUP BY with expressions
    -- Grouping by expression
    SELECT
        EXTRACT(YEAR FROM created_at) as year,
        EXTRACT(MONTH FROM created_at) as month,
        COUNT(*)
    FROM orders
    GROUP BY year, month
    ORDER BY year, month;
    
  3. GROUPING() function
    -- Determine if a row is a total row
    SELECT
        department,
        city,
        COUNT(*),
        GROUPING(department) as is_dept_total,  -- 1 if department was rolled up (subtotal row)
        GROUPING(city) as is_city_total
    FROM employees
    GROUP BY ROLLUP(department, city);
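
  4. Aggregates without GROUP BY
    -- A related contrast: with no GROUP BY, an aggregate always
    -- returns exactly ONE row, even for an empty table
    SELECT COUNT(*) FROM empty_table;     -- 1 row: 0
    SELECT SUM(amount) FROM empty_table;  -- 1 row: NULL (not 0!)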
    

Performance Tuning

work_mem for HashAggregate:

-- Check current value
SHOW work_mem;  -- 4MB by default

-- If you see spill to disk → increase it
SET work_mem = '256MB';
SELECT department, COUNT(*) FROM large_table GROUP BY department;

Index for GroupAggregate:

-- Create index on GROUP BY columns
CREATE INDEX idx_orders_status_date ON orders(status, created_at);

-- Query without Sort
SELECT status, created_at, COUNT(*)
FROM orders
GROUP BY status, created_at;
-- Plan: GroupAggregate β†’ Index Only Scan

Production Experience

Real scenario #1: HashAggregate Spill

  • Analytics: 200M rows, GROUP BY 5 columns
  • work_mem = 4MB → spill of 15 GB to disk
  • Time: 8 minutes
  • Solution: SET work_mem = '512MB' → all in memory
  • Result: 45 seconds (10x speedup)

Real scenario #2: CUBE instead of UNION ALL

  • Report: 8 UNION ALL for different slices
  • Time: 2 minutes (8 passes over the table)
  • Solution: GROUP BY CUBE(region, product, month)
  • Result: 15 seconds (1 pass)

Monitoring

-- 1. Check aggregation strategy
EXPLAIN (ANALYZE, BUFFERS)
SELECT department, COUNT(*) FROM employees GROUP BY department;
-- HashAggregate vs GroupAggregate

-- 2. Check spill
EXPLAIN (ANALYZE)
SELECT ... GROUP BY ...;
-- Look for: "Disk: X kB" → bad!

-- 3. Check parallelism
EXPLAIN SELECT ... GROUP BY ...;
-- Workers Planned: 4 → parallel aggregation in use
-- No Gather node in the plan → check max_parallel_workers_per_gather

-- 4. Find slow GROUP BY queries
SELECT query, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%GROUP BY%'
ORDER BY mean_exec_time DESC
LIMIT 10;

Best Practices

  1. Index on GROUP BY columns → GroupAggregate without Sort
  2. Control work_mem for HashAggregate
  3. FILTER instead of CASE WHEN in aggregates
  4. ROLLUP/CUBE/GROUPING SETS instead of UNION ALL
  5. PK in GROUP BY allows omitting other columns of the table
  6. Check parallelism via EXPLAIN
  7. Use Incremental Sort (PG 13+) together with indexes

Summary for Senior

  • 2 strategies: HashAggregate (requires work_mem) vs GroupAggregate (requires Sort/index)
  • PG 13+: HashAggregate supports spill → doesn't crash, but slower
  • Parallel aggregation: Partial → Gather → Finalize
  • ROLLUP/CUBE/GROUPING SETS: one pass instead of UNION ALL
  • FILTER: clean SQL for conditional aggregation
  • Incremental Sort (PG 13+): savings on partial sorting
  • Index on GROUP BY → GroupAggregate without Sort → minimal RAM
  • Always check the plan: Hash vs Group, Workers, Spill

🎯 Interview Cheat Sheet

Must know:

  • 2 strategies: HashAggregate (hash table in work_mem) vs GroupAggregate (sorted data)
  • HashAggregate: O(N), requires work_mem, PG 13+ supports spill to disk
  • GroupAggregate: O(N log N) for sorting, but minimal RAM; with index → no Sort
  • Parallel aggregation: Partial (workers) → Gather → Finalize (main process)
  • ROLLUP: hierarchical totals, CUBE: all combinations, GROUPING SETS: explicit list
  • ROLLUP/CUBE/GROUPING SETS = 1 pass vs UNION ALL = N passes
  • FILTER: COUNT(*) FILTER (WHERE status = 'ACTIVE') — better than CASE WHEN
  • Incremental Sort (PG 13+): data is partially sorted → sort the rest
  • Index on GROUP BY columns → GroupAggregate without Sort
  • NULLs group together

Frequent follow-up questions:

  • "When is HashAggregate better than GroupAggregate?" → Data is not sorted + fits in work_mem
  • "What happens if HashAggregate doesn't fit in work_mem?" → PG 13+ spills to disk; before PG 13 — error
  • "Why is ROLLUP better than UNION ALL?" → 1 pass instead of N
  • "How to speed up GROUP BY on 200M rows?" → increase work_mem, index on GROUP BY columns, parallelism

Red flags (DO NOT say):

  • ❌ "GROUP BY always sorts" (HashAggregate doesn't sort)
  • ❌ "spill to disk is normal" (10-100x slower!)
  • ❌ "UNION ALL for different slices is normal" (CUBE/GROUPING SETS is faster)

Related topics:

  • [[What is the difference between WHERE and HAVING]] → filtering before/after GROUP BY
  • [[When to use HAVING]] → filtering by aggregates
  • [[What are Window Functions]] → alternative for aggregation without grouping