Question 14 Β· Section 1

What Are Window Functions?

4. FILTER for conditional aggregation 5. EXCLUDE for complex statistics (PG 11+) 6. Minimize the number of unique PARTITION BY 7. Incremental Sort (PG 13+) use with indexes 8. S...

Language versions: English Russian Ukrainian

🟒 Junior Level

Window functions are functions that perform computations over a set of rows related to the current row, without collapsing the result into a single row (like GROUP BY does).

Main advantage: you get both the details of each row and aggregates β€” in a single query. With GROUP BY you would need a subquery or JOIN. With window functions: one query, one pass over the data.

Simple analogy: GROUP BY is when you put all apples into one pile and count them. A window function is when for each apple you write how many total apples are in its variety, but the apple itself stays in place.

Example:

-- For each order, show its amount and the user's average
SELECT
    user_id,
    amount,
    AVG(amount) OVER(PARTITION BY user_id) as user_avg
FROM orders;

-- Result:
-- user_id | amount | user_avg
-- 1       | 100    | 150      ← average for user 1
-- 1       | 200    | 150      ← same average, row didn't collapse
-- 2       | 300    | 250      ← average for user 2
-- 2       | 200    | 250

Main window functions:

  • ROW_NUMBER() β€” row number
  • RANK(), DENSE_RANK() β€” rank
  • SUM() OVER(), AVG() OVER() β€” aggregates without GROUP BY
  • LAG(), LEAD() β€” access to previous/next row

🟑 Middle Level

Anatomy of a Window Function

FUNCTION() OVER (
    PARTITION BY column1, column2  -- Division into groups (like GROUP BY)
    ORDER BY column3               -- Sorting within group
    ROWS BETWEEN ... AND ...       -- Frame boundaries
)

Key differences from GROUP BY:

GROUP BY Window Functions
Collapses rows Preserves rows
One row per group Each row + computation
Cannot mix details and aggregate You can!

Types of Window Functions

1. Aggregate:

SELECT
    name,
    salary,
    department,
    AVG(salary) OVER(PARTITION BY department) as dept_avg,
    SUM(salary) OVER(PARTITION BY department) as dept_total,
    COUNT(*) OVER(PARTITION BY department) as dept_count
FROM employees;

2. Ranking:

SELECT
    name,
    salary,
    ROW_NUMBER() OVER(ORDER BY salary DESC) as row_num,
    RANK() OVER(ORDER BY salary DESC) as rank,
    DENSE_RANK() OVER(ORDER BY salary DESC) as dense_rank
FROM employees;

3. Offset:

-- Comparison with previous row
SELECT
    month,
    revenue,
    LAG(revenue, 1) OVER(ORDER BY month) as prev_month,
    LEAD(revenue, 1) OVER(ORDER BY month) as next_month,
    revenue - LAG(revenue, 1) OVER(ORDER BY month) as growth
FROM monthly_sales;

ROWS vs RANGE vs GROUPS

-- ROWS: physical rows (fastest)
SELECT
    date,
    amount,
    AVG(amount) OVER(ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg_7d
FROM sales;

-- RANGE: logical values (slower, but more accurate)
SELECT
    date,
    amount,
    SUM(amount) OVER(ORDER BY date RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW) as weekly_sum
FROM sales;

Key difference with an example: If the date column has duplicates (two orders on the same day), then ROWS BETWEEN 6 PRECEDING AND CURRENT ROW takes exactly 6 previous ROWS (ignoring dates), while RANGE BETWEEN ... 7 days PRECEDING takes all rows from the last 7 days (even if there are 20 of them). ROWS = 6 rows, RANGE = 7 days.

Common Mistakes

  1. Filtering by window result in the same query
    -- ❌ ERROR: windows are computed AFTER WHERE
    -- SQL execution order:
    -- FROM β†’ WHERE β†’ GROUP BY β†’ HAVING β†’ SELECT (including window functions) β†’ ORDER BY β†’ LIMIT
    -- Since WHERE executes BEFORE SELECT, it can't see the window function result.
    -- Therefore a wrapper is needed: inner query computes the window, outer β€” filters.
    SELECT * FROM (
        SELECT user_id, amount,
               ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
        FROM orders
    ) sub
    WHERE rn = 1;  -- βœ… Correct: in subquery
    
  2. Mixing different PARTITION BY
    -- ❌ Expensive: two different sorts
    SELECT
        AVG(amount) OVER(PARTITION BY user_id) as user_avg,
        SUM(amount) OVER(PARTITION BY department) as dept_total
    FROM orders;
    -- β†’ Two sort phases!
    
  3. COUNT(*) OVER() on large data
    -- ❌ Forces reading ALL rows of the partition
    SELECT id, name, COUNT(*) OVER() as total FROM users LIMIT 10;
    
    -- βœ… Better as separate query for total
    SELECT COUNT(*) FROM users;  -- Once
    SELECT id, name FROM users LIMIT 10;  -- Separately
    

Practical Examples

Running Total:

SELECT
    date,
    amount,
    SUM(amount) OVER(ORDER BY date ROWS UNBOUNDED PRECEDING) as running_total
FROM transactions;

Percentage of Group:

SELECT
    name,
    salary,
    department,
    ROUND(100.0 * salary / SUM(salary) OVER(PARTITION BY department), 2) as pct_of_dept
FROM employees;

πŸ”΄ Senior Level

When NOT to Use Window Functions

  1. When you need a single aggregate (total sum, average) β€” use GROUP BY, it’s simpler and faster
  2. When you do COUNT(*) OVER() only for pagination β€” a separate SELECT COUNT(*) is faster, as it doesn’t read all rows
  3. When there are many different PARTITION BY in one query β€” each unique combination = a separate sort phase, which kills performance

Execution Order and Optimizations

SQL computation order:

1. FROM / JOIN
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT / Window Functions  ← Windows are computed HERE
6. DISTINCT
7. ORDER BY
8. LIMIT

Important: Windows are computed AFTER WHERE, GROUP BY, and HAVING. Therefore, you cannot filter by a window result in the same query.

ROWS vs RANGE vs GROUPS: Deep Dive

ROWS:

-- Physical rows: 6 PRECEDING + CURRENT
AVG(amount) OVER(ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)

-- Fast: simply takes N rows
-- Problem: if there are duplicates in ORDER BY β†’ non-deterministic

RANGE:

-- Logical values: all rows with the same value
SUM(amount) OVER(ORDER BY amount RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

-- Slower: checks values, not just positions
-- Needed for correct handling of duplicates

GROUPS (PG 11+):

-- Groups of rows with the same value
AVG(amount) OVER(ORDER BY date GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING)

-- Groups "peer" rows together

On PG 10 and below, GROUPS is not available. Alternative: subquery with RANGE or manual filtering.

EXCLUDE (PG 11+)

-- Excluding the current row from calculation
SELECT
    name,
    salary,
    AVG(salary) OVER(
        PARTITION BY department
        ORDER BY salary
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        EXCLUDE CURRENT ROW
    ) as others_avg
FROM employees;

EXCLUDE options:

  • EXCLUDE CURRENT ROW β€” without the current row
  • EXCLUDE GROUP β€” without current + peer rows
  • EXCLUDE TIES β€” without peers, but with current
  • EXCLUDE NO OTHERS β€” default

On PG 10 and below, EXCLUDE is not available. Alternative: CASE WHEN to exclude the current row or a subquery.

Incremental Sort (PG 13+)

-- Index on (department, created_at)
CREATE INDEX idx_orders_dept_date ON orders(department, created_at);

-- Query with PARTITION BY + ORDER BY
SELECT
    department,
    created_at,
    amount,
    SUM(amount) OVER(PARTITION BY department ORDER BY created_at) as running_total
FROM orders;

-- PG 13+: Incremental Sort
-- Data is already sorted by department (from index)
-- β†’ Sort only by created_at within each department
-- β†’ 5-10x faster than full sort!
-- Incremental Sort works because data is already sorted by the first
-- part of the key (from the index). Instead of full sort by (department, created_at)
-- the database sorts only by created_at within each department group.
-- If there are 10 groups, each sort is 10x smaller.
-- On PG 12 and below β€” full sort of all data, even if part is already sorted.

Cost of Window Frame

-- Cheap frame: from start to current row (default)
SUM(amount) OVER(ORDER BY date ROWS UNBOUNDED PRECEDING)
-- β†’ Incremental computation, the database doesn't "subtract" expired rows

-- Expensive frame: sliding window
AVG(amount) OVER(ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
-- β†’ For each row, a window of 7 elements is recomputed
-- β†’ Requires state management (adding new, removing old)

Named Windows (WINDOW clause)

-- DRY for window functions
SELECT
    department,
    AVG(salary) OVER w as avg_sal,
    SUM(salary) OVER w as total_sal,
    COUNT(*) OVER w as emp_count
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY hire_date);

-- β†’ One window definition for all functions
-- β†’ Fewer sorts, faster execution

FILTER with Window Functions

SELECT
    department,
    salary,
    SUM(salary) FILTER (WHERE status = 'ACTIVE')
        OVER(PARTITION BY department) as active_payroll
FROM employees;

Edge Cases

  1. Empty window
    -- Without PARTITION BY and ORDER BY
    SELECT name, salary,
           SUM(salary) OVER() as total_payroll  -- Sum over the entire table
    FROM employees;
    
  2. NULL in ORDER BY
    -- NULL defaults to LAST in DESC, FIRST in ASC
    SELECT name, salary,
           RANK() OVER(ORDER BY salary DESC NULLS LAST) as rank
    FROM employees;
    
  3. Nested windows β€” NOT ALLOWED
    -- ❌ ERROR: window functions cannot be nested
    SELECT AVG(SUM(amount) OVER(PARTITION BY user_id)) OVER() FROM orders;
    
    -- βœ… Solution: CTE
    WITH user_totals AS (
        SELECT user_id, SUM(amount) as total
        FROM orders GROUP BY user_id
    )
    SELECT AVG(total) FROM user_totals;
    

Performance Tuning

Minimizing unique windows:

-- ❌ Expensive: 3 different PARTITION BY β†’ 3 sorts
SELECT
    AVG(amount) OVER(PARTITION BY user_id) as u_avg,
    AVG(amount) OVER(PARTITION BY department) as d_avg,
    AVG(amount) OVER(PARTITION BY region) as r_avg
FROM orders;

-- βœ… Better: same PARTITION BY β†’ 1 sort
SELECT
    AVG(amount) OVER(PARTITION BY user_id) as u_avg,
    SUM(amount) OVER(PARTITION BY user_id) as u_total,
    COUNT(*) OVER(PARTITION BY user_id) as u_count
FROM orders;

Index for WindowAgg:

-- Index on (PARTITION BY columns, ORDER BY columns)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Query without Sort
SELECT user_id, created_at, amount,
       SUM(amount) OVER(PARTITION BY user_id ORDER BY created_at) as running_total
FROM orders;
-- Plan: WindowAgg β†’ Index Scan (no Sort!)

Production Experience

Real scenario #1: Running Total for Report

  • Fintech: daily balance for 1M transactions
  • Query: 30 seconds (Sort on disk)
  • Solution:
    • Index on (account_id, date)
    • ROWS UNBOUNDED PRECEDING (cheap frame)
    • work_mem = 256MB
  • Result: 2 seconds

Real scenario #2: Mixed windows kill performance

  • Analytics: 5 window functions with different PARTITION BY
  • Time: 2 minutes (5 sorts!)
  • Solution: split into 2 CTEs
  • Result: 15 seconds

Monitoring

-- 1. Check plan with window functions
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... OVER(PARTITION BY ... ORDER BY ...);
-- Look for: WindowAgg β†’ Sort (or Index Scan if index exists)

-- 2. Check memory for Sort
EXPLAIN (ANALYZE)
SELECT ... OVER(...);
-- "external merge Disk: X kB" β†’ spill! Increase work_mem

-- 3. Find slow queries with windows
SELECT query, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%OVER(%'
ORDER BY mean_exec_time DESC
LIMIT 10;

Best Practices

  1. Index on (PARTITION BY, ORDER BY) β†’ no Sort
  2. ROWS instead of RANGE where possible (faster)
  3. WINDOW clause for reusing window definitions
  4. FILTER for conditional aggregation
  5. EXCLUDE for complex statistics (PG 11+)
  6. Minimize the number of unique PARTITION BY
  7. Incremental Sort (PG 13+) use with indexes
  8. Subquery/CTE for filtering by window result

Summary for Senior

  • Window functions are computed AFTER WHERE, GROUP BY, HAVING
  • ROWS is faster than RANGE, but RANGE is more correct with duplicates
  • Incremental Sort (PG 13+) saves on partial sorting
  • EXCLUDE (PG 11+) for complex statistics
  • FILTER is supported in window functions
  • Frame cost: UNBOUNDED PRECEDING is cheaper than sliding window
  • Minimize unique PARTITION BY β†’ fewer Sort phases
  • Index on (PARTITION BY, ORDER BY) β†’ WindowAgg without Sort

🎯 Interview Cheat Sheet

Must know:

  • Window functions: compute over a group of rows, do NOT collapse (unlike GROUP BY)
  • Syntax: FUNCTION() OVER(PARTITION BY ... ORDER BY ... ROWS/RANGE BETWEEN ...)
  • 3 types: aggregate (SUM, AVG), ranking (ROW_NUMBER, RANK), offset (LAG, LEAD)
  • ROWS = physical rows (faster), RANGE = logical values (more correct with duplicates)
  • GROUPS (PG 11+): groups β€œpeer” rows with the same value
  • EXCLUDE (PG 11+): EXCLUDE CURRENT ROW, EXCLUDE GROUP, EXCLUDE TIES
  • Filtering by window result β†’ ONLY in subquery/CTE (windows are computed AFTER WHERE)
  • Index on (PARTITION BY, ORDER BY) β†’ WindowAgg without Sort
  • Incremental Sort (PG 13+): data is partially sorted β†’ sort the rest
  • Named windows (WINDOW clause): DRY for window functions
  • Window functions cannot be nested β†’ CTE

Frequent follow-up questions:

  • β€œWhy can’t you filter by window function in WHERE?” β†’ Windows are computed AFTER WHERE
  • β€œROWS vs RANGE β€” which to choose?” β†’ ROWS is faster, RANGE is more correct with duplicates
  • β€œHow to speed up a window function?” β†’ Index on (PARTITION BY, ORDER BY)
  • β€œWhat is Incremental Sort?” β†’ PG 13+: sorts only the missing part

Red flags (DO NOT say):

  • ❌ β€œWindow functions collapse rows” (no, that’s GROUP BY)
  • ❌ β€œYou can filter by window function in WHERE” (no, computed after WHERE)
  • ❌ β€œRANGE is always better than ROWS” (ROWS is faster, RANGE only with duplicates)

Related topics:

  • [[What does ROW_NUMBER() do]] β†’ unique numbering
  • [[What do RANK() and DENSE_RANK() do]] β†’ ranking
  • [[What does GROUP BY do]] β†’ difference from window functions
  • [[What is the difference between WHERE and HAVING]] β†’ execution order