What Are Window Functions?
4. FILTER for conditional aggregation 5. EXCLUDE for complex statistics (PG 11+) 6. Minimize the number of unique PARTITION BY 7. Incremental Sort (PG 13+) use with indexes 8. S...
π’ Junior Level
Window functions are functions that perform computations over a set of rows related to the current row, without collapsing the result into a single row (like GROUP BY does).
Main advantage: you get both the details of each row and aggregates β in a single query. With GROUP BY you would need a subquery or JOIN. With window functions: one query, one pass over the data.
Simple analogy: GROUP BY is when you put all apples into one pile and count them. A window function is when for each apple you write how many total apples are in its variety, but the apple itself stays in place.
Example:
-- For each order, show its amount and the user's average
SELECT
user_id,
amount,
AVG(amount) OVER(PARTITION BY user_id) as user_avg
FROM orders;
-- Result:
-- user_id | amount | user_avg
-- 1 | 100 | 150 β average for user 1
-- 1 | 200 | 150 β same average, row didn't collapse
-- 2 | 300 | 250 β average for user 2
-- 2 | 200 | 250
Main window functions:
ROW_NUMBER()β row numberRANK(),DENSE_RANK()β rankSUM() OVER(),AVG() OVER()β aggregates without GROUP BYLAG(),LEAD()β access to previous/next row
π‘ Middle Level
Anatomy of a Window Function
FUNCTION() OVER (
PARTITION BY column1, column2 -- Division into groups (like GROUP BY)
ORDER BY column3 -- Sorting within group
ROWS BETWEEN ... AND ... -- Frame boundaries
)
Key differences from GROUP BY:
| GROUP BY | Window Functions |
|---|---|
| Collapses rows | Preserves rows |
| One row per group | Each row + computation |
| Cannot mix details and aggregate | You can! |
Types of Window Functions
1. Aggregate:
SELECT
name,
salary,
department,
AVG(salary) OVER(PARTITION BY department) as dept_avg,
SUM(salary) OVER(PARTITION BY department) as dept_total,
COUNT(*) OVER(PARTITION BY department) as dept_count
FROM employees;
2. Ranking:
SELECT
name,
salary,
ROW_NUMBER() OVER(ORDER BY salary DESC) as row_num,
RANK() OVER(ORDER BY salary DESC) as rank,
DENSE_RANK() OVER(ORDER BY salary DESC) as dense_rank
FROM employees;
3. Offset:
-- Comparison with previous row
SELECT
month,
revenue,
LAG(revenue, 1) OVER(ORDER BY month) as prev_month,
LEAD(revenue, 1) OVER(ORDER BY month) as next_month,
revenue - LAG(revenue, 1) OVER(ORDER BY month) as growth
FROM monthly_sales;
ROWS vs RANGE vs GROUPS
-- ROWS: physical rows (fastest)
SELECT
date,
amount,
AVG(amount) OVER(ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg_7d
FROM sales;
-- RANGE: logical values (slower, but more accurate)
SELECT
date,
amount,
SUM(amount) OVER(ORDER BY date RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW) as weekly_sum
FROM sales;
Key difference with an example: If the date column has duplicates (two orders on the same day), then ROWS BETWEEN 6 PRECEDING AND CURRENT ROW takes exactly 6 previous ROWS (ignoring dates), while RANGE BETWEEN ... 7 days PRECEDING takes all rows from the last 7 days (even if there are 20 of them). ROWS = 6 rows, RANGE = 7 days.
Common Mistakes
- Filtering by window result in the same query
-- β ERROR: windows are computed AFTER WHERE -- SQL execution order: -- FROM β WHERE β GROUP BY β HAVING β SELECT (including window functions) β ORDER BY β LIMIT -- Since WHERE executes BEFORE SELECT, it can't see the window function result. -- Therefore a wrapper is needed: inner query computes the window, outer β filters. SELECT * FROM ( SELECT user_id, amount, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn FROM orders ) sub WHERE rn = 1; -- β Correct: in subquery - Mixing different PARTITION BY
-- β Expensive: two different sorts SELECT AVG(amount) OVER(PARTITION BY user_id) as user_avg, SUM(amount) OVER(PARTITION BY department) as dept_total FROM orders; -- β Two sort phases! - COUNT(*) OVER() on large data
-- β Forces reading ALL rows of the partition SELECT id, name, COUNT(*) OVER() as total FROM users LIMIT 10; -- β Better as separate query for total SELECT COUNT(*) FROM users; -- Once SELECT id, name FROM users LIMIT 10; -- Separately
Practical Examples
Running Total:
SELECT
date,
amount,
SUM(amount) OVER(ORDER BY date ROWS UNBOUNDED PRECEDING) as running_total
FROM transactions;
Percentage of Group:
SELECT
name,
salary,
department,
ROUND(100.0 * salary / SUM(salary) OVER(PARTITION BY department), 2) as pct_of_dept
FROM employees;
π΄ Senior Level
When NOT to Use Window Functions
- When you need a single aggregate (total sum, average) β use GROUP BY, itβs simpler and faster
- When you do
COUNT(*) OVER()only for pagination β a separateSELECT COUNT(*)is faster, as it doesnβt read all rows - When there are many different PARTITION BY in one query β each unique combination = a separate sort phase, which kills performance
Execution Order and Optimizations
SQL computation order:
1. FROM / JOIN
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT / Window Functions β Windows are computed HERE
6. DISTINCT
7. ORDER BY
8. LIMIT
Important: Windows are computed AFTER WHERE, GROUP BY, and HAVING. Therefore, you cannot filter by a window result in the same query.
ROWS vs RANGE vs GROUPS: Deep Dive
ROWS:
-- Physical rows: 6 PRECEDING + CURRENT
AVG(amount) OVER(ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
-- Fast: simply takes N rows
-- Problem: if there are duplicates in ORDER BY β non-deterministic
RANGE:
-- Logical values: all rows with the same value
SUM(amount) OVER(ORDER BY amount RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
-- Slower: checks values, not just positions
-- Needed for correct handling of duplicates
GROUPS (PG 11+):
-- Groups of rows with the same value
AVG(amount) OVER(ORDER BY date GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
-- Groups "peer" rows together
On PG 10 and below, GROUPS is not available. Alternative: subquery with RANGE or manual filtering.
EXCLUDE (PG 11+)
-- Excluding the current row from calculation
SELECT
name,
salary,
AVG(salary) OVER(
PARTITION BY department
ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
EXCLUDE CURRENT ROW
) as others_avg
FROM employees;
EXCLUDE options:
EXCLUDE CURRENT ROWβ without the current rowEXCLUDE GROUPβ without current + peer rowsEXCLUDE TIESβ without peers, but with currentEXCLUDE NO OTHERSβ default
On PG 10 and below, EXCLUDE is not available. Alternative: CASE WHEN to exclude the current row or a subquery.
Incremental Sort (PG 13+)
-- Index on (department, created_at)
CREATE INDEX idx_orders_dept_date ON orders(department, created_at);
-- Query with PARTITION BY + ORDER BY
SELECT
department,
created_at,
amount,
SUM(amount) OVER(PARTITION BY department ORDER BY created_at) as running_total
FROM orders;
-- PG 13+: Incremental Sort
-- Data is already sorted by department (from index)
-- β Sort only by created_at within each department
-- β 5-10x faster than full sort!
-- Incremental Sort works because data is already sorted by the first
-- part of the key (from the index). Instead of full sort by (department, created_at)
-- the database sorts only by created_at within each department group.
-- If there are 10 groups, each sort is 10x smaller.
-- On PG 12 and below β full sort of all data, even if part is already sorted.
Cost of Window Frame
-- Cheap frame: from start to current row (default)
SUM(amount) OVER(ORDER BY date ROWS UNBOUNDED PRECEDING)
-- β Incremental computation, the database doesn't "subtract" expired rows
-- Expensive frame: sliding window
AVG(amount) OVER(ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
-- β For each row, a window of 7 elements is recomputed
-- β Requires state management (adding new, removing old)
Named Windows (WINDOW clause)
-- DRY for window functions
SELECT
department,
AVG(salary) OVER w as avg_sal,
SUM(salary) OVER w as total_sal,
COUNT(*) OVER w as emp_count
FROM employees
WINDOW w AS (PARTITION BY department ORDER BY hire_date);
-- β One window definition for all functions
-- β Fewer sorts, faster execution
FILTER with Window Functions
SELECT
department,
salary,
SUM(salary) FILTER (WHERE status = 'ACTIVE')
OVER(PARTITION BY department) as active_payroll
FROM employees;
Edge Cases
- Empty window
-- Without PARTITION BY and ORDER BY SELECT name, salary, SUM(salary) OVER() as total_payroll -- Sum over the entire table FROM employees; - NULL in ORDER BY
-- NULL defaults to LAST in DESC, FIRST in ASC SELECT name, salary, RANK() OVER(ORDER BY salary DESC NULLS LAST) as rank FROM employees; - Nested windows β NOT ALLOWED
-- β ERROR: window functions cannot be nested SELECT AVG(SUM(amount) OVER(PARTITION BY user_id)) OVER() FROM orders; -- β Solution: CTE WITH user_totals AS ( SELECT user_id, SUM(amount) as total FROM orders GROUP BY user_id ) SELECT AVG(total) FROM user_totals;
Performance Tuning
Minimizing unique windows:
-- β Expensive: 3 different PARTITION BY β 3 sorts
SELECT
AVG(amount) OVER(PARTITION BY user_id) as u_avg,
AVG(amount) OVER(PARTITION BY department) as d_avg,
AVG(amount) OVER(PARTITION BY region) as r_avg
FROM orders;
-- β
Better: same PARTITION BY β 1 sort
SELECT
AVG(amount) OVER(PARTITION BY user_id) as u_avg,
SUM(amount) OVER(PARTITION BY user_id) as u_total,
COUNT(*) OVER(PARTITION BY user_id) as u_count
FROM orders;
Index for WindowAgg:
-- Index on (PARTITION BY columns, ORDER BY columns)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Query without Sort
SELECT user_id, created_at, amount,
SUM(amount) OVER(PARTITION BY user_id ORDER BY created_at) as running_total
FROM orders;
-- Plan: WindowAgg β Index Scan (no Sort!)
Production Experience
Real scenario #1: Running Total for Report
- Fintech: daily balance for 1M transactions
- Query: 30 seconds (Sort on disk)
- Solution:
- Index on
(account_id, date) ROWS UNBOUNDED PRECEDING(cheap frame)- work_mem = 256MB
- Index on
- Result: 2 seconds
Real scenario #2: Mixed windows kill performance
- Analytics: 5 window functions with different PARTITION BY
- Time: 2 minutes (5 sorts!)
- Solution: split into 2 CTEs
- Result: 15 seconds
Monitoring
-- 1. Check plan with window functions
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... OVER(PARTITION BY ... ORDER BY ...);
-- Look for: WindowAgg β Sort (or Index Scan if index exists)
-- 2. Check memory for Sort
EXPLAIN (ANALYZE)
SELECT ... OVER(...);
-- "external merge Disk: X kB" β spill! Increase work_mem
-- 3. Find slow queries with windows
SELECT query, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%OVER(%'
ORDER BY mean_exec_time DESC
LIMIT 10;
Best Practices
- Index on
(PARTITION BY, ORDER BY)β no Sort - ROWS instead of RANGE where possible (faster)
- WINDOW clause for reusing window definitions
- FILTER for conditional aggregation
- EXCLUDE for complex statistics (PG 11+)
- Minimize the number of unique PARTITION BY
- Incremental Sort (PG 13+) use with indexes
- Subquery/CTE for filtering by window result
Summary for Senior
- Window functions are computed AFTER WHERE, GROUP BY, HAVING
- ROWS is faster than RANGE, but RANGE is more correct with duplicates
- Incremental Sort (PG 13+) saves on partial sorting
- EXCLUDE (PG 11+) for complex statistics
- FILTER is supported in window functions
- Frame cost:
UNBOUNDED PRECEDINGis cheaper than sliding window - Minimize unique PARTITION BY β fewer Sort phases
- Index on
(PARTITION BY, ORDER BY)β WindowAgg without Sort
π― Interview Cheat Sheet
Must know:
- Window functions: compute over a group of rows, do NOT collapse (unlike GROUP BY)
- Syntax:
FUNCTION() OVER(PARTITION BY ... ORDER BY ... ROWS/RANGE BETWEEN ...) - 3 types: aggregate (SUM, AVG), ranking (ROW_NUMBER, RANK), offset (LAG, LEAD)
- ROWS = physical rows (faster), RANGE = logical values (more correct with duplicates)
- GROUPS (PG 11+): groups βpeerβ rows with the same value
- EXCLUDE (PG 11+):
EXCLUDE CURRENT ROW,EXCLUDE GROUP,EXCLUDE TIES - Filtering by window result β ONLY in subquery/CTE (windows are computed AFTER WHERE)
- Index on
(PARTITION BY, ORDER BY)β WindowAgg without Sort - Incremental Sort (PG 13+): data is partially sorted β sort the rest
- Named windows (WINDOW clause): DRY for window functions
- Window functions cannot be nested β CTE
Frequent follow-up questions:
- βWhy canβt you filter by window function in WHERE?β β Windows are computed AFTER WHERE
- βROWS vs RANGE β which to choose?β β ROWS is faster, RANGE is more correct with duplicates
- βHow to speed up a window function?β β Index on
(PARTITION BY, ORDER BY) - βWhat is Incremental Sort?β β PG 13+: sorts only the missing part
Red flags (DO NOT say):
- β βWindow functions collapse rowsβ (no, thatβs GROUP BY)
- β βYou can filter by window function in WHEREβ (no, computed after WHERE)
- β βRANGE is always better than ROWSβ (ROWS is faster, RANGE only with duplicates)
Related topics:
- [[What does ROW_NUMBER() do]] β unique numbering
- [[What do RANK() and DENSE_RANK() do]] β ranking
- [[What does GROUP BY do]] β difference from window functions
- [[What is the difference between WHERE and HAVING]] β execution order