What does RANK() and DENSE_RANK() do?
4. PERCENT_RANK() for percentiles (top 5%) 5. NULLS LAST/FIRST — specify explicitly 6. Index on (PARTITION BY, ORDER BY) → no Sort needed 7. Determinism: add a unique column to...
🟢 Junior Level
RANK() and DENSE_RANK() — window functions that assign a rank (position) to rows based on their values, equal values receive the same rank.
Simple analogy: Sports competitions:
- RANK() — Olympic system: two golds → next is bronze (skips a position)
- DENSE_RANK() — grading system: two golds → next is silver (no skips)
Example:
-- Salaries: 100, 100, 80, 60
SELECT name, salary,
RANK() OVER(ORDER BY salary DESC) as rank,
DENSE_RANK() OVER(ORDER BY salary DESC) as dense_rank
FROM employees;
-- Result:
-- name | salary | rank | dense_rank
-- Ivan | 100 | 1 | 1 ← same salaries
-- Petr | 100 | 1 | 1 ← same rank
-- Anna | 80 | 3 | 2 ← RANK skips 2
-- Sergey | 60 | 4 | 3
When to use:
- RANK() — when “distance” matters (how many people are ahead of you)
- DENSE_RANK() — when “level” matters without skips
- ROW_NUMBER() — when unique numbering is needed
🟡 Middle Level
Comparison of ranking functions
-- Values: 100, 100, 100, 80, 60
SELECT value,
ROW_NUMBER() OVER(ORDER BY value DESC) as row_num,
RANK() OVER(ORDER BY value DESC) as rank,
DENSE_RANK() OVER(ORDER BY value DESC) as dense_rank
FROM data;
-- Result:
-- value | row_num | rank | dense_rank
-- 100 | 1 | 1 | 1
-- 100 | 2 | 1 | 1 ← all three 100s = rank 1
-- 100 | 3 | 1 | 1
-- 80 | 4 | 4 | 2 ← RANK: 4, DENSE: 2
-- 60 | 5 | 5 | 3
| Function | Equal values | Skips | Uniqueness |
|---|---|---|---|
| ROW_NUMBER() | Different numbers | No | ✅ Always |
| RANK() | Same rank | ✅ Yes | ❌ No |
| DENSE_RANK() | Same rank | ❌ No | ❌ No |
When to choose which
RANK() — when “honest positions” matter:
-- Top 10 players (returns everyone tied for 10th place)
SELECT * FROM (
SELECT name, score, RANK() OVER(ORDER BY score DESC) as r
FROM players
) t WHERE r <= 10;
-- If 3 players are tied for 10th → returns 12 rows (fair!)
DENSE_RANK() — when “levels” matter without skips:
-- Salary levels without skips
SELECT name, salary,
DENSE_RANK() OVER(ORDER BY salary DESC) as salary_level
FROM employees;
-- Level 1, 2, 3... without skips
-- Convenient for reports: "How many salary levels exist?"
SELECT COUNT(DISTINCT salary_level) FROM (...);
Analytic ranking functions
PERCENT_RANK() — relative rank (0.0 to 1.0):
SELECT name, salary,
ROUND(PERCENT_RANK() OVER(ORDER BY salary DESC)::numeric, 2) as pct
FROM employees;
-- pct = 0.00 → top (best)
-- pct = 1.00 → worst
-- pct = 0.50 → median
-- Formula: `(rank - 1) / (total_rows - 1)`
-- First: (1-1)/(N-1) = 0.00. Last: (N-1)/(N-1) = 1.00
NTILE(n) — dividing into groups:
-- Dividing into 4 quartiles (25% in each)
SELECT name, revenue,
NTILE(4) OVER(ORDER BY revenue DESC) as quartile
FROM salespeople;
-- quartile = 1 → top 25%
-- quartile = 4 → bottom 25%
CUME_DIST() — cumulative distribution:
-- What percentage of employees earn less?
SELECT name, salary,
ROUND(CUME_DIST() OVER(ORDER BY salary)::numeric, 2) as pct_below
FROM employees;
Handling NULL
-- NULLs are considered "the largest" by default
SELECT name, salary,
RANK() OVER(ORDER BY salary DESC) as rank
FROM employees;
-- NULLs will get rank = 1!
-- Solution: NULLS LAST
SELECT name, salary,
RANK() OVER(ORDER BY salary DESC NULLS LAST) as rank
FROM employees;
Common mistakes
- Forgot NULLS LAST
-- ❌ NULLs will be first in DESC RANK() OVER(ORDER BY score DESC) -- ✅ NULLs at the end RANK() OVER(ORDER BY score DESC NULLS LAST) - LIMIT instead of RANK for Top-N
-- ⚠️ LIMIT may cut off tied values (safe when values are UNIQUE -- or ties don't matter, e.g. "show any 10 for a preview") SELECT * FROM players ORDER BY score DESC LIMIT 10; -- If 10th and 11th have the same score → one is randomly cut off -- ✅ RANK returns all tied values SELECT * FROM ( SELECT *, RANK() OVER(ORDER BY score DESC) as r FROM players ) t WHERE r <= 10; - Nondeterminism
-- ❌ Equal scores → order not guaranteed RANK() OVER(ORDER BY score DESC) -- ✅ Add a unique column RANK() OVER(ORDER BY score DESC, id DESC)
When NOT to use RANK/DENSE_RANK
- When unique numbering is needed — ROW_NUMBER (2x faster)
- When ties don’t matter — simple LIMIT is enough
- For hundreds of millions of rows where speed matters — ROW_NUMBER is faster
🔴 Senior Level
Execution physics and performance
Function costs:
ROW_NUMBER(): O(n) — simply increments a counter
RANK(): O(n log n) — comparison with previous row
(Peer check: if values are equal — "peer" = "equal" —
RANK assigns the same rank)
DENSE_RANK(): O(n log n) — peer check + tracking UNIQUE values
DENSE_RANK is more expensive than ROW_NUMBER:
- Needs to compare the current value with the previous one
- When the value changes → increment the rank
- On 100M rows: ROW_NUMBER is 2x faster
Indexes and Sort elimination
-- Index on (partition_col, order_col DESC)
CREATE INDEX idx_players_score ON players(score DESC, id DESC);
-- Query without Sort
SELECT name, score,
RANK() OVER(ORDER BY score DESC, id DESC) as rank
FROM players;
-- Plan: WindowAgg → Index Scan (NO Sort!)
-- → 10-50x faster on large tables
Top-N with RANK: fair sampling
Scenario: Contest with prize positions
-- ❌ LIMIT is unfair with equal scores
SELECT * FROM contest ORDER BY score DESC LIMIT 10;
-- If 10th and 11th scored the same → one doesn't get the prize
-- ✅ RANK guarantees fairness
SELECT * FROM (
SELECT name, score,
RANK() OVER(ORDER BY score DESC) as rank
FROM contest
) t WHERE rank <= 10;
-- Returns EVERYONE with rank <= 10, even if there are 15
NTILE for segmentation
-- Dividing customers into 10 groups by spending (deciles)
SELECT
customer_id,
total_spent,
NTILE(10) OVER(ORDER BY total_spent DESC) as decile
FROM customers;
-- decile = 1 → top 10% of customers
-- decile = 10 → bottom 10%
-- Analysis by quartiles
SELECT
quartile,
COUNT(*) as customers,
AVG(total_spent) as avg_spent,
MIN(total_spent) as min_spent,
MAX(total_spent) as max_spent
FROM (
SELECT customer_id, total_spent,
NTILE(4) OVER(ORDER BY total_spent DESC) as quartile
FROM customers
) t
GROUP BY quartile
ORDER BY quartile;
PERCENT_RANK for percentiles
-- Find employees in the top 5% by salary
SELECT * FROM (
SELECT name, salary,
PERCENT_RANK() OVER(ORDER BY salary DESC) as pct
FROM employees
) t
WHERE pct <= 0.05;
Edge Cases
- Empty result
-- If the table is empty → 0 rows SELECT name, RANK() OVER(ORDER BY salary DESC) FROM empty_table; -- Result: 0 rows - PARTITION BY with RANK
-- Rank within each group SELECT department, name, salary, RANK() OVER(PARTITION BY department ORDER BY salary DESC) as dept_rank FROM employees; -- Each department has its own ranks 1, 2, 3... - RANK with multiple ORDER BY columns
-- Ranking by multiple columns SELECT name, score, time, RANK() OVER(ORDER BY score DESC, time ASC) as rank FROM competition; -- First by score (DESC), then by time (ASC)
Performance Tuning
Memory for Sort:
-- RANK/DENSE_RANK require sorting
-- If Sort doesn't fit in work_mem → spill to disk
EXPLAIN (ANALYZE) SELECT ... RANK() OVER(ORDER BY ...);
-- "Disk: X kB" → increase work_mem
Parallelism:
-- WindowAgg does NOT parallelize (up to PG 14)
-- PG 14+: Parallel Gather before WindowAgg
-- WindowAgg itself — runs in a single process
-- For very large tables:
-- Partitioning + UNION ALL
Production Experience
Real scenario #1: Seller rating
- E-commerce: monthly rating of 50,000 sellers
- Query: 10 seconds (Sort on disk)
- Solution:
- Index on
(month, revenue DESC) - work_mem = 256MB
- Index on
- Result: 500ms
Real scenario #2: NTILE for A/B test
- Marketing: dividing users into 4 groups by activity
- Query with NTILE(4): 2 seconds
- Usage: targeted campaigns by quartile
- Result: conversion +15% thanks to precise segmentation
Monitoring
-- 1. Check the execution plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... RANK() OVER(...);
-- WindowAgg → Sort → optimize with an index
-- WindowAgg → Index Scan → great!
-- 2. Check for spill
EXPLAIN (ANALYZE)
SELECT ... RANK() OVER(ORDER BY ...);
-- "Disk: X kB" → spill! Increase work_mem
-- 3. Find queries with ranking
SELECT query, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%RANK()%' OR query LIKE '%DENSE_RANK()%'
ORDER BY mean_exec_time DESC;
Best Practices
- RANK() for fair Top-N (all tied values are included)
- DENSE_RANK() for levels/grades without skips
- NTILE(n) for segmentation into equal groups
- PERCENT_RANK() for percentiles (top 5%)
- NULLS LAST/FIRST — specify explicitly
- Index on
(PARTITION BY, ORDER BY)→ no Sort needed - Determinism: add a unique column to ORDER BY
- LIMIT vs RANK: RANK for fair sampling, LIMIT for fixed counts
Summary for Senior
- RANK() = Olympic system (with skips)
- DENSE_RANK() = grading system (no skips)
- ROW_NUMBER() = unique numbering (arbitrary for ties)
- NTILE(n) = dividing into n equal groups
- PERCENT_RANK() = relative rank (0.0-1.0)
- RANK > LIMIT for fair Top-N with equal values
- Index on
(PARTITION BY, ORDER BY)→ WindowAgg without Sort - NULLS LAST/FIRST for correct ranking with NULLs
🎯 Interview Cheat Sheet
Must know:
- RANK(): equal values = same rank, next rank with a skip (1, 1, 3)
- DENSE_RANK(): equal values = same rank, no skips (1, 1, 2)
- ROW_NUMBER(): always unique numbering (1, 2, 3) — arbitrary for ties
- NTILE(n): divides into n equal groups (quartiles, deciles)
- PERCENT_RANK(): relative rank 0.0-1.0, 0.0 = top
- CUME_DIST(): cumulative distribution (what percentage is below)
- RANK > LIMIT for “fair” Top-N (returns ALL with equal values)
- NULLS LAST/FIRST: NULL defaults to FIRST in ASC, LAST in DESC
- Index on
(PARTITION BY, ORDER BY)→ WindowAgg without Sort - RANK/DENSE_RANK more expensive than ROW_NUMBER (peer check → O(n log n))
Frequent follow-up questions:
- “When is RANK better than LIMIT?” → With equal values (contests, ratings)
- “Why is NULLS LAST needed?” → Without it, NULLs get rank = 1 in DESC
- “Why is NTILE more useful than GROUP BY?” → Divides into equal groups for segmentation
- “Why is RANK more expensive than ROW_NUMBER?” → Peer check + tracking unique values
Red flags (do NOT say):
- ❌ “RANK and ROW_NUMBER are the same” (RANK has skips, ROW_NUMBER is unique)
- ❌ “LIMIT for Top-N is always fair” (randomly cuts off ties)
- ❌ “NULLs don’t affect ranking” (NULL defaults to FIRST in DESC!)
Related topics:
- [[What does ROW_NUMBER() do]] → differences from RANK
- [[What are window functions]] → general overview
- [[What does GROUP BY do]] → NTILE as an alternative to grouping