What does RANK() and DENSE_RANK() do?

🟢 Junior Level

RANK() and DENSE_RANK() — window functions that assign a rank (position) to rows based on their values, equal values receive the same rank.

Simple analogy: Sports competitions:

RANK() — Olympic system: two golds → next is bronze (skips a position)
DENSE_RANK() — grading system: two golds → next is silver (no skips)

Example:

-- Salaries: 100, 100, 80, 60

SELECT name, salary,
       RANK() OVER(ORDER BY salary DESC) as rank,
       DENSE_RANK() OVER(ORDER BY salary DESC) as dense_rank
FROM employees;

-- Result:
-- name   | salary | rank | dense_rank
-- Ivan   | 100    | 1    | 1        ← same salaries
-- Petr   | 100    | 1    | 1        ← same rank
-- Anna   | 80     | 3    | 2        ← RANK skips 2
-- Sergey | 60     | 4    | 3

When to use:

RANK() — when “distance” matters (how many people are ahead of you)
DENSE_RANK() — when “level” matters without skips
ROW_NUMBER() — when unique numbering is needed

🟡 Middle Level

Comparison of ranking functions

-- Values: 100, 100, 100, 80, 60

SELECT value,
       ROW_NUMBER() OVER(ORDER BY value DESC) as row_num,
       RANK() OVER(ORDER BY value DESC) as rank,
       DENSE_RANK() OVER(ORDER BY value DESC) as dense_rank
FROM data;

-- Result:
-- value | row_num | rank | dense_rank
-- 100   | 1       | 1    | 1
-- 100   | 2       | 1    | 1        ← all three 100s = rank 1
-- 100   | 3       | 1    | 1
-- 80    | 4       | 4    | 2        ← RANK: 4, DENSE: 2
-- 60    | 5       | 5    | 3

Function	Equal values	Skips	Uniqueness
ROW_NUMBER()	Different numbers	No	✅ Always
RANK()	Same rank	✅ Yes	❌ No
DENSE_RANK()	Same rank	❌ No	❌ No

When to choose which

RANK() — when “honest positions” matter:

-- Top 10 players (returns everyone tied for 10th place)
SELECT * FROM (
    SELECT name, score, RANK() OVER(ORDER BY score DESC) as r
    FROM players
) t WHERE r <= 10;
-- If 3 players are tied for 10th → returns 12 rows (fair!)

DENSE_RANK() — when “levels” matter without skips:

-- Salary levels without skips
SELECT name, salary,
       DENSE_RANK() OVER(ORDER BY salary DESC) as salary_level
FROM employees;

-- Level 1, 2, 3... without skips
-- Convenient for reports: "How many salary levels exist?"
SELECT COUNT(DISTINCT salary_level) FROM (...);

Analytic ranking functions

PERCENT_RANK() — relative rank (0.0 to 1.0):

SELECT name, salary,
       ROUND(PERCENT_RANK() OVER(ORDER BY salary DESC)::numeric, 2) as pct
FROM employees;

-- pct = 0.00 → top (best)
-- pct = 1.00 → worst
-- pct = 0.50 → median
-- Formula: `(rank - 1) / (total_rows - 1)`
-- First: (1-1)/(N-1) = 0.00. Last: (N-1)/(N-1) = 1.00

NTILE(n) — dividing into groups:

-- Dividing into 4 quartiles (25% in each)
SELECT name, revenue,
       NTILE(4) OVER(ORDER BY revenue DESC) as quartile
FROM salespeople;

-- quartile = 1 → top 25%
-- quartile = 4 → bottom 25%

CUME_DIST() — cumulative distribution:

-- What percentage of employees earn less?
SELECT name, salary,
       ROUND(CUME_DIST() OVER(ORDER BY salary)::numeric, 2) as pct_below
FROM employees;

Handling NULL

-- NULLs are considered "the largest" by default
SELECT name, salary,
       RANK() OVER(ORDER BY salary DESC) as rank
FROM employees;

-- NULLs will get rank = 1!
-- Solution: NULLS LAST
SELECT name, salary,
       RANK() OVER(ORDER BY salary DESC NULLS LAST) as rank
FROM employees;

Common mistakes

Forgot NULLS LAST

-- ❌ NULLs will be first in DESC
RANK() OVER(ORDER BY score DESC)

-- ✅ NULLs at the end
RANK() OVER(ORDER BY score DESC NULLS LAST)

LIMIT instead of RANK for Top-N

-- ⚠️ LIMIT may cut off tied values (safe when values are UNIQUE
-- or ties don't matter, e.g. "show any 10 for a preview")
SELECT * FROM players ORDER BY score DESC LIMIT 10;
-- If 10th and 11th have the same score → one is randomly cut off

-- ✅ RANK returns all tied values
SELECT * FROM (
    SELECT *, RANK() OVER(ORDER BY score DESC) as r
    FROM players
) t WHERE r <= 10;

Nondeterminism

-- ❌ Equal scores → order not guaranteed
RANK() OVER(ORDER BY score DESC)

-- ✅ Add a unique column
RANK() OVER(ORDER BY score DESC, id DESC)

When NOT to use RANK/DENSE_RANK

When unique numbering is needed — ROW_NUMBER (2x faster)
When ties don’t matter — simple LIMIT is enough
For hundreds of millions of rows where speed matters — ROW_NUMBER is faster

🔴 Senior Level

Execution physics and performance

Function costs:

ROW_NUMBER():   O(n) — simply increments a counter
RANK():         O(n log n) — comparison with previous row
                (Peer check: if values are equal — "peer" = "equal" —
                RANK assigns the same rank)
DENSE_RANK():   O(n log n) — peer check + tracking UNIQUE values

DENSE_RANK is more expensive than ROW_NUMBER:

Needs to compare the current value with the previous one
When the value changes → increment the rank
On 100M rows: ROW_NUMBER is 2x faster

Indexes and Sort elimination

-- Index on (partition_col, order_col DESC)
CREATE INDEX idx_players_score ON players(score DESC, id DESC);

-- Query without Sort
SELECT name, score,
       RANK() OVER(ORDER BY score DESC, id DESC) as rank
FROM players;

-- Plan: WindowAgg → Index Scan (NO Sort!)
-- → 10-50x faster on large tables

Top-N with RANK: fair sampling

Scenario: Contest with prize positions

-- ❌ LIMIT is unfair with equal scores
SELECT * FROM contest ORDER BY score DESC LIMIT 10;
-- If 10th and 11th scored the same → one doesn't get the prize

-- ✅ RANK guarantees fairness
SELECT * FROM (
    SELECT name, score,
           RANK() OVER(ORDER BY score DESC) as rank
    FROM contest
) t WHERE rank <= 10;
-- Returns EVERYONE with rank <= 10, even if there are 15

NTILE for segmentation

-- Dividing customers into 10 groups by spending (deciles)
SELECT
    customer_id,
    total_spent,
    NTILE(10) OVER(ORDER BY total_spent DESC) as decile
FROM customers;

-- decile = 1 → top 10% of customers
-- decile = 10 → bottom 10%

-- Analysis by quartiles
SELECT
    quartile,
    COUNT(*) as customers,
    AVG(total_spent) as avg_spent,
    MIN(total_spent) as min_spent,
    MAX(total_spent) as max_spent
FROM (
    SELECT customer_id, total_spent,
           NTILE(4) OVER(ORDER BY total_spent DESC) as quartile
    FROM customers
) t
GROUP BY quartile
ORDER BY quartile;

PERCENT_RANK for percentiles

-- Find employees in the top 5% by salary
SELECT * FROM (
    SELECT name, salary,
           PERCENT_RANK() OVER(ORDER BY salary DESC) as pct
    FROM employees
) t
WHERE pct <= 0.05;

Edge Cases

Empty result

-- If the table is empty → 0 rows
SELECT name, RANK() OVER(ORDER BY salary DESC) FROM empty_table;
-- Result: 0 rows

PARTITION BY with RANK

-- Rank within each group
SELECT department, name, salary,
       RANK() OVER(PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;

-- Each department has its own ranks 1, 2, 3...

RANK with multiple ORDER BY columns

-- Ranking by multiple columns
SELECT name, score, time,
       RANK() OVER(ORDER BY score DESC, time ASC) as rank
FROM competition;

-- First by score (DESC), then by time (ASC)

Performance Tuning

Memory for Sort:

-- RANK/DENSE_RANK require sorting
-- If Sort doesn't fit in work_mem → spill to disk
EXPLAIN (ANALYZE) SELECT ... RANK() OVER(ORDER BY ...);
-- "Disk: X kB" → increase work_mem

Parallelism:

-- WindowAgg does NOT parallelize (up to PG 14)
-- PG 14+: Parallel Gather before WindowAgg
-- WindowAgg itself — runs in a single process

-- For very large tables:
-- Partitioning + UNION ALL

Production Experience

Real scenario #1: Seller rating

E-commerce: monthly rating of 50,000 sellers
Query: 10 seconds (Sort on disk)
Solution:
- Index on (month, revenue DESC)
- work_mem = 256MB
Result: 500ms

Real scenario #2: NTILE for A/B test

Marketing: dividing users into 4 groups by activity
Query with NTILE(4): 2 seconds
Usage: targeted campaigns by quartile
Result: conversion +15% thanks to precise segmentation

Monitoring

-- 1. Check the execution plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... RANK() OVER(...);
-- WindowAgg → Sort → optimize with an index
-- WindowAgg → Index Scan → great!

-- 2. Check for spill
EXPLAIN (ANALYZE)
SELECT ... RANK() OVER(ORDER BY ...);
-- "Disk: X kB" → spill! Increase work_mem

-- 3. Find queries with ranking
SELECT query, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%RANK()%' OR query LIKE '%DENSE_RANK()%'
ORDER BY mean_exec_time DESC;

Best Practices

RANK() for fair Top-N (all tied values are included)
DENSE_RANK() for levels/grades without skips
NTILE(n) for segmentation into equal groups
PERCENT_RANK() for percentiles (top 5%)
NULLS LAST/FIRST — specify explicitly
Index on (PARTITION BY, ORDER BY) → no Sort needed
Determinism: add a unique column to ORDER BY
LIMIT vs RANK: RANK for fair sampling, LIMIT for fixed counts

Summary for Senior

RANK() = Olympic system (with skips)
DENSE_RANK() = grading system (no skips)
ROW_NUMBER() = unique numbering (arbitrary for ties)
NTILE(n) = dividing into n equal groups
PERCENT_RANK() = relative rank (0.0-1.0)
RANK > LIMIT for fair Top-N with equal values
Index on (PARTITION BY, ORDER BY) → WindowAgg without Sort
NULLS LAST/FIRST for correct ranking with NULLs

🎯 Interview Cheat Sheet

Must know:

RANK(): equal values = same rank, next rank with a skip (1, 1, 3)
DENSE_RANK(): equal values = same rank, no skips (1, 1, 2)
ROW_NUMBER(): always unique numbering (1, 2, 3) — arbitrary for ties
NTILE(n): divides into n equal groups (quartiles, deciles)
PERCENT_RANK(): relative rank 0.0-1.0, 0.0 = top
CUME_DIST(): cumulative distribution (what percentage is below)
RANK > LIMIT for “fair” Top-N (returns ALL with equal values)
NULLS LAST/FIRST: NULL defaults to FIRST in ASC, LAST in DESC
Index on (PARTITION BY, ORDER BY) → WindowAgg without Sort
RANK/DENSE_RANK more expensive than ROW_NUMBER (peer check → O(n log n))

Frequent follow-up questions:

“When is RANK better than LIMIT?” → With equal values (contests, ratings)
“Why is NULLS LAST needed?” → Without it, NULLs get rank = 1 in DESC
“Why is NTILE more useful than GROUP BY?” → Divides into equal groups for segmentation
“Why is RANK more expensive than ROW_NUMBER?” → Peer check + tracking unique values

Red flags (do NOT say):

❌ “RANK and ROW_NUMBER are the same” (RANK has skips, ROW_NUMBER is unique)
❌ “LIMIT for Top-N is always fair” (randomly cuts off ties)
❌ “NULLs don’t affect ranking” (NULL defaults to FIRST in DESC!)

Related topics:

[[What does ROW_NUMBER() do]] → differences from RANK
[[What are window functions]] → general overview
[[What does GROUP BY do]] → NTILE as an alternative to grouping