Question 16 · Section 1

What does RANK() and DENSE_RANK() do?

4. PERCENT_RANK() for percentiles (top 5%) 5. NULLS LAST/FIRST — specify explicitly 6. Index on (PARTITION BY, ORDER BY) → no Sort needed 7. Determinism: add a unique column to...

Language versions: English Russian Ukrainian

🟢 Junior Level

RANK() and DENSE_RANK() — window functions that assign a rank (position) to rows based on their values, equal values receive the same rank.

Simple analogy: Sports competitions:

  • RANK() — Olympic system: two golds → next is bronze (skips a position)
  • DENSE_RANK() — grading system: two golds → next is silver (no skips)

Example:

-- Salaries: 100, 100, 80, 60

SELECT name, salary,
       RANK() OVER(ORDER BY salary DESC) as rank,
       DENSE_RANK() OVER(ORDER BY salary DESC) as dense_rank
FROM employees;

-- Result:
-- name   | salary | rank | dense_rank
-- Ivan   | 100    | 1    | 1        ← same salaries
-- Petr   | 100    | 1    | 1        ← same rank
-- Anna   | 80     | 3    | 2        ← RANK skips 2
-- Sergey | 60     | 4    | 3

When to use:

  • RANK() — when “distance” matters (how many people are ahead of you)
  • DENSE_RANK() — when “level” matters without skips
  • ROW_NUMBER() — when unique numbering is needed

🟡 Middle Level

Comparison of ranking functions

-- Values: 100, 100, 100, 80, 60

SELECT value,
       ROW_NUMBER() OVER(ORDER BY value DESC) as row_num,
       RANK() OVER(ORDER BY value DESC) as rank,
       DENSE_RANK() OVER(ORDER BY value DESC) as dense_rank
FROM data;

-- Result:
-- value | row_num | rank | dense_rank
-- 100   | 1       | 1    | 1
-- 100   | 2       | 1    | 1        ← all three 100s = rank 1
-- 100   | 3       | 1    | 1
-- 80    | 4       | 4    | 2        ← RANK: 4, DENSE: 2
-- 60    | 5       | 5    | 3
Function Equal values Skips Uniqueness
ROW_NUMBER() Different numbers No ✅ Always
RANK() Same rank ✅ Yes ❌ No
DENSE_RANK() Same rank ❌ No ❌ No

When to choose which

RANK() — when “honest positions” matter:

-- Top 10 players (returns everyone tied for 10th place)
SELECT * FROM (
    SELECT name, score, RANK() OVER(ORDER BY score DESC) as r
    FROM players
) t WHERE r <= 10;
-- If 3 players are tied for 10th → returns 12 rows (fair!)

DENSE_RANK() — when “levels” matter without skips:

-- Salary levels without skips
SELECT name, salary,
       DENSE_RANK() OVER(ORDER BY salary DESC) as salary_level
FROM employees;

-- Level 1, 2, 3... without skips
-- Convenient for reports: "How many salary levels exist?"
SELECT COUNT(DISTINCT salary_level) FROM (...);

Analytic ranking functions

PERCENT_RANK() — relative rank (0.0 to 1.0):

SELECT name, salary,
       ROUND(PERCENT_RANK() OVER(ORDER BY salary DESC)::numeric, 2) as pct
FROM employees;

-- pct = 0.00 → top (best)
-- pct = 1.00 → worst
-- pct = 0.50 → median
-- Formula: `(rank - 1) / (total_rows - 1)`
-- First: (1-1)/(N-1) = 0.00. Last: (N-1)/(N-1) = 1.00

NTILE(n) — dividing into groups:

-- Dividing into 4 quartiles (25% in each)
SELECT name, revenue,
       NTILE(4) OVER(ORDER BY revenue DESC) as quartile
FROM salespeople;

-- quartile = 1 → top 25%
-- quartile = 4 → bottom 25%

CUME_DIST() — cumulative distribution:

-- What percentage of employees earn less?
SELECT name, salary,
       ROUND(CUME_DIST() OVER(ORDER BY salary)::numeric, 2) as pct_below
FROM employees;

Handling NULL

-- NULLs are considered "the largest" by default
SELECT name, salary,
       RANK() OVER(ORDER BY salary DESC) as rank
FROM employees;

-- NULLs will get rank = 1!
-- Solution: NULLS LAST
SELECT name, salary,
       RANK() OVER(ORDER BY salary DESC NULLS LAST) as rank
FROM employees;

Common mistakes

  1. Forgot NULLS LAST
    -- ❌ NULLs will be first in DESC
    RANK() OVER(ORDER BY score DESC)
    
    -- ✅ NULLs at the end
    RANK() OVER(ORDER BY score DESC NULLS LAST)
    
  2. LIMIT instead of RANK for Top-N
    -- ⚠️ LIMIT may cut off tied values (safe when values are UNIQUE
    -- or ties don't matter, e.g. "show any 10 for a preview")
    SELECT * FROM players ORDER BY score DESC LIMIT 10;
    -- If 10th and 11th have the same score → one is randomly cut off
    
    -- ✅ RANK returns all tied values
    SELECT * FROM (
        SELECT *, RANK() OVER(ORDER BY score DESC) as r
        FROM players
    ) t WHERE r <= 10;
    
  3. Nondeterminism
    -- ❌ Equal scores → order not guaranteed
    RANK() OVER(ORDER BY score DESC)
    
    -- ✅ Add a unique column
    RANK() OVER(ORDER BY score DESC, id DESC)
    

When NOT to use RANK/DENSE_RANK

  1. When unique numbering is needed — ROW_NUMBER (2x faster)
  2. When ties don’t matter — simple LIMIT is enough
  3. For hundreds of millions of rows where speed matters — ROW_NUMBER is faster

🔴 Senior Level

Execution physics and performance

Function costs:

ROW_NUMBER():   O(n) — simply increments a counter
RANK():         O(n log n) — comparison with previous row
                (Peer check: if values are equal — "peer" = "equal" —
                RANK assigns the same rank)
DENSE_RANK():   O(n log n) — peer check + tracking UNIQUE values

DENSE_RANK is more expensive than ROW_NUMBER:

  • Needs to compare the current value with the previous one
  • When the value changes → increment the rank
  • On 100M rows: ROW_NUMBER is 2x faster

Indexes and Sort elimination

-- Index on (partition_col, order_col DESC)
CREATE INDEX idx_players_score ON players(score DESC, id DESC);

-- Query without Sort
SELECT name, score,
       RANK() OVER(ORDER BY score DESC, id DESC) as rank
FROM players;

-- Plan: WindowAgg → Index Scan (NO Sort!)
-- → 10-50x faster on large tables

Top-N with RANK: fair sampling

Scenario: Contest with prize positions

-- ❌ LIMIT is unfair with equal scores
SELECT * FROM contest ORDER BY score DESC LIMIT 10;
-- If 10th and 11th scored the same → one doesn't get the prize

-- ✅ RANK guarantees fairness
SELECT * FROM (
    SELECT name, score,
           RANK() OVER(ORDER BY score DESC) as rank
    FROM contest
) t WHERE rank <= 10;
-- Returns EVERYONE with rank <= 10, even if there are 15

NTILE for segmentation

-- Dividing customers into 10 groups by spending (deciles)
SELECT
    customer_id,
    total_spent,
    NTILE(10) OVER(ORDER BY total_spent DESC) as decile
FROM customers;

-- decile = 1 → top 10% of customers
-- decile = 10 → bottom 10%

-- Analysis by quartiles
SELECT
    quartile,
    COUNT(*) as customers,
    AVG(total_spent) as avg_spent,
    MIN(total_spent) as min_spent,
    MAX(total_spent) as max_spent
FROM (
    SELECT customer_id, total_spent,
           NTILE(4) OVER(ORDER BY total_spent DESC) as quartile
    FROM customers
) t
GROUP BY quartile
ORDER BY quartile;

PERCENT_RANK for percentiles

-- Find employees in the top 5% by salary
SELECT * FROM (
    SELECT name, salary,
           PERCENT_RANK() OVER(ORDER BY salary DESC) as pct
    FROM employees
) t
WHERE pct <= 0.05;

Edge Cases

  1. Empty result
    -- If the table is empty → 0 rows
    SELECT name, RANK() OVER(ORDER BY salary DESC) FROM empty_table;
    -- Result: 0 rows
    
  2. PARTITION BY with RANK
    -- Rank within each group
    SELECT department, name, salary,
           RANK() OVER(PARTITION BY department ORDER BY salary DESC) as dept_rank
    FROM employees;
    
    -- Each department has its own ranks 1, 2, 3...
    
  3. RANK with multiple ORDER BY columns
    -- Ranking by multiple columns
    SELECT name, score, time,
           RANK() OVER(ORDER BY score DESC, time ASC) as rank
    FROM competition;
    
    -- First by score (DESC), then by time (ASC)
    

Performance Tuning

Memory for Sort:

-- RANK/DENSE_RANK require sorting
-- If Sort doesn't fit in work_mem → spill to disk
EXPLAIN (ANALYZE) SELECT ... RANK() OVER(ORDER BY ...);
-- "Disk: X kB" → increase work_mem

Parallelism:

-- WindowAgg does NOT parallelize (up to PG 14)
-- PG 14+: Parallel Gather before WindowAgg
-- WindowAgg itself — runs in a single process

-- For very large tables:
-- Partitioning + UNION ALL

Production Experience

Real scenario #1: Seller rating

  • E-commerce: monthly rating of 50,000 sellers
  • Query: 10 seconds (Sort on disk)
  • Solution:
    • Index on (month, revenue DESC)
    • work_mem = 256MB
  • Result: 500ms

Real scenario #2: NTILE for A/B test

  • Marketing: dividing users into 4 groups by activity
  • Query with NTILE(4): 2 seconds
  • Usage: targeted campaigns by quartile
  • Result: conversion +15% thanks to precise segmentation

Monitoring

-- 1. Check the execution plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... RANK() OVER(...);
-- WindowAgg → Sort → optimize with an index
-- WindowAgg → Index Scan → great!

-- 2. Check for spill
EXPLAIN (ANALYZE)
SELECT ... RANK() OVER(ORDER BY ...);
-- "Disk: X kB" → spill! Increase work_mem

-- 3. Find queries with ranking
SELECT query, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%RANK()%' OR query LIKE '%DENSE_RANK()%'
ORDER BY mean_exec_time DESC;

Best Practices

  1. RANK() for fair Top-N (all tied values are included)
  2. DENSE_RANK() for levels/grades without skips
  3. NTILE(n) for segmentation into equal groups
  4. PERCENT_RANK() for percentiles (top 5%)
  5. NULLS LAST/FIRST — specify explicitly
  6. Index on (PARTITION BY, ORDER BY) → no Sort needed
  7. Determinism: add a unique column to ORDER BY
  8. LIMIT vs RANK: RANK for fair sampling, LIMIT for fixed counts

Summary for Senior

  • RANK() = Olympic system (with skips)
  • DENSE_RANK() = grading system (no skips)
  • ROW_NUMBER() = unique numbering (arbitrary for ties)
  • NTILE(n) = dividing into n equal groups
  • PERCENT_RANK() = relative rank (0.0-1.0)
  • RANK > LIMIT for fair Top-N with equal values
  • Index on (PARTITION BY, ORDER BY) → WindowAgg without Sort
  • NULLS LAST/FIRST for correct ranking with NULLs

🎯 Interview Cheat Sheet

Must know:

  • RANK(): equal values = same rank, next rank with a skip (1, 1, 3)
  • DENSE_RANK(): equal values = same rank, no skips (1, 1, 2)
  • ROW_NUMBER(): always unique numbering (1, 2, 3) — arbitrary for ties
  • NTILE(n): divides into n equal groups (quartiles, deciles)
  • PERCENT_RANK(): relative rank 0.0-1.0, 0.0 = top
  • CUME_DIST(): cumulative distribution (what percentage is below)
  • RANK > LIMIT for “fair” Top-N (returns ALL with equal values)
  • NULLS LAST/FIRST: NULL defaults to FIRST in ASC, LAST in DESC
  • Index on (PARTITION BY, ORDER BY) → WindowAgg without Sort
  • RANK/DENSE_RANK more expensive than ROW_NUMBER (peer check → O(n log n))

Frequent follow-up questions:

  • “When is RANK better than LIMIT?” → With equal values (contests, ratings)
  • “Why is NULLS LAST needed?” → Without it, NULLs get rank = 1 in DESC
  • “Why is NTILE more useful than GROUP BY?” → Divides into equal groups for segmentation
  • “Why is RANK more expensive than ROW_NUMBER?” → Peer check + tracking unique values

Red flags (do NOT say):

  • ❌ “RANK and ROW_NUMBER are the same” (RANK has skips, ROW_NUMBER is unique)
  • ❌ “LIMIT for Top-N is always fair” (randomly cuts off ties)
  • ❌ “NULLs don’t affect ranking” (NULL defaults to FIRST in DESC!)

Related topics:

  • [[What does ROW_NUMBER() do]] → differences from RANK
  • [[What are window functions]] → general overview
  • [[What does GROUP BY do]] → NTILE as an alternative to grouping