Question 15 Β· Section 1

What does ROW_NUMBER() do?

When ROW_NUMBER() is used with LIMIT:

Language versions: English Russian Ukrainian

🟒 Junior Level

ROW_NUMBER() β€” a window function that assigns a unique sequential number to each row.

Simple analogy: Like page numbering in a book or a queue at a store β€” everyone gets their own unique number in order.

Example:

-- Number all orders by date
SELECT
    id,
    amount,
    created_at,
    ROW_NUMBER() OVER(ORDER BY created_at DESC) as row_num
FROM orders;

-- Result:
-- id | amount | created_at | row_num
-- 5  | 100    | 2024-04-01 | 1
-- 4  | 200    | 2024-03-15 | 2
-- 3  | 150    | 2024-03-01 | 3

Common uses:

  • Row numbering in reports
  • Pagination
  • Finding the first/last record in a group
  • Removing duplicates

🟑 Middle Level

Syntax

ROW_NUMBER() OVER(
    PARTITION BY column1  -- Dividing into groups (optional)
    ORDER BY column2      -- Sorting within the group (required!)
)

Common patterns

1. Top-N for each group:

-- Latest 3 orders for each user
SELECT * FROM (
    SELECT
        user_id,
        amount,
        created_at,
        ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
    FROM orders
) sub
WHERE rn <= 3;

2. Deduplication:

-- Keep only the latest version of each record
WITH ranked AS (
    SELECT id, email, created_at,
           ROW_NUMBER() OVER(PARTITION BY email ORDER BY created_at DESC) as rn
    FROM users
)
DELETE FROM users WHERE id IN (SELECT id FROM ranked WHERE rn > 1);

3. Pagination:

-- Page 3, 20 records per page
SELECT * FROM (
    SELECT id, name,
           ROW_NUMBER() OVER(ORDER BY id) as rn
    FROM users
) sub
WHERE rn BETWEEN 41 AND 60;

Nondeterminism β€” an important trap!

-- ❌ PROBLEM: if there are users with the same created_at
SELECT id, name, created_at,
       ROW_NUMBER() OVER(ORDER BY created_at DESC) as rn
FROM users;
-- The order of rows with the same created_at is NOT guaranteed!

-- βœ… SOLUTION: add a unique column
SELECT id, name, created_at,
       ROW_NUMBER() OVER(ORDER BY created_at DESC, id DESC) as rn
FROM users;
-- id guarantees a unique order

ROW_NUMBER vs COUNT(*) OVER()

-- ROW_NUMBER() β€” cheap (just increments a counter)
SELECT id, name, ROW_NUMBER() OVER() as rn FROM users LIMIT 10;

-- COUNT(*) OVER() β€” expensive (reads ALL rows in the partition)
SELECT id, name, COUNT(*) OVER() as total FROM users LIMIT 10;
-- ⚠️ Despite LIMIT 10, the database will read ALL users!

Common mistakes

  1. Filtering in the same query
    -- ❌ ERROR: WHERE can't see rn
    SELECT id, ROW_NUMBER() OVER(ORDER BY id) as rn
    FROM users
    WHERE rn <= 10;  -- Error: column "rn" does not exist
    
    -- βœ… Subquery or CTE
    SELECT * FROM (
        SELECT id, ROW_NUMBER() OVER(ORDER BY id) as rn FROM users
    ) sub WHERE rn <= 10;
    
  2. Forgot ORDER BY
    -- ❌ Syntax error
    ROW_NUMBER() OVER(PARTITION BY user_id)
    
    -- βœ… ORDER BY is required
    ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at)
    

Comparison with alternatives

Function Uniqueness Handling equal values
ROW_NUMBER() βœ… Always Arbitrarily numbers them
RANK() ❌ No Equal values = same rank, then skip
DENSE_RANK() ❌ No Equal values = same rank, no skips

When NOT to use ROW_NUMBER

  1. For aggregations (COUNT, SUM, AVG) β€” use window aggregate functions instead
  2. For ranking with ties β€” use RANK or DENSE_RANK instead
  3. For API pagination on large tables (> 1M rows) β€” use Keyset Pagination (WHERE id > ?)

πŸ”΄ Senior Level

Physical optimization: eliminating Sort

Key insight: If the data is already sorted, PostgreSQL skips the Sort phase!

-- Index on (user_id, created_at DESC)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);

-- Query:
SELECT user_id, amount, created_at,
       ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM orders;

-- Plan: WindowAgg β†’ Index Scan (NO Sort!)
-- WindowAgg β€” a PostgreSQL execution plan node that computes the window
-- function. When you see `WindowAgg β†’ Index Scan` β€” data is read
-- already in the correct order and the Sort phase is skipped.
-- β†’ Data is read from the index already in the correct order
-- β†’ 10-50x faster on large tables!

Monitoring:

EXPLAIN SELECT ... ROW_NUMBER() OVER(...);
-- Look for: WindowAgg β†’ Index Scan β†’ great!
-- Look for: WindowAgg β†’ Sort β†’ can be optimized with an index

Bounded Heap Sort (Top-N optimization)

When ROW_NUMBER() is used with LIMIT:

SELECT * FROM (
    SELECT id, amount,
           ROW_NUMBER() OVER(ORDER BY amount DESC) as rn
    FROM orders
) sub WHERE rn <= 10;

-- PostgreSQL applies Bounded Heap Sort:
-- β†’ Does NOT sort the entire table
-- β†’ Keeps only the top-10 elements in memory (as if you picked
--   top-10 candidates without sorting all 1000 resumes)
-- β†’ Evicts extras as it processes
-- β†’ Memory: O(k) instead of O(n), where k = LIMIT

Incremental Sort (PG 13+)

-- Index on user_id
CREATE INDEX idx_orders_user ON orders(user_id);

-- Query with PARTITION BY
SELECT user_id, created_at,
       ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM orders;

-- PG 13+: Incremental Sort
-- Data is already grouped by user_id (from the index)
-- β†’ Only sorts by created_at within each group
-- β†’ Much cheaper than a full sort!
-- On PG 12 and below β€” full sort of all data, even if part is already sorted.

Keyset Pagination (better than ROW_NUMBER)

-- ❌ ROW_NUMBER pagination (slow on large OFFSET)
SELECT * FROM (
    SELECT id, name, ROW_NUMBER() OVER(ORDER BY id) as rn
    FROM users
) sub WHERE rn BETWEEN 100001 AND 100020;

-- βœ… Keyset Pagination (fast, uses index)
SELECT id, name FROM users
WHERE id > 100000  -- Last id from the previous page
ORDER BY id
LIMIT 20;
-- β†’ Index Scan, O(log n + k) instead of O(n)
-- ROW_NUMBER requires numbering ALL rows up to the desired OFFSET,
-- while Keyset Pagination jumps directly to the right spot via B-tree index.
-- For page 50,000: ROW_NUMBER processes 100,000 rows, Keyset β€” only 20.

When to use Keyset:

  • Infinite scroll
  • API pagination
  • Large tables (> 1M rows)

When ROW_NUMBER is OK:

  • Arbitrary pages (jump to page 57)
  • Small tables (< 100,000 rows)
  • Reports with fixed numbering

Determinism and stability

-- Always make ORDER BY deterministic!
-- Bad:
ROW_NUMBER() OVER(ORDER BY created_at)

-- Good (add PK):
ROW_NUMBER() OVER(ORDER BY created_at DESC, id DESC)

-- Why?
-- If 2 records have the same created_at
-- β†’ The order between them is undefined
-- β†’ Result may change from run to run
-- β†’ Bugs in production!

Edge Cases

  1. PARTITION BY with a single element
    -- If a group has 1 row β†’ rn = 1 always
    SELECT user_id, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY id) as rn
    FROM orders;
    -- For users with 1 order: rn = 1
    -- For users with N orders: rn = 1, 2, ..., N
    
  2. NULL in ORDER BY
    -- NULL defaults to LAST in DESC
    SELECT id, priority,
           ROW_NUMBER() OVER(ORDER BY priority DESC) as rn
    FROM tasks;
    
    -- Explicit NULL management
    SELECT id, priority,
           ROW_NUMBER() OVER(ORDER BY priority DESC NULLS LAST) as rn
    FROM tasks;
    
  3. ROW_NUMBER() without PARTITION BY
    -- Numbering ALL rows
    SELECT id, name, ROW_NUMBER() OVER(ORDER BY id) as rn
    FROM users;
    -- rn = 1, 2, 3, ..., N
    

Performance Tuning

Memory for Sort:

-- If Sort doesn't fit in work_mem β†’ spill to disk
EXPLAIN (ANALYZE) SELECT ... ROW_NUMBER() OVER(ORDER BY ...);
-- "external merge Disk: X kB" β†’ increase work_mem

-- Temporarily increase for the query
SET work_mem = '256MB';
-- ... query ...
RESET work_mem;

Parallelism:

-- WindowAgg does NOT parallelize in PG < 14
-- PG 14+: may use Parallel Gather before WindowAgg
-- But WindowAgg itself runs in a single process

-- Workaround for large data:
-- Table partitioning + UNION ALL

Production Experience

Real scenario #1: Deduplicating 10M records

  • CRM: duplicate contacts (same email)
  • Query with ROW_NUMBER + DELETE: 3 minutes
  • Solution:
    • Index on (email, created_at DESC)
    • Batch DELETE in chunks of 10,000 records
  • Result: 15 seconds

Real scenario #2: API Pagination

  • E-commerce API: ROW_NUMBER pagination on 10M products
  • Page 50,000: 8 seconds
  • Solution: Keyset Pagination (WHERE id > ?)
  • Result: 50ms (160x faster)

Monitoring

-- 1. Check the execution plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... ROW_NUMBER() OVER(...);
-- WindowAgg β†’ Sort β†’ can be optimized with an index
-- WindowAgg β†’ Index Scan β†’ great!

-- 2. Check Sort memory
EXPLAIN (ANALYZE)
SELECT ... ROW_NUMBER() OVER(ORDER BY ...);
-- "Disk: X kB" β†’ spill! Increase work_mem

-- 3. Find queries with ROW_NUMBER
SELECT query, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%ROW_NUMBER()%'
ORDER BY mean_exec_time DESC;

Best Practices

  1. Always add PK to ORDER BY for determinism
  2. Index on (PARTITION BY, ORDER BY) β†’ no Sort needed
  3. Keyset Pagination for large tables
  4. ROW_NUMBER only for reports/page-by-number pagination
  5. Bounded Heap Sort automatic with LIMIT
  6. Incremental Sort (PG 13+) with indexes
  7. Subquery/CTE for filtering by rn

Summary for Senior

  • ROW_NUMBER() β€” unique numbering, always deterministic with PK
  • Index on (PARTITION BY, ORDER BY) β†’ WindowAgg without Sort
  • Bounded Heap Sort with LIMIT β†’ O(k) memory
  • Keyset Pagination > ROW_NUMBER for API and infinite scroll
  • Incremental Sort (PG 13+) saves time with partial sorting
  • Determinism: always add PK to ORDER BY
  • COUNT(*) OVER() is expensive β€” reads ALL partition rows, don’t use with LIMIT
  • Filtering only in subquery/CTE

🎯 Interview Cheat Sheet

Must know:

  • ROW_NUMBER() β€” unique numbering of every row (1, 2, 3, …)
  • PARTITION BY β€” divides into groups, ORDER BY β€” sorting within (required!)
  • Determinism: always add PK to ORDER BY (ORDER BY created_at DESC, id DESC)
  • Filtering by rn β†’ ONLY in subquery/CTE (window functions are computed AFTER WHERE)
  • Bounded Heap Sort: with LIMIT β€” O(k) memory instead of O(n)
  • Keyset Pagination (WHERE id > last_id) > ROW_NUMBER for API/infinite scroll
  • Index on (PARTITION BY, ORDER BY) β†’ WindowAgg without Sort
  • Incremental Sort (PG 13+): data partially sorted β†’ faster
  • COUNT(*) OVER() is expensive β€” reads ALL partition rows, don’t use with LIMIT
  • Deduplication: ROW_NUMBER + DELETE WHERE rn > 1

Frequent follow-up questions:

  • β€œWhy is ROW_NUMBER pagination slow on large OFFSET?” β†’ Sorts all rows up to OFFSET
  • β€œWhat is Keyset Pagination?” β†’ WHERE id > last_id ORDER BY id LIMIT 20 β†’ Index Scan
  • β€œWhy is PK needed in ORDER BY?” β†’ Without PK, order among equal values is undefined
  • β€œWhen is COUNT(*) OVER() a problem?” β†’ Despite LIMIT, it reads ALL rows

Red flags (do NOT say):

  • ❌ β€œROW_NUMBER can be filtered in WHERE” (no, computed after WHERE)
  • ❌ β€œKeyset Pagination is the same as ROW_NUMBER” (Keyset = Index Scan, ROW_NUMBER = Sort)
  • ❌ β€œCOUNT(*) OVER() OK with LIMIT” (reads ALL rows!)

Related topics:

  • [[What does RANK() and DENSE_RANK() do]] β†’ differences from ROW_NUMBER
  • [[What are window functions]] β†’ general overview
  • [[What is explain plan]] β†’ checking WindowAgg β†’ Sort/Index Scan plan