What does ROW_NUMBER() do?
When ROW_NUMBER() is used with LIMIT:
π’ Junior Level
ROW_NUMBER() β a window function that assigns a unique sequential number to each row.
Simple analogy: Like page numbering in a book or a queue at a store β everyone gets their own unique number in order.
Example:
-- Number all orders by date
SELECT
id,
amount,
created_at,
ROW_NUMBER() OVER(ORDER BY created_at DESC) as row_num
FROM orders;
-- Result:
-- id | amount | created_at | row_num
-- 5 | 100 | 2024-04-01 | 1
-- 4 | 200 | 2024-03-15 | 2
-- 3 | 150 | 2024-03-01 | 3
Common uses:
- Row numbering in reports
- Pagination
- Finding the first/last record in a group
- Removing duplicates
π‘ Middle Level
Syntax
ROW_NUMBER() OVER(
PARTITION BY column1 -- Dividing into groups (optional)
ORDER BY column2 -- Sorting within the group (required!)
)
Common patterns
1. Top-N for each group:
-- Latest 3 orders for each user
SELECT * FROM (
SELECT
user_id,
amount,
created_at,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM orders
) sub
WHERE rn <= 3;
2. Deduplication:
-- Keep only the latest version of each record
WITH ranked AS (
SELECT id, email, created_at,
ROW_NUMBER() OVER(PARTITION BY email ORDER BY created_at DESC) as rn
FROM users
)
DELETE FROM users WHERE id IN (SELECT id FROM ranked WHERE rn > 1);
3. Pagination:
-- Page 3, 20 records per page
SELECT * FROM (
SELECT id, name,
ROW_NUMBER() OVER(ORDER BY id) as rn
FROM users
) sub
WHERE rn BETWEEN 41 AND 60;
Nondeterminism β an important trap!
-- β PROBLEM: if there are users with the same created_at
SELECT id, name, created_at,
ROW_NUMBER() OVER(ORDER BY created_at DESC) as rn
FROM users;
-- The order of rows with the same created_at is NOT guaranteed!
-- β
SOLUTION: add a unique column
SELECT id, name, created_at,
ROW_NUMBER() OVER(ORDER BY created_at DESC, id DESC) as rn
FROM users;
-- id guarantees a unique order
ROW_NUMBER vs COUNT(*) OVER()
-- ROW_NUMBER() β cheap (just increments a counter)
SELECT id, name, ROW_NUMBER() OVER() as rn FROM users LIMIT 10;
-- COUNT(*) OVER() β expensive (reads ALL rows in the partition)
SELECT id, name, COUNT(*) OVER() as total FROM users LIMIT 10;
-- β οΈ Despite LIMIT 10, the database will read ALL users!
Common mistakes
- Filtering in the same query
-- β ERROR: WHERE can't see rn SELECT id, ROW_NUMBER() OVER(ORDER BY id) as rn FROM users WHERE rn <= 10; -- Error: column "rn" does not exist -- β Subquery or CTE SELECT * FROM ( SELECT id, ROW_NUMBER() OVER(ORDER BY id) as rn FROM users ) sub WHERE rn <= 10; - Forgot ORDER BY
-- β Syntax error ROW_NUMBER() OVER(PARTITION BY user_id) -- β ORDER BY is required ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at)
Comparison with alternatives
| Function | Uniqueness | Handling equal values |
|---|---|---|
| ROW_NUMBER() | β Always | Arbitrarily numbers them |
| RANK() | β No | Equal values = same rank, then skip |
| DENSE_RANK() | β No | Equal values = same rank, no skips |
When NOT to use ROW_NUMBER
- For aggregations (COUNT, SUM, AVG) β use window aggregate functions instead
- For ranking with ties β use RANK or DENSE_RANK instead
- For API pagination on large tables (> 1M rows) β use Keyset Pagination (
WHERE id > ?)
π΄ Senior Level
Physical optimization: eliminating Sort
Key insight: If the data is already sorted, PostgreSQL skips the Sort phase!
-- Index on (user_id, created_at DESC)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);
-- Query:
SELECT user_id, amount, created_at,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM orders;
-- Plan: WindowAgg β Index Scan (NO Sort!)
-- WindowAgg β a PostgreSQL execution plan node that computes the window
-- function. When you see `WindowAgg β Index Scan` β data is read
-- already in the correct order and the Sort phase is skipped.
-- β Data is read from the index already in the correct order
-- β 10-50x faster on large tables!
Monitoring:
EXPLAIN SELECT ... ROW_NUMBER() OVER(...);
-- Look for: WindowAgg β Index Scan β great!
-- Look for: WindowAgg β Sort β can be optimized with an index
Bounded Heap Sort (Top-N optimization)
When ROW_NUMBER() is used with LIMIT:
SELECT * FROM (
SELECT id, amount,
ROW_NUMBER() OVER(ORDER BY amount DESC) as rn
FROM orders
) sub WHERE rn <= 10;
-- PostgreSQL applies Bounded Heap Sort:
-- β Does NOT sort the entire table
-- β Keeps only the top-10 elements in memory (as if you picked
-- top-10 candidates without sorting all 1000 resumes)
-- β Evicts extras as it processes
-- β Memory: O(k) instead of O(n), where k = LIMIT
Incremental Sort (PG 13+)
-- Index on user_id
CREATE INDEX idx_orders_user ON orders(user_id);
-- Query with PARTITION BY
SELECT user_id, created_at,
ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY created_at DESC) as rn
FROM orders;
-- PG 13+: Incremental Sort
-- Data is already grouped by user_id (from the index)
-- β Only sorts by created_at within each group
-- β Much cheaper than a full sort!
-- On PG 12 and below β full sort of all data, even if part is already sorted.
Keyset Pagination (better than ROW_NUMBER)
-- β ROW_NUMBER pagination (slow on large OFFSET)
SELECT * FROM (
SELECT id, name, ROW_NUMBER() OVER(ORDER BY id) as rn
FROM users
) sub WHERE rn BETWEEN 100001 AND 100020;
-- β
Keyset Pagination (fast, uses index)
SELECT id, name FROM users
WHERE id > 100000 -- Last id from the previous page
ORDER BY id
LIMIT 20;
-- β Index Scan, O(log n + k) instead of O(n)
-- ROW_NUMBER requires numbering ALL rows up to the desired OFFSET,
-- while Keyset Pagination jumps directly to the right spot via B-tree index.
-- For page 50,000: ROW_NUMBER processes 100,000 rows, Keyset β only 20.
When to use Keyset:
- Infinite scroll
- API pagination
- Large tables (> 1M rows)
When ROW_NUMBER is OK:
- Arbitrary pages (jump to page 57)
- Small tables (< 100,000 rows)
- Reports with fixed numbering
Determinism and stability
-- Always make ORDER BY deterministic!
-- Bad:
ROW_NUMBER() OVER(ORDER BY created_at)
-- Good (add PK):
ROW_NUMBER() OVER(ORDER BY created_at DESC, id DESC)
-- Why?
-- If 2 records have the same created_at
-- β The order between them is undefined
-- β Result may change from run to run
-- β Bugs in production!
Edge Cases
- PARTITION BY with a single element
-- If a group has 1 row β rn = 1 always SELECT user_id, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY id) as rn FROM orders; -- For users with 1 order: rn = 1 -- For users with N orders: rn = 1, 2, ..., N - NULL in ORDER BY
-- NULL defaults to LAST in DESC SELECT id, priority, ROW_NUMBER() OVER(ORDER BY priority DESC) as rn FROM tasks; -- Explicit NULL management SELECT id, priority, ROW_NUMBER() OVER(ORDER BY priority DESC NULLS LAST) as rn FROM tasks; - ROW_NUMBER() without PARTITION BY
-- Numbering ALL rows SELECT id, name, ROW_NUMBER() OVER(ORDER BY id) as rn FROM users; -- rn = 1, 2, 3, ..., N
Performance Tuning
Memory for Sort:
-- If Sort doesn't fit in work_mem β spill to disk
EXPLAIN (ANALYZE) SELECT ... ROW_NUMBER() OVER(ORDER BY ...);
-- "external merge Disk: X kB" β increase work_mem
-- Temporarily increase for the query
SET work_mem = '256MB';
-- ... query ...
RESET work_mem;
Parallelism:
-- WindowAgg does NOT parallelize in PG < 14
-- PG 14+: may use Parallel Gather before WindowAgg
-- But WindowAgg itself runs in a single process
-- Workaround for large data:
-- Table partitioning + UNION ALL
Production Experience
Real scenario #1: Deduplicating 10M records
- CRM: duplicate contacts (same email)
- Query with ROW_NUMBER + DELETE: 3 minutes
- Solution:
- Index on
(email, created_at DESC) - Batch DELETE in chunks of 10,000 records
- Index on
- Result: 15 seconds
Real scenario #2: API Pagination
- E-commerce API: ROW_NUMBER pagination on 10M products
- Page 50,000: 8 seconds
- Solution: Keyset Pagination (WHERE id > ?)
- Result: 50ms (160x faster)
Monitoring
-- 1. Check the execution plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... ROW_NUMBER() OVER(...);
-- WindowAgg β Sort β can be optimized with an index
-- WindowAgg β Index Scan β great!
-- 2. Check Sort memory
EXPLAIN (ANALYZE)
SELECT ... ROW_NUMBER() OVER(ORDER BY ...);
-- "Disk: X kB" β spill! Increase work_mem
-- 3. Find queries with ROW_NUMBER
SELECT query, mean_exec_time
FROM pg_stat_statements
WHERE query LIKE '%ROW_NUMBER()%'
ORDER BY mean_exec_time DESC;
Best Practices
- Always add PK to ORDER BY for determinism
- Index on
(PARTITION BY, ORDER BY)β no Sort needed - Keyset Pagination for large tables
- ROW_NUMBER only for reports/page-by-number pagination
- Bounded Heap Sort automatic with LIMIT
- Incremental Sort (PG 13+) with indexes
- Subquery/CTE for filtering by rn
Summary for Senior
- ROW_NUMBER() β unique numbering, always deterministic with PK
- Index on
(PARTITION BY, ORDER BY)β WindowAgg without Sort - Bounded Heap Sort with LIMIT β O(k) memory
- Keyset Pagination > ROW_NUMBER for API and infinite scroll
- Incremental Sort (PG 13+) saves time with partial sorting
- Determinism: always add PK to ORDER BY
- COUNT(*) OVER() is expensive β reads ALL partition rows, donβt use with LIMIT
- Filtering only in subquery/CTE
π― Interview Cheat Sheet
Must know:
- ROW_NUMBER() β unique numbering of every row (1, 2, 3, β¦)
- PARTITION BY β divides into groups, ORDER BY β sorting within (required!)
- Determinism: always add PK to ORDER BY (
ORDER BY created_at DESC, id DESC) - Filtering by rn β ONLY in subquery/CTE (window functions are computed AFTER WHERE)
- Bounded Heap Sort: with LIMIT β O(k) memory instead of O(n)
- Keyset Pagination (
WHERE id > last_id) > ROW_NUMBER for API/infinite scroll - Index on
(PARTITION BY, ORDER BY)β WindowAgg without Sort - Incremental Sort (PG 13+): data partially sorted β faster
- COUNT(*) OVER() is expensive β reads ALL partition rows, donβt use with LIMIT
- Deduplication: ROW_NUMBER + DELETE WHERE rn > 1
Frequent follow-up questions:
- βWhy is ROW_NUMBER pagination slow on large OFFSET?β β Sorts all rows up to OFFSET
- βWhat is Keyset Pagination?β β
WHERE id > last_id ORDER BY id LIMIT 20β Index Scan - βWhy is PK needed in ORDER BY?β β Without PK, order among equal values is undefined
- βWhen is COUNT(*) OVER() a problem?β β Despite LIMIT, it reads ALL rows
Red flags (do NOT say):
- β βROW_NUMBER can be filtered in WHEREβ (no, computed after WHERE)
- β βKeyset Pagination is the same as ROW_NUMBERβ (Keyset = Index Scan, ROW_NUMBER = Sort)
- β β
COUNT(*) OVER()OK with LIMITβ (reads ALL rows!)
Related topics:
- [[What does RANK() and DENSE_RANK() do]] β differences from ROW_NUMBER
- [[What are window functions]] β general overview
- [[What is explain plan]] β checking WindowAgg β Sort/Index Scan plan