Question 9 · Section 11

What is Phantom Read?



🟢 Junior Level

Phantom Read is a transaction anomaly where the same data selection within one transaction returns a different number of rows because another transaction inserted or deleted matching records.

Simple analogy: You count people in a room (got 10). While you turn away, someone enters. You count again — now 12. These “new” people are phantoms.

SQL example:

-- Transaction 1
BEGIN;
SELECT COUNT(*) FROM orders WHERE status = 'pending';  -- returns 5

-- While Transaction 1 is open, Transaction 2 does:
-- INSERT INTO orders (status) VALUES ('pending');
-- COMMIT;

-- Transaction 1 repeats the same query:
SELECT COUNT(*) FROM orders WHERE status = 'pending';  -- returns 6 — a phantom!
COMMIT;

Key difference from non-repeatable read:

  • Non-repeatable read — the value of an existing row changed (UPDATE/DELETE).
  • Phantom read — a completely new row appeared (INSERT).

Which isolation levels protect:

| Level | Phantoms Possible? |
|-------|--------------------|
| Read Uncommitted | Yes |
| Read Committed | Yes |
| Repeatable Read | Depends on DBMS |
| Serializable | No |

When it matters: Totals calculation, report generation, uniqueness check before insert.


🟡 Middle Level

How It Works Internally

Phantom read occurs because most DBMS at levels below Serializable read a data snapshot, but don’t lock key ranges.

When you execute SELECT WHERE status = 'pending', the database returns rows existing at the time of the snapshot. But if another transaction inserts a new row with status = 'pending', your repeated query will see it as a “phantom.”

Real Scenarios

Scenario 1: Report Generation

@Transactional
public Report generateDailyReport(LocalDate date) {
    int orderCount = orderRepo.countByDate(date);       // 100 orders
    double totalAmount = orderRepo.sumByDate(date);     // Calculating for 100 orders

    // If a new order for the same date is inserted at this moment,
    // orderCount and totalAmount become inconsistent
    return new Report(orderCount, totalAmount);
}

Scenario 2: Uniqueness Check Before Insert

@Transactional
public void registerUser(String email) {
    long count = userRepo.countByEmail(email);  // = 0
    // Another transaction inserts a user with this email and commits
    userRepo.save(new User(email));              // Duplicate! Or unique violation if there's an index
}
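A safer pattern is to rely on a unique index and make the insert itself the check, instead of checking first and inserting second. Below is a minimal sketch simulating the two patterns with an in-memory map; the class and method names are illustrative, and in a real DB the atomic variant corresponds to an INSERT that hits a unique constraint, with the duplicate-key error handled by the caller.

```java
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for a users table with a unique index on email.
class UniqueInsert {
    static final ConcurrentHashMap<String, Boolean> users = new ConcurrentHashMap<>();

    // Racy: another transaction can commit between the check and the insert.
    static boolean registerRacy(String email) {
        if (!users.containsKey(email)) {   // check (snapshot 1)
            users.put(email, true);        // insert — the phantom window is here
            return true;
        }
        return false;
    }

    // Safe: putIfAbsent is one atomic step, like an INSERT against a unique index.
    static boolean registerSafe(String email) {
        return users.putIfAbsent(email, true) == null;
    }

    public static void main(String[] args) {
        System.out.println(registerSafe("a@example.com")); // true — first insert wins
        System.out.println(registerSafe("a@example.com")); // false — duplicate rejected
    }
}
```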

Common Mistakes

| Mistake | Consequence | Solution |
|---------|-------------|----------|
| Repeated SELECT in a long transaction for data validation | Phantom rows, incorrect business logic | Shorten the transaction or use Serializable |
| COUNT() + subsequent processing of results | Missing or extra records | Use FOR UPDATE or Serializable |
| Assuming Repeatable Read prevents phantoms in every DBMS | Expectation not met — PostgreSQL and MySQL prevent phantoms on RR, SQL Server does not | Know your DBMS specifics |

Comparison: How Different DBMS Handle Phantoms on Repeatable Read

| DBMS | Phantoms on RR? | Mechanism |
|------|-----------------|-----------|
| PostgreSQL | No | MVCC: snapshot from first read doesn't see new rows |
| MySQL/InnoDB | No | Next-Key Locks: Gap Locks block ranges |
| Oracle | Has no RR level | Offers only Read Committed and Serializable; its Serializable uses MVCC, similar to PostgreSQL |
| SQL Server | Yes | RR holds shared locks on the rows it read, but not on ranges, so a repeated query sees new rows |

When NOT to Worry About Phantoms

  • Read-only reports with acceptable margin of error
  • Caching — if data can be slightly stale
  • Statistics — approximate counts are acceptable

🔴 Senior Level

Internal Implementation: MVCC and Predicate Locks

Terms:

  • Predicate lock — a lock not on a specific row, but on the query CONDITION (WHERE status = 'pending').
  • RW-dependency — a situation where T1 reads data that T2 later writes.
  • Coarsening — merging many fine-grained locks into one coarse lock to save memory.

MVCC Approach (PostgreSQL)

In PostgreSQL, phantom read at the Repeatable Read level is prevented by the MVCC architecture:

Each row contains system fields:
  - xmin: transaction ID that inserted the row
  - xmax: transaction ID that deleted the row (0 = alive)

On SELECT:
  1. A snapshot is taken at the transaction's first read
  2. A row is visible only if its xmin transaction had committed before the snapshot (and xmax doesn't mark it as deleted)
  3. A row inserted by a transaction still running at the snapshot, or started after it, is INVISIBLE

This means: PostgreSQL on RR effectively protects from phantoms, although the ANSI standard doesn’t require it. This is a side effect of the MVCC design.
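The visibility rule above can be sketched as a toy model. This is a deliberate simplification (real PostgreSQL visibility also consults commit status, in-progress transaction lists, and xmax); the class and field names mirror the description but are illustrative only.

```java
// Toy model of PostgreSQL's snapshot visibility check for Repeatable Read.
// Simplified: ignores xmax, commit-status lookups, and subtransactions.
class MvccVisibility {
    record Row(long xmin, String data) {}   // xmin = txid that inserted the row
    record Snapshot(long xmin) {}           // txids >= this are "in the future"

    // A row inserted at or after the snapshot is invisible —
    // which is exactly why the phantom never appears on RR.
    static boolean visible(Row row, Snapshot snap) {
        return row.xmin() < snap.xmin();
    }

    public static void main(String[] args) {
        Snapshot snap = new Snapshot(100);       // taken at the first SELECT
        Row oldRow  = new Row(42, "pending");    // committed before the snapshot
        Row phantom = new Row(150, "pending");   // inserted after the snapshot
        System.out.println(visible(oldRow, snap));   // true
        System.out.println(visible(phantom, snap));  // false — phantom filtered out
    }
}
```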

Predicate Locks (Serializable in PostgreSQL)

At the Serializable level, PostgreSQL uses Serializable Snapshot Isolation (SSI):

On executing: SELECT * FROM orders WHERE status = 'pending'

1. A predicate lock is registered on the index range (or full table scan)
2. If another transaction does INSERT INTO orders (status='pending')
   - An RW-dependency is detected: T1 read the predicate, T2 wrote to it
3. On commit, the dependency graph is checked
4. If a dangerous cycle is found → one transaction gets:
   ERROR: could not serialize access due to read/write dependencies
   SQLSTATE: 40001

Next-Key Locking (MySQL/InnoDB)

MySQL on Repeatable Read uses a different approach:

-- Query: SELECT * FROM orders WHERE status = 'pending' FOR UPDATE;

InnoDB locks:
1. Record locks: the rows matching the condition
2. Gap locks: the intervals BETWEEN found rows
3. Next-key locks: combination of 1 + 2

A Gap Lock on range (10, 20) won't allow INSERT INTO orders (id=15, ...)

Problem: Gap Locks can cause deadlocks. Gap locks from different transactions are compatible with each other, but an INSERT into a locked gap must wait (insert intention lock), so two transactions holding the same gap can block each other:

T1: SELECT * FROM t WHERE id BETWEEN 10 AND 20 FOR UPDATE;  -- empty range → gap lock (10, 20)
T2: SELECT * FROM t WHERE id BETWEEN 10 AND 20 FOR UPDATE;  -- also acquires the gap lock (no conflict yet)
T1: INSERT INTO t (id) VALUES (15);  -- insert intention lock waits for T2's gap lock
T2: INSERT INTO t (id) VALUES (16);  -- waits for T1's gap lock → DEADLOCK

Architectural Trade-offs

Approach A: Accept phantom reads (Read Committed)

  • ✅ Pros: Maximum throughput, no range locks
  • ❌ Cons: Application must be resilient to inconsistent repeated reads
  • Suitable for: high-throughput CRUD systems where repeated read accuracy isn’t critical

Approach B: Prevent phantom reads (Repeatable Read / Serializable)

  • ✅ Pros: Strict consistency, deterministic behavior
  • ❌ Cons: 2-10x throughput reduction, serialization failure risk (5-30% abort rate)
  • Suitable for: financial systems, booking, inventory

Approach C: Application-level prevention (Optimistic Locking)

  • ✅ Pros: No DB locks, scales horizontally
  • ❌ Cons: Implementation complexity, needs retries, eventual consistency
  • Suitable for: microservices, CQRS, event sourcing
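Approach C can be sketched with a version counter checked at write time, so a concurrent change is detected instead of silently overwritten. This mirrors what JPA's @Version mechanism does, but the classes below are an illustrative in-memory stand-in, not a real ORM:

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal optimistic-locking sketch: the write succeeds only if the
// version the caller read is still the current one.
class OptimisticLock {
    record Versioned(int version, String value) {}

    private final AtomicReference<Versioned> row =
        new AtomicReference<>(new Versioned(0, "pending"));

    // Returns false when someone changed the row since expectedVersion
    // was read — the caller must re-read and retry.
    boolean update(int expectedVersion, String newValue) {
        Versioned current = row.get();
        if (current.version() != expectedVersion) return false;  // stale read
        return row.compareAndSet(current, new Versioned(expectedVersion + 1, newValue));
    }

    public static void main(String[] args) {
        OptimisticLock lock = new OptimisticLock();
        System.out.println(lock.update(0, "processed")); // true — version matched
        System.out.println(lock.update(0, "cancelled")); // false — stale version, retry needed
    }
}
```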

Edge Cases and Corner Cases

1. Phantoms with UPDATE:

-- Transaction 1:
BEGIN;
UPDATE orders SET status = 'processed' WHERE status = 'pending';
-- Affected 5 rows

-- Transaction 2 (after T1's UPDATE, before its repeated query):
INSERT INTO orders (status) VALUES ('pending');
COMMIT;

-- Transaction 1:
SELECT COUNT(*) FROM orders WHERE status = 'pending';
-- Returns 1 (the just-inserted row, if level = Read Committed)

2. Phantoms with Temporary Tables: Temporary tables in PostgreSQL are NOT tracked by SSI. Phantom read is possible even on Serializable if data goes through temp tables.

3. Phantoms with sequences: nextval('seq') is not rolled back on transaction abort. This creates “gaps” in numbering that may be mistakenly taken as phantoms during auditing.

4. Phantom DELETE:

// Transaction 1 read 100 records
List<Order> orders = orderRepo.findAllPending();

// Transaction 2 deleted 50 of them and committed

// Transaction 1 tries to update all 100
// 50 records are gone — UPDATE affects 0 rows
// Silent data loss

5. Index-only scan phantom: If a query uses a covering index, it may see a different number of rows than a query with full table scan, due to MVCC snapshot visibility specifics.

Performance Implications

| Protection Method | Latency Impact | Throughput Impact | Memory |
|-------------------|----------------|-------------------|--------|
| Serializable (SSI) | +50-200ms | 2-10x lower | Predicate locks: ~5-20MB shared memory |
| SELECT … FOR UPDATE | +5-50ms | 1.5-3x lower | Row locks: ~200 bytes/lock |
| Gap Locks (MySQL) | +10-100ms | 2-5x lower | Gap locks: ~100 bytes/lock |
| Optimistic Locking | +0ms (happy path), +50ms (retry) | Depends on contention | @Version column: 4 bytes/row |
| Application locking (Redis) | +5-15ms | 1.2-2x lower | Redis key: ~100 bytes |

Indicative numbers (PostgreSQL 15, 8 cores, NVMe) — treat as order of magnitude, not absolute values; they depend on PG version, hardware, and load pattern:

  • Read Committed: ~50,000 TPS
  • Repeatable Read: ~48,000 TPS (almost no overhead for read-only)
  • Serializable: ~8,000 TPS with moderate write contention (84% reduction due to aborts)

Memory Implications

  • Predicate locks (SSI): Stored in shared memory. Under high load, coarsening occurs (merging) up to relation lock level. Size: ~128 bytes per predicate lock.
  • Gap locks (MySQL): Stored in the InnoDB lock table. A transaction holding very many gap locks can exhaust lock memory (in InnoDB, lock structures live in the buffer pool).
  • MVCC snapshots: Each active transaction holds a snapshot. With long-running transactions, old row versions (dead tuples) cannot be deleted → table bloat.

Concurrency Aspects

Serialization failure pattern:

T1: reads range [A, B]
T2: reads range [B, C]
T3: inserts into [A, B] and commits
T4: inserts into [B, C] and commits

T1 commit → checks: T3 wrote into T1's range → RW-dependency
T2 commit → checks: T4 wrote into T2's range → RW-dependency

If a cycle forms → one transaction aborts

Real Production Scenario

Situation: Payment processing system at a fintech company (2024).

Problem: At Read Committed level during daily report generation, phantoms occurred. The report showed 10,000 transactions totaling $5M, but detailed analysis revealed that 50 transactions ($250K) were inserted AFTER the initial COUNT but got included in the final total.

What happened:

// BatchJob, runs daily
@Transactional
public DailyReport generateReport() {
    int count = txRepo.countByDate(today);               // snapshot 1 → 10,000

    // At this moment the payment gateway confirmed 50 transactions
    // and they were inserted into the DB (on Read Committed,
    // every statement gets a fresh snapshot)

    BigDecimal sum = txRepo.sumByDate(today);            // snapshot 2 → sums 10,050 rows
    List<Transaction> txs = txRepo.findAllByDate(today); // snapshot 3 → txs.size() = 10,050
    // count = 10,000, but sum and txs already include the 50 phantoms
}

Error: three different snapshots in one transaction. count came from snapshot 1 (10,000 rows), while sum and the row list came from later snapshots that already saw the 50 new rows (10,050). On Repeatable Read, all three queries would use the same snapshot.

Solution: Switching to Repeatable Read (PostgreSQL) solved the problem without code changes, since MVCC snapshot guarantees consistency of all SELECTs within one transaction.

Impact:

  • Before: 0.5% of reports had discrepancies
  • After: 0% discrepancies
  • Performance: no change (read-only workload on RR is free in PG)

Monitoring and Diagnostics

PostgreSQL — tracking serialization failures:

-- Check conflicts
SELECT datname, conflicts, deadlocks
FROM pg_stat_database
WHERE datname = 'your_db';

-- Active predicate locks (SSI)
SELECT * FROM pg_locks WHERE mode = 'SIReadLock';

MySQL — tracking gap lock conflicts:

-- Deadlock log
SHOW ENGINE INNODB STATUS\G

-- Current locks (MySQL 8.0+: performance_schema.data_locks;
-- MySQL 5.7: information_schema.innodb_locks)
SELECT * FROM performance_schema.data_locks;

Application-level (Micrometer):

// Track phantom-related retries
Counter.builder("transaction.phantom.retries")
    .tag("operation", "report_generation")
    .register(meterRegistry);

Best Practices for Highload

  1. Avoid Serializable for read-heavy workloads — use Repeatable Read (PostgreSQL) or Read Committed with application-level consistency checks.
  2. Shorten transactions — the shorter the transaction, the smaller the window for phantoms.
  3. Use FOR UPDATE selectively — only for critical queries, not for entire reports.
  4. Materialize the snapshot — for large reports, create a temporary table with snapshot data and work with it.
  5. Event Sourcing for audit — instead of repeated SELECTs, use an event stream where order is guaranteed.
  6. Data versioning — add a version column for detecting phantom-related changes.

🎯 Interview Cheat Sheet

Must know:

  • Phantom Read — repeated SELECT returns a different set of rows (INSERT/DELETE from another transaction)
  • Difference from Non-Repeatable Read: Phantom = new rows, NRR = changed existing rows
  • On Repeatable Read: PostgreSQL prevents phantoms via MVCC, MySQL via Gap Locks
  • On Serializable: fully prevented via SSI (PostgreSQL) or 2PL
  • Predicate locks in SSI lock not rows but the query CONDITION (WHERE status = 'pending')
  • Real case: report showed 10,000 transactions, but detailed analysis found 10,050 — phantoms

Common follow-up questions:

  • Why does PG prevent phantoms on RR although the ANSI standard doesn’t require it? — Side effect of MVCC: snapshot doesn’t see new rows
  • What is Next-Key Locking in MySQL? — Record Lock + Gap Lock, blocks both rows and gaps
  • When are phantoms not dangerous? — Read-only reports with acceptable margin of error, caching, statistics
  • How to monitor phantoms in production? — Track serialization failures, monitor long-running transactions

Red flags (DO NOT say):

  • “Repeatable Read prevents phantoms in all DBMS” — SQL Server allows phantoms on RR
  • “Phantom Read = DBMS bug” — this is expected behavior at levels below Serializable
  • “FOR UPDATE solves all phantom problems” — in PostgreSQL FOR UPDATE locks only existing rows, not ranges, so new inserts still slip through; MySQL's FOR UPDATE does take gap locks, but relying on that is DBMS-specific

Related topics:

  • [[5. What is Repeatable Read]]
  • [[6. What is Serializable]]
  • [[8. What is Non-Repeatable Read]]
  • [[10. What is Lost Update]]