What is Repeatable Read?
At the Repeatable Read level, a transaction works with a "snapshot" of data taken at the moment of the first query.
🟢 Junior Level
Repeatable Read is an isolation level that guarantees: if a transaction has read data, all subsequent reads of the same rows will return the same result.
Key Properties
- Dirty Read: Prohibited
- Non-Repeatable Read: Prohibited — data will not change during the transaction
- Phantom Read: Depends on the DBMS (the SQL standard allows phantoms at this level; in practice MySQL's InnoDB and PostgreSQL's snapshot both prevent them for reads)
Simple Example
Transaction A: SELECT price FROM products WHERE id = 1; -- Returned 100
-- Another transaction changes the price and commits
Transaction A: SELECT price FROM products WHERE id = 1; -- Still 100!
Difference from Read Committed
- Read Committed: Each SELECT sees the latest committed data
- Repeatable Read: All SELECTs see data as it was at the start of the transaction
🟡 Middle Level
Implementation Mechanism (MVCC)
At the Repeatable Read level, the data snapshot is created once — at the moment of the first query in the transaction. All subsequent queries see the database in the state it was in at the time of the snapshot.
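The snapshot-per-transaction behavior can be sketched with a toy version store (illustrative only: it uses a global commit counter, while real MVCC tracks transaction IDs, and the `Store`/`RepeatableReadTx` names are invented for this sketch):

```python
# Toy model of snapshot isolation: each row keeps a list of
# (commit_seq, value) versions; a Repeatable Read transaction pins
# the commit sequence number at its FIRST read and reuses it.

class Store:
    def __init__(self):
        self.commit_seq = 0
        self.versions = {}  # key -> list of (commit_seq, value)

    def commit_write(self, key, value):
        self.commit_seq += 1
        self.versions.setdefault(key, []).append((self.commit_seq, value))

class RepeatableReadTx:
    def __init__(self, store):
        self.store = store
        self.snapshot_seq = None  # taken lazily, at first read

    def read(self, key):
        if self.snapshot_seq is None:
            self.snapshot_seq = self.store.commit_seq  # snapshot taken here
        # newest version committed at or before the snapshot
        visible = [v for seq, v in self.store.versions.get(key, [])
                   if seq <= self.snapshot_seq]
        return visible[-1] if visible else None

store = Store()
store.commit_write("price", 100)

tx = RepeatableReadTx(store)
first = tx.read("price")          # snapshot pinned -> 100
store.commit_write("price", 150)  # another transaction commits a change
second = tx.read("price")         # same snapshot -> still 100
print(first, second)              # 100 100
```

A Read Committed transaction in this model would simply re-read `store.commit_seq` on every `read` call instead of pinning it once.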
Differences Between MySQL and PostgreSQL
Although the name is the same, the implementation differs fundamentally:
MySQL (InnoDB)
- This is the default isolation level
- Prevents phantom reads through Next-Key Locking (Record Lock + Gap Lock):
Gap = the interval between two index entries. A Gap Lock blocks the insertion of new rows into this interval. For example, if the index has id=5 and id=10, a gap lock on (5,10) won’t allow inserting id=7.
- Not only rows are locked, but also the gaps between them
- Behaves almost like Serializable
PostgreSQL
- Does not use Gap Locks
- If a transaction tries to modify a row that another transaction changed and committed after yours started, you get an error:
  ERROR: could not serialize access due to concurrent update
- The application must retry the entire transaction
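The retry requirement is usually handled with a small backoff loop. A minimal sketch, assuming a stand-in exception class (a real driver such as psycopg2 raises its own serialization-failure error, SQLSTATE 40001; `with_retry` and `flaky_tx` are invented names for this sketch):

```python
import time

class SerializationFailure(Exception):
    """Stand-in for the driver's SQLSTATE 40001 error."""

def with_retry(run_transaction, max_attempts=3, base_delay=0.01):
    # Re-run the whole transaction on serialization failure,
    # with exponential backoff between attempts.
    for attempt in range(1, max_attempts + 1):
        try:
            return run_transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# Demo: fail once with a serialization error, then succeed.
attempts = []
def flaky_tx():
    attempts.append(1)
    if len(attempts) < 2:
        raise SerializationFailure()
    return "committed"

result = with_retry(flaky_tx)
print(result, len(attempts))  # committed 2
```

The key point: the *entire* transaction body must be re-executed, not just the failing statement, because the retry needs a fresh snapshot.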
Usage Scenarios
- Reports: Generating a report where data shouldn’t “drift” during the process
- Multi-step calculations: When intermediate results shouldn’t change
- Data migrations: Copying data in a consistent state
Don’t use RR for: high-concurrency writes (>100 writers on the same data), long transactions (>30 sec) — risk of bloat and serialization failures.
Overhead
- The DBMS stores old row versions longer (until the longest RR transaction finishes)
- Can lead to database bloat (Table Bloat)
- Increases load on the VACUUM process
Lost Update Problem
Repeatable Read in PostgreSQL protects against “lost updates.” If two threads simultaneously modify the same record, the second one will get a serialization error on commit.
🔴 Senior Level
MVCC Snapshot Lifecycle
PostgreSQL Snapshot Behavior
- Snapshot created at first read in transaction, not at BEGIN
- Snapshot includes:
  - xmin: the oldest transaction ID still active (everything below it is already decided)
  - xmax: the next transaction ID to be assigned (everything at or above it is invisible)
  - xip_array: the list of transactions in progress when the snapshot was taken
- All subsequent reads use this same snapshot
- Writes see their own changes immediately (own-XID visibility rule)
Tuple Visibility Under RR
// Simplified PostgreSQL visibility check (the real logic lives in
// heapam_visibility.c and also handles hint bits, subtransactions,
// and the current transaction's own XID)
bool HeapTupleSatisfiesMVCC(HeapTuple tuple, Snapshot snapshot) {
    // Inserted by a transaction the snapshot cannot see yet
    if (tuple->xmin >= snapshot->xmax) return false;
    // Inserter was still in progress when the snapshot was taken
    if (TransactionIdIsInProgress(tuple->xmin)) return false;
    // Inserter aborted
    if (!TransactionIdDidCommit(tuple->xmin)) return false;
    // Never deleted: xmax = InvalidTransactionId (0)
    if (!TransactionIdIsValid(tuple->xmax)) return true;
    // Deleter was still in progress at snapshot time -> row still visible
    if (TransactionIdIsInProgress(tuple->xmax)) return true;
    // Deleter committed before the snapshot -> row is gone
    if (TransactionIdDidCommit(tuple->xmax)) return false;
    // Deleter aborted -> row still visible
    return true;
}
MySQL Next-Key Locking — Internal Details
Next-Key Lock = Record Lock + Gap Lock
-- Index: id = 1, 5, 10, 15
SELECT * FROM t WHERE id = 10 FOR UPDATE;
This locks:
- The record with id=10 (record lock)
- The gap (5, 10) — prevents INSERT of id=6,7,8,9
- The gap (10, 15) — prevents INSERT of id=11,12,13,14
Why? Prevents phantom reads by blocking inserts that would match the query predicate.
Side effect: Higher deadlock probability, especially with range queries.
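The interval arithmetic above can be sketched as a toy check (illustrative only: the `blocked_by_next_key_lock` helper is invented for this sketch, and real InnoDB locks are attached to index records, not computed from a Python list):

```python
# Which INSERTs does the next-key lock block, given index keys
# [1, 5, 10, 15] and `SELECT ... WHERE id = 10 FOR UPDATE`?
# Per the description above: the record itself plus both adjacent gaps.

def blocked_by_next_key_lock(index_keys, locked_key, new_key):
    keys = sorted(index_keys)
    i = keys.index(locked_key)
    lower = keys[i - 1] if i > 0 else float("-inf")
    upper = keys[i + 1] if i + 1 < len(keys) else float("inf")
    # record lock on locked_key + gap locks on (lower, locked_key)
    # and (locked_key, upper)
    return lower < new_key < upper

index_keys = [1, 5, 10, 15]
print([k for k in range(1, 16)
       if blocked_by_next_key_lock(index_keys, 10, k)])
# -> [6, 7, 8, 9, 10, 11, 12, 13, 14]
```

Inserting id=3 or id=15 would succeed; anything strictly between the neighboring index entries 5 and 15 is blocked.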
Serialization Failure Handling (PostgreSQL)
@Retryable(
    value = {CannotSerializeTransactionException.class},
    backoff = @Backoff(delay = 100, multiplier = 2, maxDelay = 1000),
    maxAttempts = 3
)
@Transactional(isolation = Isolation.REPEATABLE_READ)
public void updateBalance(Long accountId, BigDecimal amount) {
    Account account = repo.findById(accountId).orElseThrow();
    account.setBalance(account.getBalance().add(amount));
    repo.save(account);
}
Lost Update Prevention Deep Dive
Scenario under RR in PostgreSQL:
T1: BEGIN; T2: BEGIN;
T1: SELECT balance FROM acc WHERE id=1; -- sees 100
T2: SELECT balance FROM acc WHERE id=1; -- sees 100
T1: UPDATE acc SET balance = 90 WHERE id=1;
T1: COMMIT;
T2: UPDATE acc SET balance = 80 WHERE id=1;
-- T2 BLOCKS, waiting for T1
-- T1 committed, T2 detects conflict
-- ERROR: could not serialize access
PostgreSQL detects that T2 read a row that T1 modified after T2’s snapshot. T2 must retry.
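This "first-committer-wins" rule can be modeled as a toy check (a sketch only: `rr_update` and the sequence-number comparison are invented stand-ins for PostgreSQL's XID-based conflict detection):

```python
# Toy first-committer-wins check: an UPDATE fails if the row's latest
# committed version is newer than the updating transaction's snapshot.

class SerializationError(Exception):
    pass

class Row:
    def __init__(self, value, committed_seq):
        self.value = value
        self.committed_seq = committed_seq

def rr_update(row, new_value, snapshot_seq, commit_seq):
    # Did another transaction commit a newer version after our snapshot?
    if row.committed_seq > snapshot_seq:
        raise SerializationError(
            "could not serialize access due to concurrent update")
    row.value = new_value
    row.committed_seq = commit_seq

row = Row(value=100, committed_seq=1)
# T1 and T2 both took their snapshot at seq 1
rr_update(row, 90, snapshot_seq=1, commit_seq=2)       # T1 commits fine
try:
    rr_update(row, 80, snapshot_seq=1, commit_seq=3)   # T2: conflict
except SerializationError as e:
    print("T2 must retry:", e)
```

T1's update wins; T2's write is rejected instead of silently overwriting it, which is exactly what distinguishes this from a lost update under Read Committed.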
Performance Impact Analysis
| Metric | Read Committed | Repeatable Read |
|---|---|---|
| Snapshot creation | Per statement | Per transaction |
| Old version retention | Short-lived | Until longest tx ends |
| Bloat potential | Low | Medium-High |
| VACUUM pressure | Low | High |
| Conflict errors | None | Possible on writes |
| Deadlock risk | Low | Medium (MySQL) |
Production Recommendations
When to use Repeatable Read:
- Complex reports requiring self-consistency
- Multi-step business logic with invariants
- Batch processing where source data must not change
- When you need protection against lost updates without explicit locking
When to avoid:
- High-concurrency write-heavy workloads
- Long-running transactions (bloat risk)
- When retry logic is too complex for your use case
Tuning Tips:
- Keep RR transactions as short as possible
- Monitor pg_stat_user_tables.n_dead_tup for bloat
- Set statement_timeout to prevent runaway transactions
- Use pg_stat_activity to detect long-running RR transactions
🎯 Interview Cheat Sheet
Must know:
- Repeatable Read guarantees: re-SELECT of the same rows returns the same result
- Prohibits dirty read and non-repeatable read; phantoms — depends on DBMS
- PostgreSQL: snapshot created on first query, one for the entire transaction
- MySQL: default level, prevents phantoms through Next-Key Locking (Gap Locks)
- PostgreSQL throws a serialization error on update conflicts — retry needed
- RR protects against lost updates in PostgreSQL — the second thread gets a serialization failure
Common follow-up questions:
- Why do MySQL and PG handle phantoms differently on RR? — MySQL: Gap Locks, PG: MVCC snapshot
- When to use RR? — Reports, multi-step calculations, data migrations
- What is a Gap Lock? — Lock on the interval between index entries, prevents inserts
- How does RR differ from Serializable? — RR doesn’t prevent write skew, Serializable does
Red flags (DO NOT say):
- “RR is the same in all DBMS” — MySQL and PG differ fundamentally
- “RR = Serializable” — RR doesn’t prevent all anomalies (write skew)
- “RR has no performance impact” — bloat from old row versions, VACUUM pressure
Related topics:
- [[2. What transaction isolation levels exist]]
- [[6. What is Serializable]]
- [[10. What is Lost Update]]
- [[12. What is the default isolation level in MySQL]]