What is Repeatable Read?
At the Repeatable Read level, a transaction works with a "snapshot" of data taken at the moment of the first query.
🟢 Junior Level
Repeatable Read is an isolation level that guarantees: if a transaction has read data, all subsequent reads of the same rows will return the same result.
Key Properties
- Dirty Read: Prohibited
- Non-Repeatable Read: Prohibited — data will not change during the transaction
- Phantom Read: Depends on the DBMS (the SQL standard allows phantoms at this level; in practice MySQL's InnoDB and PostgreSQL's snapshot both prevent them for reads)
Simple Example
Transaction A: SELECT price FROM products WHERE id = 1; -- Returned 100
-- Another transaction changes the price and commits
Transaction A: SELECT price FROM products WHERE id = 1; -- Still 100!
Difference from Read Committed
- Read Committed: Each SELECT sees the latest committed data
- Repeatable Read: All SELECTs see data as it was at the start of the transaction
🟡 Middle Level
Implementation Mechanism (MVCC)
At the Repeatable Read level, the data snapshot is created once — at the moment of the first query in the transaction. All subsequent queries see the database in the state it was in at the time of the snapshot.
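The snapshot-per-transaction behavior can be sketched with a toy version store (illustrative only: it uses a global commit counter, while real MVCC tracks transaction IDs, and the `Store`/`RepeatableReadTx` names are invented for this sketch):

```python
# Toy model of snapshot isolation: each row keeps a list of
# (commit_seq, value) versions; a Repeatable Read transaction pins
# the commit sequence number at its FIRST read and reuses it.

class Store:
    def __init__(self):
        self.commit_seq = 0
        self.versions = {}  # key -> list of (commit_seq, value)

    def commit_write(self, key, value):
        self.commit_seq += 1
        self.versions.setdefault(key, []).append((self.commit_seq, value))

class RepeatableReadTx:
    def __init__(self, store):
        self.store = store
        self.snapshot_seq = None  # taken lazily, at first read

    def read(self, key):
        if self.snapshot_seq is None:
            self.snapshot_seq = self.store.commit_seq  # snapshot taken here
        # newest version committed at or before the snapshot
        visible = [v for seq, v in self.store.versions.get(key, [])
                   if seq <= self.snapshot_seq]
        return visible[-1] if visible else None

store = Store()
store.commit_write("price", 100)

tx = RepeatableReadTx(store)
first = tx.read("price")          # snapshot pinned -> 100
store.commit_write("price", 150)  # another transaction commits a change
second = tx.read("price")         # same snapshot -> still 100
print(first, second)              # 100 100
```

A Read Committed transaction in this model would simply re-read `store.commit_seq` on every `read` call instead of pinning it once.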
Differences Between MySQL and PostgreSQL
Although the name is the same, the implementation differs fundamentally:
MySQL (InnoDB)
- This is the default isolation level
- Prevents phantom reads through Next-Key Locking (Record Lock + Gap Lock):
Gap = the interval between two index entries. A Gap Lock blocks the insertion of new rows into this interval. For example, if the index has id=5 and id=10, a gap lock on (5,10) won’t allow inserting id=7.
- Not only rows are locked, but also the gaps between them
- Behaves almost like Serializable
PostgreSQL
- Does not use Gap Locks
- If a transaction tries to modify a row that another transaction changed and committed after yours started, you get an error:
  ERROR: could not serialize access due to concurrent update
- The application must retry the entire transaction
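The retry requirement is usually handled with a small backoff loop. A minimal sketch, assuming a stand-in exception class (a real driver such as psycopg2 raises its own serialization-failure error, SQLSTATE 40001; `with_retry` and `flaky_tx` are invented names for this sketch):

```python
import time

class SerializationFailure(Exception):
    """Stand-in for the driver's SQLSTATE 40001 error."""

def with_retry(run_transaction, max_attempts=3, base_delay=0.01):
    # Re-run the whole transaction on serialization failure,
    # with exponential backoff between attempts.
    for attempt in range(1, max_attempts + 1):
        try:
            return run_transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# Demo: fail once with a serialization error, then succeed.
attempts = []
def flaky_tx():
    attempts.append(1)
    if len(attempts) < 2:
        raise SerializationFailure()
    return "committed"

result = with_retry(flaky_tx)
print(result, len(attempts))  # committed 2
```

The key point: the *entire* transaction body must be re-executed, not just the failing statement, because the retry needs a fresh snapshot.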
Usage Scenarios
- Reports: Generating a report where data shouldn’t “drift” during the process
- Multi-step calculations: When intermediate results shouldn’t change
- Data migrations: Copying data in a consistent state
Don’t use RR for: high-concurrency writes (>100 writers on the same data), long transactions (>30 sec) — risk of bloat and serialization failures.
Overhead
- The DBMS stores old row versions longer (until the longest RR transaction finishes)
- Can lead to database bloat (Table Bloat)
- Increases load on the VACUUM process
Lost Update Problem
Repeatable Read in PostgreSQL protects against “lost updates.” If two threads simultaneously modify the same record, the second one will get a serialization error on commit.
🔴 Senior Level
MVCC Snapshot Lifecycle
PostgreSQL Snapshot Behavior
- Snapshot created at first read in transaction, not at BEGIN
- Snapshot includes:
  - xmin: the oldest transaction ID still active (everything below it is already decided)
  - xmax: the next transaction ID to be assigned (everything at or above it is invisible)
  - xip_array: the list of transactions in progress when the snapshot was taken
- All subsequent reads use this same snapshot
- Writes see their own changes immediately (own-XID visibility rule)
Tuple Visibility Under RR
// Simplified PostgreSQL visibility check (the real logic lives in
// heapam_visibility.c and also handles hint bits, subtransactions,
// and the current transaction's own XID)
bool HeapTupleSatisfiesMVCC(HeapTuple tuple, Snapshot snapshot) {
    // Inserted by a transaction the snapshot cannot see yet
    if (tuple->xmin >= snapshot->xmax) return false;
    // Inserter was still in progress when the snapshot was taken
    if (TransactionIdIsInProgress(tuple->xmin)) return false;
    // Inserter aborted
    if (!TransactionIdDidCommit(tuple->xmin)) return false;
    // Never deleted: xmax = InvalidTransactionId (0)
    if (!TransactionIdIsValid(tuple->xmax)) return true;
    // Deleter was still in progress at snapshot time -> row still visible
    if (TransactionIdIsInProgress(tuple->xmax)) return true;
    // Deleter committed before the snapshot -> row is gone
    if (TransactionIdDidCommit(tuple->xmax)) return false;
    // Deleter aborted -> row still visible
    return true;
}
MySQL Next-Key Locking — Internal Details
Next-Key Lock = Record Lock + Gap Lock
-- Index: id = 1, 5, 10, 15
SELECT * FROM t WHERE id = 10 FOR UPDATE;
This locks:
- The record with id=10 (record lock)
- The gap (5, 10) — prevents INSERT of id=6,7,8,9
- The gap (10, 15) — prevents INSERT of id=11,12,13,14
Why? Prevents phantom reads by blocking inserts that would match the query predicate.
Side effect: Higher deadlock probability, especially with range queries.
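The interval arithmetic above can be sketched as a toy check (illustrative only: the `blocked_by_next_key_lock` helper is invented for this sketch, and real InnoDB locks are attached to index records, not computed from a Python list):

```python
# Which INSERTs does the next-key lock block, given index keys
# [1, 5, 10, 15] and `SELECT ... WHERE id = 10 FOR UPDATE`?
# Per the description above: the record itself plus both adjacent gaps.

def blocked_by_next_key_lock(index_keys, locked_key, new_key):
    keys = sorted(index_keys)
    i = keys.index(locked_key)
    lower = keys[i - 1] if i > 0 else float("-inf")
    upper = keys[i + 1] if i + 1 < len(keys) else float("inf")
    # record lock on locked_key + gap locks on (lower, locked_key)
    # and (locked_key, upper)
    return lower < new_key < upper

index_keys = [1, 5, 10, 15]
print([k for k in range(1, 16)
       if blocked_by_next_key_lock(index_keys, 10, k)])
# -> [6, 7, 8, 9, 10, 11, 12, 13, 14]
```

Inserting id=3 or id=15 would succeed; anything strictly between the neighboring index entries 5 and 15 is blocked.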
Serialization Failure Handling (PostgreSQL)
@Retryable(
    value = {CannotSerializeTransactionException.class},
    backoff = @Backoff(delay = 100, multiplier = 2, maxDelay = 1000),
    maxAttempts = 3
)
@Transactional(isolation = Isolation.REPEATABLE_READ)
public void updateBalance(Long accountId, BigDecimal amount) {
    Account account = repo.findById(accountId).orElseThrow();
    account.setBalance(account.getBalance().add(amount));
    repo.save(account);
}
Lost Update Prevention Deep Dive
Scenario under RR in PostgreSQL:
T1: BEGIN; T2: BEGIN;
T1: SELECT balance FROM acc WHERE id=1; -- sees 100
T2: SELECT balance FROM acc WHERE id=1; -- sees 100
T1: UPDATE acc SET balance = 90 WHERE id=1;
T1: COMMIT;
T2: UPDATE acc SET balance = 80 WHERE id=1;
-- T2 BLOCKS, waiting for T1
-- T1 committed, T2 detects conflict
-- ERROR: could not serialize access
PostgreSQL detects that T2 read a row that T1 modified after T2’s snapshot. T2 must retry.
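This "first-committer-wins" rule can be modeled as a toy check (a sketch only: `rr_update` and the sequence-number comparison are invented stand-ins for PostgreSQL's XID-based conflict detection):

```python
# Toy first-committer-wins check: an UPDATE fails if the row's latest
# committed version is newer than the updating transaction's snapshot.

class SerializationError(Exception):
    pass

class Row:
    def __init__(self, value, committed_seq):
        self.value = value
        self.committed_seq = committed_seq

def rr_update(row, new_value, snapshot_seq, commit_seq):
    # Did another transaction commit a newer version after our snapshot?
    if row.committed_seq > snapshot_seq:
        raise SerializationError(
            "could not serialize access due to concurrent update")
    row.value = new_value
    row.committed_seq = commit_seq

row = Row(value=100, committed_seq=1)
# T1 and T2 both took their snapshot at seq 1
rr_update(row, 90, snapshot_seq=1, commit_seq=2)       # T1 commits fine
try:
    rr_update(row, 80, snapshot_seq=1, commit_seq=3)   # T2: conflict
except SerializationError as e:
    print("T2 must retry:", e)
```

T1's update wins; T2's write is rejected instead of silently overwriting it, which is exactly what distinguishes this from a lost update under Read Committed.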
Performance Impact Analysis
| Metric | Read Committed | Repeatable Read |
|---|---|---|
| Snapshot creation | Per statement | Per transaction |
| Old version retention | Short-lived | Until longest tx ends |
| Bloat potential | Low | Medium-High |
| VACUUM pressure | Low | High |
| Conflict errors | None | Possible on writes |
| Deadlock risk | Low | Medium (MySQL) |
Production Recommendations
When to use Repeatable Read:
- Complex reports requiring self-consistency
- Multi-step business logic with invariants
- Batch processing where source data must not change
- When you need protection against lost updates without explicit locking
When to avoid:
- High-concurrency write-heavy workloads
- Long-running transactions (bloat risk)
- When retry logic is too complex for your use case
Tuning Tips:
- Keep RR transactions as short as possible
- Monitor pg_stat_user_tables.n_dead_tup for bloat
- Set statement_timeout to prevent runaway transactions
- Use pg_stat_activity to detect long-running RR transactions
🎯 Interview Cheat Sheet
Must know:
- Repeatable Read guarantees: re-SELECT of the same rows returns the same result
- Prohibits dirty read and non-repeatable read; phantoms — depends on DBMS
- PostgreSQL: snapshot created on first query, one for the entire transaction
- MySQL: default level, prevents phantoms through Next-Key Locking (Gap Locks)
- PostgreSQL throws a serialization error on update conflicts — retry needed
- RR protects against lost updates in PostgreSQL — the second thread gets a serialization failure
Common follow-up questions:
- Why do MySQL and PG handle phantoms differently on RR? — MySQL: Gap Locks, PG: MVCC snapshot
- When to use RR? — Reports, multi-step calculations, data migrations
- What is a Gap Lock? — Lock on the interval between index entries, prevents inserts
- How does RR differ from Serializable? — RR doesn’t prevent write skew, Serializable does
Red flags (DO NOT say):
- “RR is the same in all DBMS” — MySQL and PG differ fundamentally
- “RR = Serializable” — RR doesn’t prevent all anomalies (write skew)
- “RR has no performance impact” — bloat from old row versions, VACUUM pressure
Related topics:
- [[2. What transaction isolation levels exist]]
- [[6. What is Serializable]]
- [[10. What is Lost Update]]
- [[12. What is the default isolation level in MySQL]]