What is Dirty Read?
Transaction B read the value 500, which no longer exists. This is "dirty" data.
🟢 Junior Level
Dirty Read is an anomaly where a transaction reads data modified by another transaction that has not yet completed (no COMMIT has been issued).
Simple Example
Transaction A: UPDATE accounts SET balance = 500 WHERE id = 1;
Transaction B: SELECT balance FROM accounts WHERE id = 1; -- Sees 500
Transaction A: ROLLBACK; -- Undid the changes, balance is back to 100
Transaction B read the value 500, which no longer exists. This is “dirty” data.
Why This Is Bad
You make decisions based on data that doesn’t actually exist in the database.
At Which Level It Occurs
Only at the Read Uncommitted level. All other levels prohibit dirty reads.
Where It Occurs
PostgreSQL and Oracle don’t have this level — they automatically protect against dirty reads.
🟡 Middle Level
Detailed Scenario
| Time | Transaction A | Transaction B (Read Uncommitted) |
|---|---|---|
| T1 | BEGIN; |
BEGIN; |
| T2 | UPDATE accounts SET balance = 500 WHERE id = 1; |
|
| T3 | SELECT balance FROM accounts WHERE id = 1; → 500 |
|
| T4 | ROLLBACK; |
|
| T5 | Works with balance 500, actually 100 |
Consequences for the System
- Finance: Credit decisions based on an unconfirmed balance
- Warehouses: Reserving non-existent goods
- Invariants: Integrity violation — we see an intermediate invalid state
How DBMS Protect Against Dirty Read
Terms:
- S-lock (Shared) — read lock: multiple transactions can read simultaneously, but nobody can write.
- X-lock (Exclusive) — write lock: only one transaction writes, others wait.
- MVCC — storing multiple row versions, each tied to the authoring transaction’s ID.
Shared Locks (S-locks)
A reading transaction waits for the exclusive lock (X-lock) on the record to be released. Reading is blocked until writing completes.
MVCC (Multi-Version Concurrency Control)
Reading happens from the last committed row version. Uncommitted changes create a temporary version that others don’t see.
Where It Is Allowed
The only level — Read Uncommitted.
Important: PostgreSQL doesn’t have this level (automatically elevated to Read Committed). Oracle doesn’t either.
Dirty Read vs Other Anomalies
Dirty Read: you see a VALUE that doesn’t yet exist in the database (the authoring transaction did a ROLLBACK). Non-Repeatable Read: you see a REAL committed value, but it changed between two of your SELECTs.
- Dirty Read: Reading uncommitted data
- Non-Repeatable Read: Reading committed data that changed between two SELECTs
- Phantom Read: New rows appearing between two SELECTs
🔴 Senior Level
MVCC Protection Mechanism — Deep Dive
PostgreSQL Implementation
In PostgreSQL, every heap tuple contains:
t_xmin: transaction ID that inserted this row versiont_xmax: transaction ID that deleted/updated this row version
Visibility check for dirty read prevention:
bool TupleIsVisibleToSnapshot(HeapTuple tuple, Snapshot snap) {
// If inserting transaction is still in progress → NOT visible
if (TransactionIdIsInProgress(tuple->t_xmin))
return false;
// If inserting transaction didn't commit → NOT visible
if (!TransactionIdDidCommit(tuple->t_xmin))
return false;
// If deleting transaction committed → NOT visible
if (TransactionIdIsValid(tuple->t_xmax) &&
TransactionIdDidCommit(tuple->t_xmax))
return false;
return true;
}
This check is free — it’s part of normal MVCC visibility, no extra overhead to prevent dirty reads.
Multi-Version Storage
- Uncommitted changes create new tuple versions in-place
- These versions have
t_xmin= current (in-progress) transaction ID - Other transactions’ snapshots exclude in-progress transaction IDs
- Result: dirty reads impossible by design, zero extra cost
Why Dirty Reads Are Particularly Dangerous
1. Silent Data Corruption
Unlike crashes or explicit errors, dirty reads produce wrong results without any error. The application continues normally with corrupted data.
2. Cascading Effects
Dirty Read → Wrong Decision → Wrong DB Write → More Dirty Data
The corruption propagates through the system, making debugging extremely difficult.
3. Non-Deterministic Behavior
Dirty reads are timing-dependent. They may occur only under specific load patterns, making them nearly impossible to reproduce in testing.
SQL Server NOLOCK Deep Dive
SELECT * FROM accounts WITH (NOLOCK);
What NOLOCK actually does:
- No Shared (S) locks acquired
- No blocking on Exclusive (X) locks
- Can read uncommitted rows from in-flight transactions
- Can read rows twice (if page splits occur during scan)
- Can skip rows (if allocation order scan and pages move)
In practice, NOLOCK can return:
- Rows that were later rolled back
- Duplicate rows
- Missing rows
- Allocation-order inconsistencies
Historical Context
Early database systems (pre-MVCC) used lock-based concurrency:
- Readers acquired S-locks, writers acquired X-locks
- S-lock conflicts with X-lock → readers blocked
- To avoid blocking, some systems offered Read Uncommitted
- This was a performance hack, not a feature
Modern MVCC systems (PostgreSQL, Oracle) made this trade-off obsolete:
- Readers never block writers
- Writers never block readers
- Dirty read prevention comes for free
When People Think They Need Dirty Reads
Scenario: “I need fast approximate counts, I don’t care about exactness”
Better alternatives:
SELECT reltuples FROM pg_class WHERE relname = 'table'— PostgreSQL estimate- Materialized views refreshed periodically
- Count from read replicas with acceptable lag
- External analytics systems (ClickHouse, Druid)
Production Reality
The vast majority of production systems should not use Read Uncommitted. The only exceptions are approximate dashboards and internal monitoring. The “performance benefit” is negligible compared to:
- Risk of corrupted business decisions
- Debug complexity
- Data integrity violations
- Modern alternatives (read replicas, caching)
If you’re on PostgreSQL or Oracle, you can’t even use it — the databases prevent it architecturally.
🎯 Interview Cheat Sheet
Must know:
- Dirty Read — reading uncommitted data from another transaction that may ROLLBACK
- Only possible at Read Uncommitted level; all other levels prohibit it
- PostgreSQL and Oracle architecturally prevent dirty reads via MVCC
- SQL Server uses WITH (NOLOCK) for dirty reads, but it’s risky (duplicates, skips)
- Dirty read is especially dangerous: the application works with data that “never existed”
- MVCC prevents dirty reads for free — readers always see committed versions only
Common follow-up questions:
- Why does MVCC prevent dirty reads without overhead? — Uncommitted changes create tuple versions with in-progress transaction IDs, invisible to other snapshots
- How does dirty read differ from non-repeatable read? — Dirty read = uncommitted data, non-repeatable = committed but changed
- What is EvalPlanQual? — PostgreSQL mechanism for re-evaluating WHERE on concurrent UPDATE
Red flags (DO NOT say):
- “NOLOCK is a normal production approach” — can read duplicates and skipped rows
- “Dirty read = better performance” — no difference in MVCC systems
- “This is a rare problem” — cascading corruption can be catastrophic
Related topics:
- [[3. What is Read Uncommitted]]
- [[8. What is Non-Repeatable Read]]
- [[9. What is Phantom Read]]
- [[2. What transaction isolation levels exist]]