What is Dirty Read?

🟢 Junior Level

Dirty Read is an anomaly where a transaction reads data modified by another transaction that has not yet completed (no COMMIT has been issued).

Simple Example

Transaction A: UPDATE accounts SET balance = 500 WHERE id = 1;
Transaction B: SELECT balance FROM accounts WHERE id = 1; -- Sees 500
Transaction A: ROLLBACK; -- Undid the changes, balance is back to 100

Transaction B read the value 500, which no longer exists. This is “dirty” data.

Why This Is Bad

You make decisions based on data that doesn’t actually exist in the database.

At Which Level It Occurs

Only at the Read Uncommitted level. All other levels prohibit dirty reads.

Where It Occurs

PostgreSQL and Oracle don’t have this level — they automatically protect against dirty reads.

🟡 Middle Level

Detailed Scenario

Time	Transaction A	Transaction B (Read Uncommitted)
T1	`BEGIN;`	`BEGIN;`
T2	`UPDATE accounts SET balance = 500 WHERE id = 1;`
T3		`SELECT balance FROM accounts WHERE id = 1;` → 500
T4	`ROLLBACK;`
T5		Works with balance 500, actually 100

Consequences for the System

Finance: Credit decisions based on an unconfirmed balance
Warehouses: Reserving non-existent goods
Invariants: Integrity violation — we see an intermediate invalid state

How DBMS Protect Against Dirty Read

Terms:

S-lock (Shared) — read lock: multiple transactions can read simultaneously, but nobody can write.

X-lock (Exclusive) — write lock: only one transaction writes, others wait.

MVCC — storing multiple row versions, each tied to the authoring transaction’s ID.

Shared Locks (S-locks)

A reading transaction waits for the exclusive lock (X-lock) on the record to be released. Reading is blocked until writing completes.

MVCC (Multi-Version Concurrency Control)

Reading happens from the last committed row version. Uncommitted changes create a temporary version that others don’t see.

Where It Is Allowed

The only level — Read Uncommitted.

Important: PostgreSQL doesn’t have this level (automatically elevated to Read Committed). Oracle doesn’t either.

Dirty Read vs Other Anomalies

Dirty Read: you see a VALUE that doesn’t yet exist in the database (the authoring transaction did a ROLLBACK). Non-Repeatable Read: you see a REAL committed value, but it changed between two of your SELECTs.

Dirty Read: Reading uncommitted data
Non-Repeatable Read: Reading committed data that changed between two SELECTs
Phantom Read: New rows appearing between two SELECTs

🔴 Senior Level

MVCC Protection Mechanism — Deep Dive

PostgreSQL Implementation

In PostgreSQL, every heap tuple contains:

t_xmin: transaction ID that inserted this row version
t_xmax: transaction ID that deleted/updated this row version

Visibility check for dirty read prevention:

bool TupleIsVisibleToSnapshot(HeapTuple tuple, Snapshot snap) {
    // If inserting transaction is still in progress → NOT visible
    if (TransactionIdIsInProgress(tuple->t_xmin))
        return false;

    // If inserting transaction didn't commit → NOT visible
    if (!TransactionIdDidCommit(tuple->t_xmin))
        return false;

    // If deleting transaction committed → NOT visible
    if (TransactionIdIsValid(tuple->t_xmax) &&
        TransactionIdDidCommit(tuple->t_xmax))
        return false;

    return true;
}

This check is free — it’s part of normal MVCC visibility, no extra overhead to prevent dirty reads.

Multi-Version Storage

Uncommitted changes create new tuple versions in-place
These versions have t_xmin = current (in-progress) transaction ID
Other transactions’ snapshots exclude in-progress transaction IDs
Result: dirty reads impossible by design, zero extra cost

Why Dirty Reads Are Particularly Dangerous

1. Silent Data Corruption

Unlike crashes or explicit errors, dirty reads produce wrong results without any error. The application continues normally with corrupted data.

2. Cascading Effects

Dirty Read → Wrong Decision → Wrong DB Write → More Dirty Data

The corruption propagates through the system, making debugging extremely difficult.

3. Non-Deterministic Behavior

Dirty reads are timing-dependent. They may occur only under specific load patterns, making them nearly impossible to reproduce in testing.

SQL Server NOLOCK Deep Dive

SELECT * FROM accounts WITH (NOLOCK);

What NOLOCK actually does:

No Shared (S) locks acquired
No blocking on Exclusive (X) locks
Can read uncommitted rows from in-flight transactions
Can read rows twice (if page splits occur during scan)
Can skip rows (if allocation order scan and pages move)

In practice, NOLOCK can return:

Rows that were later rolled back
Duplicate rows
Missing rows
Allocation-order inconsistencies

Historical Context

Early database systems (pre-MVCC) used lock-based concurrency:

Readers acquired S-locks, writers acquired X-locks
S-lock conflicts with X-lock → readers blocked
To avoid blocking, some systems offered Read Uncommitted
This was a performance hack, not a feature

Modern MVCC systems (PostgreSQL, Oracle) made this trade-off obsolete:

Readers never block writers
Writers never block readers
Dirty read prevention comes for free

When People Think They Need Dirty Reads

Scenario: “I need fast approximate counts, I don’t care about exactness”

Better alternatives:

SELECT reltuples FROM pg_class WHERE relname = 'table' — PostgreSQL estimate
Materialized views refreshed periodically
Count from read replicas with acceptable lag
External analytics systems (ClickHouse, Druid)

Production Reality

The vast majority of production systems should not use Read Uncommitted. The only exceptions are approximate dashboards and internal monitoring. The “performance benefit” is negligible compared to:

Risk of corrupted business decisions
Debug complexity
Data integrity violations
Modern alternatives (read replicas, caching)

If you’re on PostgreSQL or Oracle, you can’t even use it — the databases prevent it architecturally.

🎯 Interview Cheat Sheet

Must know:

Dirty Read — reading uncommitted data from another transaction that may ROLLBACK
Only possible at Read Uncommitted level; all other levels prohibit it
PostgreSQL and Oracle architecturally prevent dirty reads via MVCC
SQL Server uses WITH (NOLOCK) for dirty reads, but it’s risky (duplicates, skips)
Dirty read is especially dangerous: the application works with data that “never existed”
MVCC prevents dirty reads for free — readers always see committed versions only

Common follow-up questions:

Why does MVCC prevent dirty reads without overhead? — Uncommitted changes create tuple versions with in-progress transaction IDs, invisible to other snapshots
How does dirty read differ from non-repeatable read? — Dirty read = uncommitted data, non-repeatable = committed but changed
What is EvalPlanQual? — PostgreSQL mechanism for re-evaluating WHERE on concurrent UPDATE

Red flags (DO NOT say):

“NOLOCK is a normal production approach” — can read duplicates and skipped rows
“Dirty read = better performance” — no difference in MVCC systems
“This is a rare problem” — cascading corruption can be catastrophic

Related topics:

[[3. What is Read Uncommitted]]
[[8. What is Non-Repeatable Read]]
[[9. What is Phantom Read]]
[[2. What transaction isolation levels exist]]