Question 11 · Section 11

What Is the Default Isolation Level in PostgreSQL?

The default isolation level in PostgreSQL is Read Committed.

🟢 Junior Level

The default isolation level in PostgreSQL is Read Committed.

What this means in simple terms: each SELECT within a transaction sees only data that other transactions have already saved (committed). Important: every SELECT statement takes its own snapshot. Therefore, if you run SELECT twice within one transaction, the second query may see new data that was committed after the first query.

Simple analogy: You’re reading a news feed. Each time you refresh the page (new SELECT), you see fresh news. But you don’t see drafts that haven’t been published yet (uncommitted data).

SQL example:

-- Initial state: account.balance = 1000

-- Transaction 1
BEGIN;
SELECT balance FROM accounts WHERE id = 1;  -- sees 1000
-- (waits...)

-- Transaction 2 (runs in parallel)
BEGIN;
UPDATE accounts SET balance = 1500 WHERE id = 1;
COMMIT;  -- data saved

-- Transaction 1 continues:
SELECT balance FROM accounts WHERE id = 1;  -- sees 1500! (new data)
COMMIT;

What is protected and what isn’t on Read Committed:

| Anomaly | Possible? | Description |
|---|---|---|
| Dirty Read | No | Doesn't see uncommitted data |
| Non-repeatable Read | Yes | Repeated SELECT may return different values |
| Phantom Read | Yes | Repeated SELECT may return a different set of rows |

How to change the isolation level:

-- For the current transaction only
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- For all subsequent transactions in the session
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- In Spring
@Transactional(isolation = Isolation.REPEATABLE_READ)

🟡 Middle Level

How It Works Internally

PostgreSQL implements Read Committed through MVCC (Multi-Version Concurrency Control):

Terms:

  • xmin/xmax — system fields in each PG row, storing the IDs of the inserting and deleting transactions.
  • Dead tuples — old row versions no longer needed by any active transaction.
  • TOAST — PG mechanism for storing large values (TEXT, JSONB) separately from the main row.
  • Autovacuum — PG background process that removes dead tuples.
Each row in PostgreSQL contains two system fields:
  - xmin: ID of the transaction that inserted this row
  - xmax: ID of the transaction that deleted this row (0 = row is alive)

On SELECT:
  1. PostgreSQL takes a snapshot of which transactions are in progress
  2. A row version is visible if its xmin had committed when the snapshot was taken and its xmax (if set) had not
  3. On Read Committed, each SELECT command gets a NEW snapshot

Key point: Unlike some other DBMS, PostgreSQL at the Read Committed level creates a new snapshot for each command, not for the entire transaction. Therefore, a repeated SELECT may see data committed by other transactions after the first SELECT.
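
The per-statement vs per-transaction snapshot difference can be sketched as a toy in-memory model (an illustration only — real PostgreSQL visibility checks are far more involved, and all class and field names here are invented):

```java
import java.util.Set;

public class SnapshotDemo {
    // A row version with its MVCC system fields
    record Tuple(long xmin, long xmax, int balance) {}

    // A snapshot: transactions below the horizon are candidates,
    // minus those that were still in progress when the snapshot was taken
    record Snapshot(long xmaxHorizon, Set<Long> inProgress) {
        boolean sees(long xid) {
            return xid < xmaxHorizon && !inProgress.contains(xid);
        }
        boolean visible(Tuple t) {
            // inserted by a transaction this snapshot sees,
            // and not deleted by one it sees
            return sees(t.xmin()) && (t.xmax() == 0 || !sees(t.xmax()));
        }
    }

    public static void main(String[] args) {
        // Tx 100 inserted balance=1000; later tx 200 updated it to 1500:
        Tuple oldVersion = new Tuple(100, 200, 1000); // dead once 200 commits
        Tuple newVersion = new Tuple(200, 0, 1500);

        // Read Committed: the first SELECT runs while tx 200 is in progress
        Snapshot firstSelect = new Snapshot(201, Set.of(200L));
        // ... tx 200 commits; the second SELECT takes a NEW snapshot
        Snapshot secondSelect = new Snapshot(201, Set.of());

        System.out.println(firstSelect.visible(oldVersion));  // true  -> sees 1000
        System.out.println(firstSelect.visible(newVersion));  // false
        System.out.println(secondSelect.visible(oldVersion)); // false
        System.out.println(secondSelect.visible(newVersion)); // true  -> sees 1500

        // Repeatable Read would simply keep using firstSelect for both queries.
    }
}
```

Under Repeatable Read the second query would reuse the first snapshot and keep seeing balance = 1000 — that is the whole difference between the two levels.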

Practical Application

When Read Committed Is Sufficient:

  1. CRUD applications — most web applications work correctly on Read Committed
  2. Reports without strict consistency requirements — small margin of error is acceptable
  3. High-load systems — minimal overhead, maximum throughput

When Read Committed Is NOT Sufficient:

  1. Financial calculations — consistent data is needed throughout the entire transaction
  2. Data migrations — a stable snapshot is required
  3. Complex business invariants — checks depending on multiple tables

Real Scenarios

Scenario 1: Read-Modify-Write (Lost Update) on Read Committed

@Transactional  // Read Committed by default
public void transfer(Long fromId, Long toId, BigDecimal amount) {
    // Both entities are loaded and cached in the persistence context
    Account from = accountRepo.findById(fromId);
    Account to = accountRepo.findById(toId);

    // Here another transaction may commit a change to the same balances

    // These values are now potentially stale (read-modify-write)
    BigDecimal fromBalance = from.getBalance();
    BigDecimal toBalance = to.getBalance();

    // The UPDATE on flush overwrites the concurrent change: Lost Update
    from.setBalance(fromBalance.subtract(amount));
    to.setBalance(toBalance.add(amount));
}

Scenario 2: Consistent Report on Repeatable Read

@Transactional(isolation = Isolation.REPEATABLE_READ)
public Report generateReport(LocalDate date) {
    // All SELECTs see the same snapshot
    int count = txRepo.countByDate(date);
    BigDecimal sum = txRepo.sumByDate(date);
    // count and sum are guaranteed to be consistent
    return new Report(count, sum);
}

Common Mistakes

| Mistake | Consequence | Solution |
|---|---|---|
| Assuming @Transactional gives a consistent snapshot | Repeated SELECT sees different data | Use Repeatable Read for consistent reports |
| Long transaction on Read Committed | Larger window for conflicts | Shorten the transaction |
| Read Committed for financial operations | Lost Update, incorrect calculations | Atomic SQL (SET balance = balance - ?) or Repeatable Read + FOR UPDATE |
| Ignoring default_transaction_isolation | Different sessions may have different levels | Set the level explicitly in code or configuration |

Comparison of Isolation Levels in PostgreSQL

| Level | Dirty Read | Non-repeatable Read | Phantom Read | Throughput (relative) |
|---|---|---|---|---|
| Read Uncommitted | No (behaves as Read Committed in PG) | Possible | Possible | 1.0x |
| Read Committed (default) | No | Possible | Possible | 1.0x |
| Repeatable Read | No | No | No (in PG) | 0.95x |
| Serializable | No | No | No | 0.3-0.5x |

Numbers are relative for read-heavy workload (80% SELECT). For write-heavy (80% UPDATE) the difference will be larger: RC=1.0x, RR=0.8x, Serializable=0.2x.

Important: In PostgreSQL, Repeatable Read also protects against phantom reads (thanks to MVCC), which goes beyond the ANSI SQL standard.

When NOT to Change the Default

  • Standard CRUD operations — Read Committed is a good default choice here
  • Microservices with eventual consistency — strict isolation is excessive
  • High-load read-heavy systems — Serializability overhead is not justified

🔴 Senior Level

Internal Implementation: MVCC and Snapshot Management

Transaction ID and Visibility Rules

// Simplified visibility logic in PostgreSQL (see src/backend/access/heap/heapam_visibility.c)

bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot) {
    TransactionId xmin = htup->t_data->t_xmin;  // inserting transaction
    TransactionId xmax = htup->t_data->t_xmax;  // deleting transaction

    // Read Committed rules:
    // 1. If xmin hasn't committed yet → row is invisible
    // 2. If xmax committed → row is invisible (deleted)
    // 3. snapshot is created anew for EACH query

    // Repeatable Read rules:
    // snapshot is created ONCE at the first query in the transaction
    // and reused for all subsequent queries
}

Transaction ID wraparound:

  • PostgreSQL uses 32-bit transaction IDs (~4 billion)
  • After overflow, wraparound occurs
  • Autovacuum must “freeze” old rows (marking xmin as FrozenTransactionId)
  • If autovacuum can’t keep up → database shutdown (transaction ID wraparound failure)

Snapshot Creation Overhead

Read Committed:
  - New snapshot for each query
  - Snapshot = array of in-progress transaction IDs
  - Size: O(active_transactions)
  - Allocation: ~100-500 bytes per snapshot

Repeatable Read / Serializable:
  - Snapshot created once at the first query
  - Preserved until the end of the transaction
  - Additionally: SSI serializable transaction tracking (~1KB overhead)

MVCC and Dead Tuples

Each UPDATE or DELETE doesn’t physically remove the row, but marks it as dead:

UPDATE accounts SET balance = 1500 WHERE id = 1;

Old row version:
  xmin = 100, xmax = 200 (new transaction), balance = 1000  → DEAD TUPLE

New row version:
  xmin = 200, xmax = 0, balance = 1500  → LIVE TUPLE

Overhead:

  • Dead tuples take up disk space (table bloat: 2-10x growth)
  • Indexes also accumulate dead entries
  • AutoVacuum must periodically clean up (I/O overhead: 10-30% disk throughput)

Architectural Trade-offs

Approach A: Read Committed (default)

  • ✅ Pros: Minimal overhead, simple semantics, no serialization failures, maximum throughput
  • ❌ Cons: Non-repeatable reads, phantom reads, can’t rely on consistency within long transactions
  • Suitable for: 90% of web applications, microservices, CRUD

Approach B: Repeatable Read

  • ✅ Pros: Consistent snapshot, phantom protection (in PG), predictable behavior
  • ❌ Cons: Snapshot-too-old errors on long-running transactions, more dead tuples
  • Suitable for: reports, migrations, batch processing

Approach C: Serializable (SSI)

  • ✅ Pros: Full ACID, guarantee of correctness for any concurrent transactions
  • ❌ Cons: Serialization failures (10-30% abort rate), predicate lock overhead (5-20MB shared memory), 2-10x lower throughput
  • Suitable for: financial systems where correctness is more important than performance

Edge Cases and Corner Cases

1. Snapshot Too Old (Repeatable Read / Serializable):

T1: BEGIN ISOLATION LEVEL REPEATABLE READ;
T1: SELECT * FROM accounts;  -- creates snapshot

-- ... 10 minutes pass ...
-- With old_snapshot_threshold set, vacuum may remove row versions
-- that only T1's snapshot still needs

T1: SELECT * FROM accounts;
-- ERROR: snapshot too old

-- Separately, if T1 UPDATEs a row changed since its snapshot:
-- ERROR: could not serialize access due to concurrent update

Solution: Keep old_snapshot_threshold disabled (the default) and shorten transactions; add retry logic for update conflicts.

2. Read Committed and UPDATE Contention:

-- T1 and T2 simultaneously:
T1: UPDATE accounts SET balance = balance - 100 WHERE id = 1;
T2: UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- PostgreSQL: T2 waits for T1; after T1 commits, T2 re-checks the WHERE
-- clause against the latest row version (EvalPlanQual) and applies its
-- UPDATE on top of T1's result
-- Result: balance -= 200 (correct!)

-- BUT if the logic is at the application level (read-modify-write):
-- T1: SELECT balance → 1000
-- T2: SELECT balance → 1000
-- T1: UPDATE balance = 900
-- T2: UPDATE balance = 900  ← Lost Update!
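
The read-modify-write hazard above can be reproduced deterministically in plain Java (an in-memory analogy, not real database code): both "transactions" read the balance before either writes, and the second write silently overwrites the first — exactly what an atomic `UPDATE ... SET balance = balance - ?` avoids.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LostUpdateDemo {
    public static void main(String[] args) {
        // --- Read-modify-write: both "transactions" read 1000 first ---
        long balance = 1000;
        long t1Read = balance;          // T1: SELECT balance -> 1000
        long t2Read = balance;          // T2: SELECT balance -> 1000
        balance = t1Read - 100;         // T1: UPDATE balance = 900
        balance = t2Read - 100;         // T2: UPDATE balance = 900  <- T1's write is lost
        System.out.println(balance);    // 900, not 800: Lost Update

        // --- Atomic decrement: analogue of UPDATE ... SET balance = balance - ? ---
        AtomicLong atomicBalance = new AtomicLong(1000);
        atomicBalance.addAndGet(-100);  // T1
        atomicBalance.addAndGet(-100);  // T2 operates on the latest value atomically
        System.out.println(atomicBalance.get()); // 800: both withdrawals applied
    }
}
```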

3. DDL Inside Transaction on Read Committed:

BEGIN;
ALTER TABLE accounts ADD COLUMN new_col TEXT;  -- DDL
SELECT * FROM accounts;  -- new snapshot sees the new column
-- Other sessions don't see new_col until COMMIT
-- But ALTER TABLE takes an ACCESS EXCLUSIVE lock, so their queries
-- on accounts block until T1 commits!

4. Read Committed and Foreign Keys:

-- T1: INSERT INTO orders (user_id) VALUES (999);
-- T2: DELETE FROM users WHERE id = 999;

-- The FK check in T1 takes a FOR KEY SHARE lock on users row 999,
-- so T2's DELETE blocks until T1 commits
-- If T2's DELETE had already committed, T1's INSERT fails with an FK violation

5. Read Committed and Cursors:

BEGIN;
DECLARE cur CURSOR FOR SELECT * FROM accounts;
FETCH 10 FROM cur;  -- uses the snapshot taken when the cursor was opened

-- Another transaction modifies data and commits

FETCH 10 FROM cur;  -- same snapshot: does NOT see those changes
-- PostgreSQL cursors are insensitive: unlike separate SELECTs,
-- a cursor stays consistent even on Read Committed

Performance Implications

| Level | Latency (p50) | Latency (p99) | Throughput | Memory Overhead |
|---|---|---|---|---|
| Read Committed | 1-3ms | 5-15ms | 50,000+ TPS | Snapshot: ~200 bytes/query |
| Repeatable Read | 1-3ms | 5-20ms | 48,000 TPS | Snapshot: ~200 bytes/tx + retained dead tuples |
| Serializable | 5-20ms | 50-200ms | 8,000-15,000 TPS | Predicate locks: 5-20MB shared memory |

Specific numbers (PostgreSQL 15, 8 cores, NVMe, pgbench):

Numbers are for a specific setup. On your hardware/version they may differ by an order of magnitude.

  • Read Committed: ~55,000 SELECT-only TPS
  • Repeatable Read: ~52,000 SELECT-only TPS (5% overhead from snapshot retention)
  • Serializable: ~12,000 TPS with moderate contention (78% reduction)

Write workload (50% reads, 50% writes):

  • Read Committed: ~25,000 TPS
  • Repeatable Read: ~24,000 TPS
  • Serializable: ~6,000 TPS (serialization failures + retries)

Memory Implications

  • Snapshot arrays: ~200 bytes per snapshot. At 1000 queries/sec = 200KB/s allocation.
  • Dead tuple retention (RR/SRL): With long-running transactions, dead tuples aren’t vacuumed. A 1GB table may grow to 5-10GB.
  • Predicate locks (Serializable): Shared memory pool (~64MB default). On overflow → lock coarsening → more serialization failures.
  • TOAST bloat: Large fields (TEXT, JSONB) are stored in TOAST tables. Dead TOAST tuples also accumulate.

Concurrency Aspects

Serialization failure patterns (SSI):

T1: reads range A (predicate lock on A)
T2: writes to range A
T1: writes to range B
T2: reads range B (predicate lock on B)

Dependency graph:
  T1 → (read A) → T2 → (write A) = RW-dependency
  T2 → (read B) → T1 → (write B) = RW-dependency

Cycle: T1 → T2 → T1 → serialization failure
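
The essence of this check can be sketched as cycle detection over rw-dependency edges (a toy model: real PostgreSQL SSI flags a "dangerous structure" of two consecutive rw-edges rather than searching for full cycles, and all names below are invented):

```java
import java.util.*;

public class RwDependencyGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    // T read data that U later overwrote: rw-dependency edge T -> U
    public void addRwDependency(String from, String to) {
        edges.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    // DFS cycle detection: a cycle of rw-dependencies means the
    // transactions cannot be ordered serially -> one must abort
    public boolean hasCycle() {
        Set<String> visited = new HashSet<>(), onStack = new HashSet<>();
        for (String node : edges.keySet())
            if (dfs(node, visited, onStack)) return true;
        return false;
    }

    private boolean dfs(String node, Set<String> visited, Set<String> onStack) {
        if (onStack.contains(node)) return true;   // back edge -> cycle
        if (!visited.add(node)) return false;      // already fully explored
        onStack.add(node);
        for (String next : edges.getOrDefault(node, Set.of()))
            if (dfs(next, visited, onStack)) return true;
        onStack.remove(node);
        return false;
    }

    public static void main(String[] args) {
        RwDependencyGraph g = new RwDependencyGraph();
        g.addRwDependency("T1", "T2"); // T1 read range A, T2 wrote A
        g.addRwDependency("T2", "T1"); // T2 read range B, T1 wrote B
        System.out.println(g.hasCycle()); // true -> serialization failure
    }
}
```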

Abort rate under contention:

  • 10 concurrent writers: ~5% serialization failures
  • 50 concurrent writers: ~15-25% serialization failures
  • 100+ concurrent writers: ~30-50% (unsuitable for production)
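
Because aborts are expected behavior under Serializable, callers need retry logic. A minimal retry helper might look like this (a sketch: in real JDBC code you would match SQLException.getSQLState() against "40001"; here a plain RuntimeException stands in for the driver exception):

```java
import java.util.function.Supplier;

public class SerializationRetry {
    /** Runs op, retrying up to maxAttempts times on a "serialization failure". */
    public static <T> T withRetry(Supplier<T> op, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                // Real code: retry only when getSQLState().equals("40001")
                last = e;
                try { Thread.sleep((long) (10 * Math.pow(2, attempt))); } // backoff
                catch (InterruptedException ie) { Thread.currentThread().interrupt(); }
            }
        }
        throw last;  // attempts exhausted: surface the last failure
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for a serializable transaction that aborts twice, then commits
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("40001: serialization failure");
            return "committed";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints: committed after 3 attempts
    }
}
```

Exponential backoff matters here: retrying immediately under heavy contention tends to reproduce the same rw-conflict and inflate the abort rate further.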

Real Production Scenario

Situation: Payment system (2024), PostgreSQL 14, processing 5000 transactions/sec.

Problem: Daily balance reconciliation showed discrepancies of 0.01-0.05%. With a daily turnover of $10M, this meant $1,000-5,000 “lost” dollars.

Root cause: Batch job on Read Committed:

@Transactional  // Read Committed
public BigDecimal calculateDailyTotal(LocalDate date) {
    // Snapshot 1: 10,000 transactions
    List<Transaction> txs = txRepo.findByDate(date);

    // While calculating, the payment gateway confirms new transactions
    // Snapshot 2: new committed rows

    BigDecimal total = txs.stream()
        .map(Transaction::getAmount)
        .reduce(BigDecimal.ZERO, BigDecimal::add);

    // Another query within the same transaction:
    int count = txRepo.countByDate(date);  // Snapshot 3 — sees more rows!

    // total calculated from 10,000, count = 10,050
    // Discrepancy in the report
}

Solution: Switch to Repeatable Read:

@Transactional(isolation = Isolation.REPEATABLE_READ)
public BigDecimal calculateDailyTotal(LocalDate date) {
    // All SELECTs see the same snapshot
    List<Transaction> txs = txRepo.findByDate(date);
    int count = txRepo.countByDate(date);  // same snapshot, count is consistent
    // ...
}

Result:

  • Discrepancies: from 0.05% to 0%
  • Throughput: no change (read-only workload)
  • Latency: +0.1ms (negligible)

Monitoring and Diagnostics

PostgreSQL internal stats:

-- Check isolation level of active transactions
SELECT pid, datname, backend_xid, state,
       wait_event_type, query
FROM pg_stat_activity
WHERE state = 'active';

-- Conflicts and deadlocks per database
-- (note: 'conflicts' counts recovery conflicts on standbys;
--  count serialization failures (SQLSTATE 40001) in the application)
SELECT datname, conflicts, deadlocks
FROM pg_stat_database
WHERE datname = 'your_db';

-- MVCC stats: dead tuples
SELECT
    schemaname, relname,
    n_dead_tup, n_live_tup,
    ROUND(100.0 * n_dead_tup / NULLIF(n_dead_tup + n_live_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY dead_pct DESC;

-- Long-running transactions (snapshot retention risk)
SELECT pid, now() - xact_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active' AND xact_start < now() - interval '5 minutes'
ORDER BY xact_start;

Application-level monitoring:

// Micrometer: track isolation level usage
Counter.builder("db.transaction.isolation")
    .tag("level", "read_committed")
    .tag("operation", "daily_report")
    .register(meterRegistry);

// Track serialization failures
Counter.builder("db.transaction.serialization_failure")
    .tag("isolation", "serializable")
    .register(meterRegistry);

Best Practices for Highload

  1. Stay on Read Committed by default — it’s the optimal choice for 90% of workloads.
  2. Use Repeatable Read selectively — only for reports and batch jobs requiring a consistent snapshot.
  3. Avoid Serializable for write-heavy workloads — abort rate makes it unsuitable with >50 concurrent writers.
  4. Atomic SQL instead of read-modify-write — UPDATE accounts SET balance = balance - ? eliminates lost updates without raising the isolation level.
  5. Shorten transactions — the shorter the transaction, the less dead tuple retention and snapshot overhead.
  6. Configure autovacuum aggressively for high-update tables:
    ALTER TABLE accounts SET (
      autovacuum_vacuum_threshold = 50,
      autovacuum_vacuum_scale_factor = 0.05
    );
    
  7. Monitor snapshot age — alert on transactions older than 5 minutes.
  8. Connection pooling with prepared statements — reduces snapshot creation overhead per query.

🎯 Interview Cheat Sheet

Must know:

  • PostgreSQL default: Read Committed — each SELECT sees its own snapshot of committed data
  • Read Committed prohibits dirty reads but allows non-repeatable and phantom reads
  • PostgreSQL MVCC: xmin/xmax on every row, dead tuples → VACUUM cleanup
  • On RC, two SELECTs in one transaction may return different data (per-statement snapshot)
  • For consistent reports: Repeatable Read (one snapshot per transaction)
  • Serializable in PG via SSI — throughput 2-10x lower, retry logic needed

Common follow-up questions:

  • How does Read Committed in PG differ from RC in other DBMS? — Always MVCC, no locking-based RC
  • When should you change the default? — Financial calculations (RR), data migrations (RR), financial systems (Serializable)
  • What is snapshot too old? — With long RR transactions, AutoVacuum removes old versions → error
  • How to monitor MVCC in PG? — pg_stat_user_tables.n_dead_tup, long-running transactions via pg_stat_activity

Red flags (DO NOT say):

  • “PostgreSQL has Read Uncommitted” — PG accepts the syntax but silently runs it as Read Committed
  • “On RC, all SELECTs in a transaction return the same result” — per-statement snapshot
  • “Serializable = just add the annotation” — retry logic and abort rate monitoring needed

Related topics:

  • [[4. What is Read Committed]]
  • [[5. What is Repeatable Read]]
  • [[12. What is the default isolation level in MySQL]]
  • [[6. What is Serializable]]