What Is the Default Isolation Level in PostgreSQL
The default isolation level in PostgreSQL is Read Committed.
🟢 Junior Level
The default isolation level in PostgreSQL is Read Committed.
What this means in simple terms: each SELECT within a transaction sees only data that has already been committed by other transactions. Important: each SELECT takes its own snapshot, so if you run SELECT twice within one transaction, the second query may see data committed by other transactions after the first query ran.
Simple analogy: You’re reading a news feed. Each time you refresh the page (new SELECT), you see fresh news. But you don’t see drafts that haven’t been published yet (uncommitted data).
SQL example:
-- Initial state: account.balance = 1000
-- Transaction 1
BEGIN;
SELECT balance FROM accounts WHERE id = 1; -- sees 1000
-- (waits...)
-- Transaction 2 (runs in parallel)
BEGIN;
UPDATE accounts SET balance = 1500 WHERE id = 1;
COMMIT; -- data saved
-- Transaction 1 continues:
SELECT balance FROM accounts WHERE id = 1; -- sees 1500! (new data)
COMMIT;
What is protected and what isn’t on Read Committed:
| Anomaly | Possible? | Description |
|---|---|---|
| Dirty Read | No | Doesn’t see uncommitted data |
| Non-repeatable Read | Yes | Repeated SELECT may return different values |
| Phantom Read | Yes | Repeated SELECT may return different rows |
How to change the isolation level:
-- For the current transaction (run inside BEGIN ... COMMIT)
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- For the current session
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- In Spring
@Transactional(isolation = Isolation.REPEATABLE_READ)
🟡 Middle Level
How It Works Internally
PostgreSQL implements Read Committed through MVCC (Multi-Version Concurrency Control):
Terms:
- xmin/xmax — system fields in every PG row, storing the IDs of the inserting and deleting transactions.
- Dead tuples — old row versions no longer needed by any active transaction.
- TOAST — PG mechanism for storing large values (TEXT, JSONB) separately from the main row.
- Autovacuum — PG background process that removes dead tuples.
Each row in PostgreSQL contains two system fields:
- xmin: ID of the transaction that inserted this row
- xmax: ID of the transaction that deleted this row (0 = row is alive)
On SELECT:
1. PostgreSQL takes a snapshot recording which transactions were committed at that moment
2. A row version is visible if its inserting transaction (xmin) committed before the snapshot, and its deleting transaction (xmax), if any, had not
3. Each SELECT command gets a NEW snapshot
Key point: Unlike some other DBMS, PostgreSQL at the Read Committed level creates a new snapshot for each command, not for the entire transaction. Therefore, a repeated SELECT may see data committed by other transactions after the first SELECT.
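The per-statement vs per-transaction snapshot difference can be sketched in plain Java. This is a simulation of the semantics, not PostgreSQL code; the class and method names are invented for illustration:

```java
// Simulates snapshot behavior: Read Committed takes a new snapshot per
// statement, Repeatable Read pins the snapshot of the first statement.
public class SnapshotDemo {
    // "Committed state" shared between transactions: a single balance value.
    static int committedBalance = 1000;

    // Read Committed: every statement re-reads the current committed state.
    static int readCommittedSelect() {
        return committedBalance; // new snapshot = current committed value
    }

    // Repeatable Read: the first read pins the snapshot for the transaction.
    static Integer rrSnapshot = null;
    static int repeatableReadSelect() {
        if (rrSnapshot == null) rrSnapshot = committedBalance; // pin once
        return rrSnapshot; // all later reads see the pinned value
    }

    public static void main(String[] args) {
        int rc1 = readCommittedSelect();  // 1000
        int rr1 = repeatableReadSelect(); // 1000, snapshot pinned
        committedBalance = 1500;          // another transaction commits
        int rc2 = readCommittedSelect();  // 1500 — sees the new commit
        int rr2 = repeatableReadSelect(); // still 1000 — same snapshot
        System.out.println(rc1 + " " + rc2 + " " + rr1 + " " + rr2);
    }
}
```

Running it prints `1000 1500 1000 1000`: the Read Committed reads diverge, the Repeatable Read reads do not.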
Practical Application
When Read Committed Is Sufficient:
- CRUD applications — most web applications work correctly on Read Committed
- Reports without strict consistency requirements — small margin of error is acceptable
- High-load systems — minimal overhead, maximum throughput
When Read Committed Is NOT Sufficient:
- Financial calculations — consistent data is needed throughout the entire transaction
- Data migrations — a stable snapshot is required
- Complex business invariants — checks depending on multiple tables
Real Scenarios
Scenario 1: Repeated Read Problem on Read Committed
@Transactional // Read Committed by default
public void transfer(Long fromId, Long toId, BigDecimal amount) {
// Snapshot 1
Account from = accountRepo.findById(fromId);
Account to = accountRepo.findById(toId);
// Here another transaction may change the 'from' balance
// Snapshot 2 — may see different data!
BigDecimal fromBalance = from.getBalance();
BigDecimal toBalance = to.getBalance();
// If data changed, the transfer may be incorrect
from.setBalance(fromBalance.subtract(amount));
to.setBalance(toBalance.add(amount));
}
Scenario 2: Consistent Report on Repeatable Read
@Transactional(isolation = Isolation.REPEATABLE_READ)
public Report generateReport(LocalDate date) {
// All SELECTs see the same snapshot
int count = txRepo.countByDate(date);
BigDecimal sum = txRepo.sumByDate(date);
// count and sum are guaranteed to be consistent
return new Report(count, sum);
}
Common Mistakes
| Mistake | Consequence | Solution |
|---|---|---|
| Assuming @Transactional gives a consistent snapshot | Repeated SELECT sees different data | Use Repeatable Read for consistent reports |
| Long transaction on Read Committed | Larger window for conflicts | Shorten the transaction |
| Read Committed for financial operations | Lost Update, incorrect calculations | Atomic SQL (SET balance = balance - ?) or Repeatable Read + FOR UPDATE |
| Ignoring default_transaction_isolation | Different sessions may have different levels | Set the level explicitly in code or configuration |
Comparison of Isolation Levels in PostgreSQL
| Level | Dirty Read | Non-repeatable Read | Phantom Read | Throughput (relative) |
|---|---|---|---|---|
| Read Uncommitted (behaves as Read Committed in PG) | No | Possible | Possible | 1.0x |
| Read Committed (default) | No | Yes | Yes | 1.0x |
| Repeatable Read | No | No | No (in PG) | 0.95x |
| Serializable | No | No | No | 0.3-0.5x |
Numbers are relative for read-heavy workload (80% SELECT). For write-heavy (80% UPDATE) the difference will be larger: RC=1.0x, RR=0.8x, Serializable=0.2x.
Important: In PostgreSQL, Repeatable Read also protects against phantom reads (thanks to MVCC), which goes beyond the ANSI SQL standard.
When NOT to Change the Default
- Standard CRUD operations — Read Committed is a good default choice
- Microservices with eventual consistency — strict isolation is excessive
- High-load read-heavy systems — Serializability overhead is not justified
🔴 Senior Level
Internal Implementation: MVCC and Snapshot Management
Transaction ID and Visibility Rules
// Simplified visibility logic in PostgreSQL (cf. src/backend/access/heap/heapam_visibility.c)
bool HeapTupleSatisfiesVisibility(HeapTuple htup, Snapshot snapshot) {
TransactionId xmin = htup->t_data->t_xmin; // inserting transaction
TransactionId xmax = htup->t_data->t_xmax; // deleting transaction
// Read Committed rules:
// 1. If xmin hasn't committed yet → row is invisible
// 2. If xmax committed → row is invisible (deleted)
// 3. snapshot is created anew for EACH query
// Repeatable Read rules:
// snapshot is created ONCE at the first query in the transaction
// and reused for all subsequent queries
}
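A runnable toy model of the same visibility rules — my own simplification, not the actual PostgreSQL source (real visibility checks also handle in-progress transactions, hint bits, subtransactions, and the snapshot's xmin/xmax horizon):

```java
import java.util.Set;

// Toy MVCC visibility check: a tuple version is visible to a snapshot if
// its inserting transaction committed before the snapshot was taken, and
// its deleting transaction (if any) had not.
public class Visibility {
    static final long INVALID_XID = 0; // xmax = 0 means "not deleted"

    // committedBeforeSnapshot: IDs the snapshot considers committed
    static boolean isVisible(long xmin, long xmax, Set<Long> committedBeforeSnapshot) {
        if (!committedBeforeSnapshot.contains(xmin)) return false; // inserter not committed
        if (xmax != INVALID_XID && committedBeforeSnapshot.contains(xmax)) return false; // deleter committed
        return true;
    }

    public static void main(String[] args) {
        Set<Long> snap = Set.of(100L, 150L); // txns 100 and 150 committed
        System.out.println(isVisible(100, 0, snap));   // live row from committed txn
        System.out.println(isVisible(200, 0, snap));   // inserter 200 still in progress
        System.out.println(isVisible(100, 150, snap)); // deleted by committed txn 150
        System.out.println(isVisible(100, 200, snap)); // deleter 200 not yet committed
    }
}
```

Under Read Committed the `committedBeforeSnapshot` set is rebuilt per statement; under Repeatable Read it is built once per transaction — the check itself is the same.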
Transaction ID wraparound:
- PostgreSQL uses 32-bit transaction IDs (~4 billion values)
- After overflow, the counter wraps around
- Autovacuum must "freeze" old rows, marking them as visible to all transactions (historically by setting xmin = FrozenTransactionId)
- If autovacuum can't keep up → PostgreSQL stops accepting new write transactions to prevent data loss (transaction ID wraparound protection)
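Ordering 32-bit IDs across wraparound relies on circular comparison: "a precedes b" is decided by the sign of the wrapped 32-bit difference. A sketch modeled on PostgreSQL's `TransactionIdPrecedes` (not the real source):

```java
// Circular XID comparison: with 32-bit IDs, "a precedes b" is decided by
// the sign of the signed 32-bit difference, so the ordering survives
// wraparound as long as the two IDs are less than ~2 billion apart.
public class XidCompare {
    static boolean precedes(int a, int b) {
        return (a - b) < 0; // signed 32-bit arithmetic wraps automatically
    }

    public static void main(String[] args) {
        System.out.println(precedes(100, 200)); // plain case: true
        // Near wraparound: an "old" XID close to 2^32 - 1 still precedes
        // a small "new" XID issued after the counter wraps.
        int oldXid = -10; // same bit pattern as unsigned 4294967286
        int newXid = 10;
        System.out.println(precedes(oldXid, newXid)); // true despite wraparound
    }
}
```

This is exactly why rows must be frozen: once two XIDs drift more than half the 32-bit range apart, the circular comparison would invert their order.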
Snapshot Creation Overhead
Read Committed:
- New snapshot for each query
- Snapshot = array of in-progress transaction IDs
- Size: O(active_transactions)
- Allocation: ~100-500 bytes per snapshot
Repeatable Read / Serializable:
- Snapshot created once at the first query
- Preserved until the end of the transaction
- Additionally: SSI serializable transaction tracking (~1KB overhead)
MVCC and Dead Tuples
Each UPDATE or DELETE doesn’t physically remove the row, but marks it as dead:
UPDATE accounts SET balance = 1500 WHERE id = 1;
Old row version:
xmin = 100, xmax = 200 (the updating transaction), balance = 1000 → DEAD TUPLE (once txn 200 commits and no snapshot still needs it)
New row version:
xmin = 200, xmax = 0, balance = 1500 → LIVE TUPLE
Overhead:
- Dead tuples take up disk space (table bloat: 2-10x growth)
- Indexes also accumulate dead entries
- AutoVacuum must periodically clean up (I/O overhead: 10-30% disk throughput)
Architectural Trade-offs
Approach A: Read Committed (default)
- ✅ Pros: Minimal overhead, simple semantics, no serialization failures, maximum throughput
- ❌ Cons: Non-repeatable reads, phantom reads, can’t rely on consistency within long transactions
- Suitable for: 90% of web applications, microservices, CRUD
Approach B: Repeatable Read
- ✅ Pros: Consistent snapshot, phantom protection (in PG), predictable behavior
- ❌ Cons: Snapshot-too-old errors on long-running transactions, more dead tuples
- Suitable for: reports, migrations, batch processing
Approach C: Serializable (SSI)
- ✅ Pros: Full ACID, guarantee of correctness for any concurrent transactions
- ❌ Cons: Serialization failures (10-30% abort rate), predicate lock overhead (5-20MB shared memory), 2-10x lower throughput
- Suitable for: financial systems where correctness is more important than performance
Edge Cases and Corner Cases
1. Snapshot Too Old (Repeatable Read / Serializable):
T1: BEGIN ISOLATION LEVEL REPEATABLE READ;
T1: SELECT * FROM accounts; -- creates snapshot
// ... 10 minutes pass ...
// With old_snapshot_threshold enabled, vacuum may remove row versions
// that only T1's snapshot still needs
T1: SELECT * FROM accounts;
-- ERROR: snapshot too old
Solution: Increase (or leave disabled) old_snapshot_threshold, or shorten transactions. A different error, "could not serialize access due to concurrent update", is raised when a Repeatable Read transaction tries to UPDATE a row already changed by a concurrently committed transaction.
2. Read Committed and UPDATE Contention:
-- T1 and T2 simultaneously:
T1: UPDATE accounts SET balance = balance - 100 WHERE id = 1;
T2: UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- PostgreSQL: T2 waits for T1, then T2 restarts UPDATE with a new snapshot
-- T2 sees T1's result and subtracts another 100
-- Result: balance -= 200 (correct!)
-- BUT if the logic is at the application level (read-modify-write):
-- T1: SELECT balance → 1000
-- T2: SELECT balance → 1000
-- T1: UPDATE balance = 900
-- T2: UPDATE balance = 900 ← Lost Update!
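The same lost-update pattern can be replayed deterministically in plain Java: an application-level read-modify-write loses one of two concurrent decrements, while an atomic operation (the analogue of `UPDATE ... SET balance = balance - ?`) applies both. A sketch with invented helper names:

```java
import java.util.concurrent.atomic.AtomicLong;

// Deterministic replay of the lost-update interleaving shown above:
// both "transactions" read the balance before either one writes.
public class LostUpdateDemo {
    // Application-level read-modify-write: returns the final balance.
    static long readModifyWrite(long start) {
        long t1Read = start;         // T1: SELECT balance
        long t2Read = start;         // T2: SELECT balance (same stale value!)
        long balance = t1Read - 100; // T1: UPDATE balance = start - 100
        balance = t2Read - 100;      // T2: overwrites T1's write → lost update
        return balance;
    }

    // Atomic decrement, the analogue of UPDATE ... SET balance = balance - ?
    static long atomicDecrements(long start) {
        AtomicLong balance = new AtomicLong(start);
        balance.addAndGet(-100); // T1
        balance.addAndGet(-100); // T2
        return balance.get();
    }

    public static void main(String[] args) {
        System.out.println(readModifyWrite(1000));  // 900 — one decrement lost
        System.out.println(atomicDecrements(1000)); // 800 — both applied
    }
}
```

The database behaves like `atomicDecrements` only when the arithmetic happens inside the UPDATE itself; compute the new value in application code and you get `readModifyWrite`.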
3. DDL Inside Transaction on Read Committed:
BEGIN;
ALTER TABLE accounts ADD COLUMN new_col TEXT; -- DDL
SELECT * FROM accounts; -- new snapshot sees the new column
-- Other sessions don't see new_col yet until COMMIT
-- But there may be blocking!
4. Read Committed and Foreign Keys:
-- T1: INSERT INTO orders (user_id) VALUES (999);
-- T2: DELETE FROM users WHERE id = 999;
-- FK check in T1 verifies existence of user_id = 999
-- On Read Committed, FK check creates a separate snapshot
-- If T2 already committed the DELETE → FK violation!
5. Read Committed and Cursor:
BEGIN;
DECLARE cur CURSOR FOR SELECT * FROM accounts;
FETCH 10 FROM cur; -- snapshot 1
-- Another transaction modifies data
FETCH 10 FROM cur; -- snapshot 2, may see different data!
-- Cursor does NOT guarantee consistency on Read Committed
Performance Implications
| Level | Latency (p50) | Latency (p99) | Throughput | Memory Overhead |
|---|---|---|---|---|
| Read Committed | 1-3ms | 5-15ms | 50,000+ TPS | Snapshot: ~200 bytes/query |
| Repeatable Read | 1-3ms | 5-20ms | 48,000 TPS | Snapshot: ~200 bytes/tx + hold dead tuples |
| Serializable | 5-20ms | 50-200ms | 8,000-15,000 TPS | Predicate locks: 5-20MB shared memory |
Specific numbers (PostgreSQL 15, 8 cores, NVMe, pgbench):
Numbers are for a specific setup. On your hardware/version they may differ by an order of magnitude.
- Read Committed: ~55,000 SELECT-only TPS
- Repeatable Read: ~52,000 SELECT-only TPS (5% overhead from snapshot retention)
- Serializable: ~12,000 TPS with moderate contention (78% reduction)
Write workload (50% reads, 50% writes):
- Read Committed: ~25,000 TPS
- Repeatable Read: ~24,000 TPS
- Serializable: ~6,000 TPS (serialization failures + retries)
Memory Implications
- Snapshot arrays: ~200 bytes per snapshot. At 1000 queries/sec = 200KB/s allocation.
- Dead tuple retention (RR/SRL): With long-running transactions, dead tuples aren’t vacuumed. A 1GB table may grow to 5-10GB.
- Predicate locks (Serializable): Shared memory pool (~64MB default). On overflow → lock coarsening → more serialization failures.
- TOAST bloat: Large fields (TEXT, JSONB) are stored in TOAST tables. Dead TOAST tuples also accumulate.
Concurrency Aspects
Serialization failure patterns (SSI):
T1: reads range A (predicate lock on A)
T2: writes to range A
T1: writes to range B
T2: reads range B (predicate lock on B)
Dependency graph:
T1 → (read A) → T2 → (write A) = RW-dependency
T2 → (read B) → T1 → (write B) = RW-dependency
Cycle: T1 → T2 → T1 → serialization failure
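SSI aborts a transaction when RW-dependencies form a cycle. A minimal cycle check over such a dependency graph, for intuition only — PostgreSQL does not run full cycle detection, it aborts on the cheaper "dangerous structure" of two consecutive RW-edges:

```java
import java.util.*;

// Minimal cycle detection over an RW-dependency graph: nodes are
// transaction IDs, an edge T1 -> T2 means "T1 read data that T2 later wrote".
public class RwCycle {
    static boolean hasCycle(Map<Integer, List<Integer>> edges) {
        Set<Integer> visiting = new HashSet<>(), done = new HashSet<>();
        for (Integer n : edges.keySet())
            if (dfs(n, edges, visiting, done)) return true;
        return false;
    }

    static boolean dfs(int n, Map<Integer, List<Integer>> edges,
                       Set<Integer> visiting, Set<Integer> done) {
        if (done.contains(n)) return false;
        if (!visiting.add(n)) return true; // back edge = cycle
        for (int next : edges.getOrDefault(n, List.of()))
            if (dfs(next, edges, visiting, done)) return true;
        visiting.remove(n);
        done.add(n);
        return false;
    }

    public static void main(String[] args) {
        // T1 -> T2 (T1 read range A, T2 wrote A), T2 -> T1 (T2 read B, T1 wrote B)
        Map<Integer, List<Integer>> g = Map.of(1, List.of(2), 2, List.of(1));
        System.out.println(hasCycle(g)); // true -> serialization failure
        Map<Integer, List<Integer>> acyclic = Map.of(1, List.of(2), 2, List.of());
        System.out.println(hasCycle(acyclic)); // false -> commit allowed
    }
}
```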
Abort rate under contention:
- 10 concurrent writers: ~5% serialization failures
- 50 concurrent writers: ~15-25% serialization failures
- 100+ concurrent writers: ~30-50% (unsuitable for production)
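Because aborts are expected under Serializable, callers need retry logic. A minimal generic wrapper — a sketch, not a library API; with Spring/JDBC you would typically retry on SQLState 40001 and add backoff between attempts:

```java
import java.util.function.Supplier;

// Minimal retry loop for operations that may fail with a transient
// serialization error (SQLState 40001 in PostgreSQL).
public class SerializationRetry {
    static class SerializationFailure extends RuntimeException {}

    static <T> T withRetry(Supplier<T> tx, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                return tx.get(); // run the whole transaction body
            } catch (SerializationFailure e) {
                if (attempt >= maxAttempts) throw e; // give up after N tries
                // (in production: add backoff/jitter here)
            }
        }
    }

    public static void main(String[] args) {
        // A fake transaction that fails twice, then succeeds.
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new SerializationFailure();
            return "committed";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Key design point: the retry must re-run the entire transaction body (including all reads), not just the failed statement, because the aborted transaction's snapshot is gone.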
Real Production Scenario
Situation: Payment system (2024), PostgreSQL 14, processing 5000 transactions/sec.
Problem: Daily balance reconciliation showed discrepancies of 0.01-0.05%. With a daily turnover of $10M, this meant $1,000-5,000 “lost” dollars.
Root cause: Batch job on Read Committed:
@Transactional // Read Committed
public BigDecimal calculateDailyTotal(LocalDate date) {
// Snapshot 1: 10,000 transactions
List<Transaction> txs = txRepo.findByDate(date);
// While calculating, the payment gateway confirms new transactions
// Snapshot 2: new committed rows
BigDecimal total = txs.stream()
.map(Transaction::getAmount)
.reduce(BigDecimal.ZERO, BigDecimal::add);
// Another query within the same transaction:
int count = txRepo.countByDate(date); // Snapshot 3 — sees more rows!
// total calculated from 10,000 rows, count = 10,050
// → discrepancy in the report
return total;
}
Solution: Switch to Repeatable Read:
@Transactional(isolation = Isolation.REPEATABLE_READ)
public BigDecimal calculateDailyTotal(LocalDate date) {
// All SELECTs see the same snapshot
List<Transaction> txs = txRepo.findByDate(date);
int count = txRepo.countByDate(date); // same snapshot, count is consistent
// ...
}
Result:
- Discrepancies: from 0.05% to 0%
- Throughput: no change (read-only workload)
- Latency: +0.1ms (negligible)
Monitoring and Diagnostics
PostgreSQL internal stats:
-- Active transactions (note: pg_stat_activity does not expose the isolation level; track it at the application level)
SELECT pid, datname, backend_xid, state,
wait_event_type, query
FROM pg_stat_activity
WHERE state = 'active';
-- Recovery conflicts and deadlocks (there is no built-in serialization-failure counter; count retries in the application)
SELECT datname, conflicts, deadlocks
FROM pg_stat_database
WHERE datname = 'your_db';
-- MVCC stats: dead tuples
SELECT
schemaname, relname,
n_dead_tup, n_live_tup,
ROUND(100.0 * n_dead_tup / NULLIF(n_dead_tup + n_live_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY dead_pct DESC;
-- Long-running transactions (snapshot retention risk)
SELECT pid, now() - xact_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active' AND xact_start < now() - interval '5 minutes'
ORDER BY xact_start;
Application-level monitoring:
// Micrometer: track isolation level usage
Counter.builder("db.transaction.isolation")
.tag("level", "read_committed")
.tag("operation", "daily_report")
.register(meterRegistry);
// Track serialization failures
Counter.builder("db.transaction.serialization_failure")
.tag("isolation", "serializable")
.register(meterRegistry);
Best Practices for Highload
- Stay on Read Committed by default — it’s the optimal choice for 90% of workloads.
- Use Repeatable Read selectively — only for reports and batch jobs requiring a consistent snapshot.
- Avoid Serializable for write-heavy workloads — abort rate makes it unsuitable with >50 concurrent writers.
- Atomic SQL instead of read-modify-write — UPDATE ... SET balance = balance - ? eliminates lost updates without raising the isolation level.
- Shorten transactions — the shorter the transaction, the less dead tuple retention and snapshot overhead.
- Configure autovacuum aggressively for high-update tables:
  ALTER TABLE accounts SET (autovacuum_vacuum_threshold = 50, autovacuum_vacuum_scale_factor = 0.05);
- Monitor snapshot age — alert on transactions older than 5 minutes.
- Connection pooling with prepared statements — reduces snapshot creation overhead per query.
🎯 Interview Cheat Sheet
Must know:
- PostgreSQL default: Read Committed — each SELECT sees its own snapshot of committed data
- Read Committed prohibits dirty reads but allows non-repeatable and phantom reads
- PostgreSQL MVCC: xmin/xmax on every row, dead tuples → VACUUM cleanup
- On RC, two SELECTs in one transaction may return different data (per-statement snapshot)
- For consistent reports: Repeatable Read (one snapshot per transaction)
- Serializable in PG via SSI — throughput 2-10x lower, retry logic needed
Common follow-up questions:
- How does Read Committed in PG differ from RC in other DBMS? — Always MVCC, no locking-based RC
- When should you change the default? — Financial calculations (RR), data migrations (RR), financial systems (Serializable)
- What is snapshot too old? — With long RR transactions, AutoVacuum removes old versions → error
- How to monitor MVCC in PG? — pg_stat_user_tables.n_dead_tup, long-running transactions via pg_stat_activity
Red flags (DO NOT say):
- “PostgreSQL has Read Uncommitted” — automatic elevation to Read Committed
- “On RC, all SELECTs in a transaction return the same result” — per-statement snapshot
- “Serializable = just add the annotation” — retry logic and abort rate monitoring needed
Related topics:
- [[4. What is Read Committed]]
- [[5. What is Repeatable Read]]
- [[12. What is the default isolation level in MySQL]]
- [[6. What is Serializable]]