What is visibility problem?
Junior Level
Basic Understanding
Visibility Problem is a situation where one thread modifies a variable, but another thread continues to see the old value.
Why: each CPU core has its own cache (L1/L2). A thread reads a variable from its core’s cache. When another thread on a different core changes that variable, the update initially stays in that core’s cache and is not immediately visible to everyone.
Simple Example
```java
public class VisibilityDemo {
    private boolean flag = false; // WITHOUT volatile

    public void setFlag() {
        flag = true; // Thread 1 modifies
    }

    public void checkFlag() {
        while (!flag) {
            // Thread 2 may NEVER exit the loop!
            // It sees its local copy: flag = false
        }
        System.out.println("Flag is true");
    }
}
```
Why does this happen?
Each thread may store copies of variables in:
- CPU registers — fastest, but completely local
- CPU cache (L1/L2/L3) — fast, but not always synchronized
- Main memory (RAM) — shared, but slow
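The window can be probed with a plain-field flag and a deadline, though the stale read is not reliably reproducible — whether the reader sees the write in time depends on the JIT and the hardware. A minimal sketch; all class and method names here are illustrative:

```java
// Probe that TRIES to reproduce the stale read, with a deadline so it
// cannot hang forever. Nothing in this program guarantees the reader
// ever sees the write — that is the visibility problem.
public class StaleReadProbe {
    static boolean flag = false; // deliberately NOT volatile

    /** Returns true if the reader happened to see the write before the deadline. */
    static boolean probe() {
        flag = false;
        Thread writer = new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            flag = true; // plain write: nothing forces it to become visible
        });
        writer.start();
        long deadline = System.nanoTime() + 1_000_000_000L; // 1-second cap
        boolean seen = false;
        while (System.nanoTime() < deadline) {
            if (flag) { seen = true; break; }
        }
        try { writer.join(); } catch (InterruptedException ignored) { }
        // join() creates a happens-before edge, so AFTER this point the
        // calling thread IS guaranteed to see flag == true.
        return seen;
    }
}
```

Note the contrast in the last comment: after `join()` visibility is guaranteed by the memory model, while inside the loop it is not.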
Solution
Use volatile:
```java
public class VisibilityDemo {
    // volatile inserts memory barriers: on write, the value is flushed to
    // shared memory (store barrier); on read, the local copy is invalidated
    // and the current value is re-read (load barrier).
    private volatile boolean flag = false;

    public void setFlag() {
        flag = true; // all threads will see this value immediately
    }

    public void checkFlag() {
        while (!flag) {
            // Now thread 2 is guaranteed to see the change
        }
        System.out.println("Flag is true");
    }
}
```
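A minimal harness for the volatile version (class and method names are illustrative): the writer flips the flag after a short delay, and the reader's spin loop is guaranteed to terminate because the flag is volatile.

```java
public class VisibilityHarness {
    private static volatile boolean flag = false;

    /** Returns true if the reader thread observed the flag and exited. */
    static boolean runDemo() {
        flag = false;
        Thread reader = new Thread(() -> {
            while (!flag) {
                Thread.onSpinWait(); // busy-wait hint to the CPU (Java 9+)
            }
        });
        reader.start();
        try {
            Thread.sleep(50);   // let the reader enter the loop
            flag = true;        // volatile write: guaranteed to become visible
            reader.join(5_000); // must terminate well within the timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !reader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("reader exited: " + runDemo());
    }
}
```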
When does the visibility problem occur?
| Situation | Occurs? | Solution |
|---|---|---|
| Regular variable, multiple threads | Yes | volatile / synchronized |
| volatile variable | No | — |
| synchronized block | No | — |
| final field (after constructor) | No | — |
| AtomicInteger | No | — |
Middle Level
Physical Cause (Hardware Level)
Modern processors have a complex memory hierarchy:
| Level | Typical size | Latency |
|---|---|---|
| CPU registers | — | ~1 cycle |
| L1 cache | 32-64 KB | ~4 cycles |
| L2 cache | 256 KB-1 MB | ~10 cycles |
| L3 cache | 8-64 MB | ~40 cycles |
| RAM | — | ~100 cycles |
What happens on write
- A thread on Core 1 changes the variable → the write goes into the Store Buffer.
  (Store Buffer — the CPU's write buffer. A write lands there first, not directly in cache/RAM. This speeds writes up — the core doesn't stall — but it also opens the visibility window.)
- The data then drains into Core 1's L1 cache
- Core 2 reads the variable → takes it from its own L1 cache (the old value!)
- The MESI protocol synchronizes the caches, but asynchronously from the program's point of view
MESI Protocol
MESI Protocol (Modified, Exclusive, Shared, Invalid) — a cache coherency protocol for multi-core CPUs. When a core writes data (Modified), it sends a signal to the other cores: their copies become Invalid and must be re-read (from RAM or from the owning core's cache) before the next use.
| State | Meaning |
|---|---|
| Modified (M) | Data modified in this cache, needs to be flushed to RAM |
| Exclusive (E) | Data only in this cache, identical to RAM |
| Shared (S) | Data exists in multiple caches, all identical to RAM |
| Invalid (I) | Data is stale (someone else modified it) |
When a thread writes to a volatile, it signals all cores to invalidate their copies.
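The state table above can be sketched as a toy state machine. This is illustrative only — real hardware tracks states per 64-byte cache line, in parallel, via bus snooping or directories — and every class and method name here is invented for the sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Toy MESI model: each "core" holds one copy of a single cache line.
enum MesiState { MODIFIED, EXCLUSIVE, SHARED, INVALID }

class Core {
    MesiState state = MesiState.INVALID; // no copy of the line yet
    int cachedValue;
}

class MesiBus {
    private final List<Core> cores = new ArrayList<>();
    private int memory; // the line's value in "RAM"

    Core attach() { Core c = new Core(); cores.add(c); return c; }

    // A write invalidates every other copy — the Invalidate broadcast.
    void write(Core writer, int value) {
        for (Core c : cores) {
            if (c != writer) c.state = MesiState.INVALID;
        }
        writer.state = MesiState.MODIFIED;
        writer.cachedValue = value;
        memory = value; // simplification: write-through, so reads stay simple
    }

    // Reading an INVALID copy re-fetches; a MODIFIED owner downgrades to SHARED.
    int read(Core reader) {
        if (reader.state == MesiState.INVALID) {
            for (Core c : cores) {
                if (c.state == MesiState.MODIFIED) c.state = MesiState.SHARED;
            }
            reader.cachedValue = memory;
            reader.state = MesiState.SHARED;
        }
        return reader.cachedValue;
    }
}
```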
JIT Optimizations That Worsen the Problem
Hoisting
Hoisting — the JIT compiler hoists the variable read out of the loop into a register, avoiding reading from RAM every iteration. Without volatile, the JIT “doesn’t know” that another thread may change the variable.
```java
// Source code:
while (!flag) {
    doSomething(); // flag doesn't change inside the loop
}

// JIT may optimize it to:
if (!flag) {
    while (true) {
        doSomething(); // infinite loop, even if flag changes!
    }
}
```
Register Allocation
A variable may be cached in a CPU register, which is not part of the cache coherency system at all.
Ways to Solve
1. volatile
```java
volatile boolean flag = false;
```
Guarantees:
- Read always from main memory
- Write always to main memory
- Prevents reordering (memory barriers)
2. synchronized
```java
// Re-acquire the monitor on every check: entering a synchronized block
// refreshes the thread's view, leaving it publishes the thread's writes.
// (Holding the lock across the whole loop would block the writer forever.)
while (true) {
    synchronized (lock) {
        if (flag) break;
    }
}
```
On entering a synchronized block, the thread's cached view is invalidated; on exiting, its writes are flushed to RAM.
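For a flag that threads should block on, the idiomatic form is wait/notifyAll rather than spinning. A minimal sketch — the `Latch` class name is illustrative:

```java
// Idiomatic blocking flag: wait() releases the lock while sleeping, and
// notifyAll() wakes the waiters. Entering/leaving the monitor gives the
// same visibility guarantees as the spin version, without burning CPU.
public class Latch {
    private final Object lock = new Object();
    private boolean flag = false; // guarded by lock, so no volatile needed

    public void setFlag() {
        synchronized (lock) {
            flag = true;
            lock.notifyAll(); // wake any thread blocked in awaitFlag()
        }
    }

    public void awaitFlag() throws InterruptedException {
        synchronized (lock) {
            while (!flag) {   // loop guards against spurious wakeups
                lock.wait();  // releases the lock, reacquires on wakeup
            }
        }
    }
}
```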
3. final fields
```java
public class Config {
    public final String value; // visibility guaranteed after the constructor

    public Config(String value) {
        this.value = value;
    }
}
```
4. Atomic classes
```java
AtomicBoolean flag = new AtomicBoolean(false);
// Internally: a volatile field + CAS operations
```
When volatile does NOT solve the visibility problem
- Compound operations: `count++` on a volatile variable — the read and the write are not atomic
- Groups of variables: `volatile` on `x` and `y` does not guarantee another thread sees a consistent pair (x, y)
- Dependent computations: the result depends on multiple variables, each volatile, but their combination may be inconsistent
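The compound-operation pitfall is easy to demonstrate: `count++` on a volatile field loses updates under contention, while AtomicInteger performs the same read-modify-write as a single CAS. A sketch with illustrative names:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CompoundOps {
    static volatile int plainCount = 0;            // visible, but ++ is NOT atomic
    static final AtomicInteger atomicCount = new AtomicInteger();

    /** Increments atomicCount from several threads; the result is always exact. */
    static int runAtomic(int threads, int itersPerThread) {
        atomicCount.set(0);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < itersPerThread; j++) {
                    atomicCount.incrementAndGet(); // read-modify-write as one CAS
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return atomicCount.get();
    }
    // The same loop over `plainCount++` typically loses updates: volatile makes
    // each read and write visible, but not the 3-step increment atomic.
}
```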
Senior Level
Under the Hood: Cache Coherence Protocol
At the x86 architecture level, the MESIF protocol (MESI extended with a Forward state) is used:
| Event | Action |
|---|---|
| Write to cache line | Transition to Modified, send Invalidate to all other cores |
| Read cache line in Shared | Can read without restrictions |
| Read cache line in Invalid | Request data from RAM or another cache |
| Write to Modified cache line | Local write, other cores not notified until conflict |
Bus Traffic and Cache Invalidation
When core 1 writes to a volatile variable:
- An Invalidate message is sent on the bus (Ring Bus / Mesh)
- All other cores check their caches
- Cores with that cache line respond with Ack and transition to Invalid
- Only after all Acks is the write considered complete
This takes hundreds of CPU cycles — which is why volatile write is more expensive than read.
False Sharing — A Performance Problem
```java
public class FalseSharingDemo {
    // Both volatile fields land in the same cache line (64 bytes)
    public volatile long counter1 = 0;
    public volatile long counter2 = 0;
}
```
When thread 1 writes to counter1, the cache line is invalidated for thread 2, even if it’s working with counter2.
Solution: @Contended
```java
// sun.misc.Contended on Java 8; jdk.internal.vm.annotation.Contended on Java 9+
import jdk.internal.vm.annotation.Contended;

public class ContendedDemo {
    @Contended
    public volatile long counter1 = 0;

    @Contended
    public volatile long counter2 = 0;
}
```
@Contended adds padding (128 bytes) around the variable so it occupies a separate cache line.
Requires -XX:-RestrictContended for use in user code (Java 8+).
Manual Padding (for older Java)
```java
public class PaddedCounter {
    // Padding before the hot field
    public long p1, p2, p3, p4, p5, p6, p7;
    public volatile long value = 0;
    // Padding after the hot field
    public long q1, q2, q3, q4, q5, q6, q7;
}
```
Note: the JVM is free to reorder fields, so manual padding is best-effort; @Contended is the reliable option.
Performance and Highload
Benchmark (approximate)
| Operation | Time | Note |
|---|---|---|
| Regular read | ~1 ns | From cache |
| Volatile read | ~5-10 ns | With barriers |
| Regular write | ~1 ns | To cache |
| Volatile write | ~50-100 ns | With Invalidate |
| synchronized (no contention) | ~10-20 ns | Thin lock |
| synchronized (contention) | ~1000+ ns | Context switch |
Write Barriers
Using volatile slows down writes more than reads:
- Read: only barrier (LoadLoad + LoadStore)
- Write: barrier + invalidate other caches + wait for Ack
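Since Java 9, VarHandle access modes expose these barrier distinctions directly: setRelease/getAcquire give publish/observe ordering without the expensive StoreLoad barrier of a full volatile write. A sketch — the `Publisher` class and its fields are illustrative:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Release/acquire instead of full volatile: setRelease publishes everything
// written before it; a matching getAcquire makes those writes visible.
public class Publisher {
    private int payload;       // plain field, published via the handle below
    private boolean ready;     // accessed only through READY

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(Publisher.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public void publish(int value) {
        payload = value;               // ordinary write...
        READY.setRelease(this, true);  // ...made visible by the release store
    }

    /** Returns the payload, or -1 if not yet published. */
    public int tryRead() {
        if ((boolean) READY.getAcquire(this)) {
            return payload;            // guaranteed to see the value from publish()
        }
        return -1;
    }
}
```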
Diagnostics
-XX:+PrintAssembly
Lets you see the actual CPU instructions (the hsdis disassembler plugin must be on the JVM's library path, and `-XX:+UnlockDiagnosticVMOptions` must come before `-XX:+PrintAssembly`):
```
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly
```
Look for a `lock`-prefixed instruction before the write — that is the memory barrier on x86:
```
lock or dword ptr [rsp], 0   # StoreLoad barrier
mov [rax], 1                 # volatile write
```
Flaky Tests
Visibility issues are the main cause of tests that:
- Pass locally (1-2 cores)
- Fail on CI server (8+ cores)
- Fail “sometimes” (race condition timing)
Java Memory Model Stress Testing
```java
import org.openjdk.jcstress.annotations.Actor;
import org.openjdk.jcstress.annotations.Expect;
import org.openjdk.jcstress.annotations.JCStressTest;
import org.openjdk.jcstress.annotations.Outcome;
import org.openjdk.jcstress.annotations.State;
import org.openjdk.jcstress.infra.results.I_Result;

@JCStressTest
@State
@Outcome(id = "42", expect = Expect.ACCEPTABLE, desc = "Reader saw the published value")
@Outcome(id = "-1", expect = Expect.ACCEPTABLE, desc = "Reader ran before the write")
public class VisibilityStressTest {
    int x = 0;
    volatile boolean ready = false;

    @Actor
    public void writer() {
        x = 42;
        ready = true; // volatile write publishes x
    }

    @Actor
    public void reader(I_Result r) {
        // If ready is observed as true, x must be 42 (volatile happens-before)
        r.r1 = ready ? x : -1;
    }
}
```
Best Practices
- Always use `volatile` for flags read from different threads
- Avoid storing shared state in regular fields
- `final` fields — a free way to ensure visibility of immutable data
- Beware of false sharing with frequently updated volatile fields
- Test on multiprocessor systems, not just locally
- Use JCStress for stress-testing multithreaded code
Interview Cheat Sheet
Must know:
- Visibility problem: one thread changes a variable, another sees the old value from the core’s cache
- Cause: memory hierarchy (registers → L1/L2/L3 cache → RAM), each core has its own cache
- MESI protocol (Modified, Exclusive, Shared, Invalid) ensures cache coherency
- JIT hoisting optimization can hoist a read out of a loop → infinite loop without volatile
- 4 solutions: volatile, synchronized, final fields, Atomic classes
- Volatile does NOT solve the problem for compound operations and groups of variables
- False Sharing: two volatile variables in the same cache line — writing to one invalidates the other
Frequent follow-up questions:
- Why does a test pass locally but fail on CI? — Locally 1-2 cores (fewer cache issues), CI — 8+ cores
- What is a Store Buffer? — CPU write buffer; writes go there first, not to RAM — the cause of the visibility problem
- How does JIT worsen the problem? — Hoisting: hoists variable reads out of loops into registers, ignoring changes from other threads
- What does @Contended do? — Adds padding (128 bytes) around volatile to avoid false sharing
Red flags (do NOT say):
- “Volatile solves all multithreading problems” — no, only visibility of a single variable
- “CPU cache is always synchronized” — no, MESI works asynchronously
- “Synchronized is slower than volatile for a simple flag” — no, volatile is cheaper for simple cases
- “Final fields have nothing to do with visibility” — they do: final fields are guaranteed to be visible once the constructor completes
Related topics:
- [[1. What is the difference between synchronized and volatile]] — volatile as a solution to visibility problem
- [[2. What is happens-before relationship]] — JMM-level visibility guarantees
- [[4. What is monitor in Java]] — synchronized and cache invalidation on entry
- [[8. What are Atomic classes]] — Atomic classes also solve the visibility problem