Question 3 · Section 9

What is the visibility problem?

Junior Level

Basic Understanding

The visibility problem is a situation where one thread modifies a variable, but another thread continues to see the old value.

Why: each CPU core has its own registers, write buffer, and caches (L1/L2). A thread reads a variable from whatever copy is closest to its core. When a thread on a different core changes that variable, the update is not immediately visible to everyone, and without synchronization Java does not guarantee when, or even whether, other threads will see it.

Simple Example

public class VisibilityDemo {
    private boolean flag = false; // WITHOUT volatile

    public void setFlag() {
        flag = true; // Thread 1 modifies
    }

    public void checkFlag() {
        while (!flag) {
            // Thread 2 may NEVER exit the loop!
            // It sees its local copy flag = false
        }
        System.out.println("Flag is true");
    }
}

Why does this happen?

Each thread may store copies of variables in:

  • CPU registers — fastest, but completely local
  • CPU cache (L1/L2/L3) — fast, but not always synchronized
  • Main memory (RAM) — shared, but slow

Solution

Use volatile:

public class VisibilityDemo {
    // With volatile, every read sees the most recent write.
    // Conceptually the JVM adds memory barriers: a write is published before
    // later operations proceed (store barrier), and a read is never served
    // from a stale register or cached copy (load barrier).
    private volatile boolean flag = false;

    public void setFlag() {
        flag = true; // Every subsequent read in any thread sees this value
    }

    public void checkFlag() {
        while (!flag) {
            // Now thread 2 is guaranteed to see the change
        }
        System.out.println("Flag is true");
    }
}
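
To try the fix end to end, here is a minimal runnable sketch. The class name VolatileFlagDemo, the helper readerTerminates, and the 50 ms delay are illustrative assumptions, not part of the original example; Thread.onSpinWait() assumes Java 9 or newer.

```java
final class VolatileFlagDemo {
    private volatile boolean flag = false;

    void setFlag() {
        flag = true;
    }

    void awaitFlag() {
        while (!flag) {
            Thread.onSpinWait(); // busy-wait hint to the CPU (Java 9+)
        }
    }

    /** Starts a reader thread, sets the flag, and reports whether the reader exited in time. */
    static boolean readerTerminates(long timeoutMillis) {
        VolatileFlagDemo demo = new VolatileFlagDemo();
        Thread reader = new Thread(demo::awaitFlag, "reader");
        reader.setDaemon(true); // a stuck reader must not keep the JVM alive
        reader.start();
        try {
            Thread.sleep(50);   // let the reader spin for a while before the write
            demo.setFlag();
            reader.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !reader.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("Reader terminated: " + readerTerminates(2_000));
    }
}
```

Removing volatile from the field makes the outcome depend on the JIT and the hardware: the reader may or may not exit, which is exactly what makes the bug hard to reproduce.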

When does the visibility problem occur?

| Situation | Occurs? | Solution |
|---|---|---|
| Regular variable, multiple threads | Yes | volatile / synchronized |
| volatile variable | No | n/a |
| synchronized block | No | n/a |
| final field (after constructor) | No | n/a |
| AtomicInteger | No | n/a |

Middle Level

Physical Cause (Hardware Level)

Modern processors have a complex memory hierarchy:

CPU Registers → L1 Cache (32-64KB) → L2 Cache (256KB-1MB) → L3 Cache (8-64MB) → RAM
   ~1 cycle        ~4 cycles             ~10 cycles            ~40 cycles         ~100 cycles

What happens on write

  1. A thread on Core 1 changes the variable → the write goes to the Store Buffer

Store Buffer: a per-core write buffer. When a core writes data, it first goes to the Store Buffer, not directly to cache/RAM. This speeds up writes (the processor doesn’t wait), but creates the visibility problem.

  2. The data drains from the Store Buffer into Core 1’s L1 cache
  3. Meanwhile, Core 2 reads the variable → it may still get the old value from its own L1 cache
  4. The MESI protocol brings the caches back in sync, but only after the write has left the Store Buffer, so other cores see it with a delay
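
The effect of buffered writes is usually shown with the classic "store buffer" litmus test: two threads each write one variable and then read the other. The sketch below is an illustration under assumptions (the class StoreBufferLitmus and its method are hypothetical names). With plain fields, both reads may return 0, because each write can still be pending when the other thread reads. With volatile fields, as written here, the Java Memory Model forbids that outcome.

```java
final class StoreBufferLitmus {
    private volatile int x;
    private volatile int y;
    private int r1; // read back after join(), which makes it safely visible
    private int r2;

    /** Runs the litmus test repeatedly and counts how often both reads returned 0. */
    static int countBothZero(int iterations) {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++) {
            StoreBufferLitmus s = new StoreBufferLitmus();
            Thread t1 = new Thread(() -> { s.x = 1; s.r1 = s.y; });
            Thread t2 = new Thread(() -> { s.y = 1; s.r2 = s.x; });
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (s.r1 == 0 && s.r2 == 0) {
                bothZero++; // would mean both writes were invisible to the other thread
            }
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("Both reads were 0 in " + countBothZero(2_000) + " runs");
    }
}
```

Dropping volatile from x and y makes the (0, 0) outcome legal. Whether it actually shows up depends on timing and hardware, so a harness such as JCStress is better suited to observing it than this simple loop.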

MESI Protocol

MESI Protocol (Modified, Exclusive, Shared, Invalid): a cache coherency protocol for multi-core CPUs. When a core writes data (Modified), it sends a signal to the other cores; their copies become Invalid and must be re-read from the writing core’s cache or from RAM.

| State | Meaning |
|---|---|
| Modified (M) | Data modified in this cache, needs to be flushed to RAM |
| Exclusive (E) | Data only in this cache, identical to RAM |
| Shared (S) | Data exists in multiple caches, all identical to RAM |
| Invalid (I) | Data is stale (someone else modified it) |

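Every write invalidates the copies held by other cores. A write to a volatile additionally makes the core wait until the write has left the Store Buffer, so the invalidation is not delayed.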
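
The state transitions can be sketched as a toy model of a single cache line shared by several cores. This is a deliberately simplified illustration: the class MesiLine and its methods are hypothetical, and real hardware also handles write-backs, bus arbitration, and timing, all of which are left out here.

```java
import java.util.Arrays;

final class MesiLine {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    private final State[] cores; // state of this one cache line in each core's cache

    MesiLine(int coreCount) {
        cores = new State[coreCount];
        Arrays.fill(cores, State.INVALID); // no core holds the line yet
    }

    State stateOf(int core) {
        return cores[core];
    }

    /** The given core reads the line. */
    void read(int core) {
        if (cores[core] != State.INVALID) {
            return; // cache hit: M, E and S are all readable
        }
        boolean heldElsewhere = false;
        for (int i = 0; i < cores.length; i++) {
            if (i != core && cores[i] != State.INVALID) {
                cores[i] = State.SHARED; // an M or E holder downgrades to S
                heldElsewhere = true;
            }
        }
        cores[core] = heldElsewhere ? State.SHARED : State.EXCLUSIVE;
    }

    /** The given core writes the line: every other copy becomes Invalid. */
    void write(int core) {
        for (int i = 0; i < cores.length; i++) {
            if (i != core) {
                cores[i] = State.INVALID;
            }
        }
        cores[core] = State.MODIFIED;
    }
}
```

For example, after core 0 reads, core 1 reads, and core 0 writes, the line is Modified in core 0 and Invalid in core 1; the next read by core 1 brings both copies back to Shared.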

JIT Optimizations That Worsen the Problem

Hoisting

Hoisting — the JIT compiler hoists the variable read out of the loop into a register, avoiding reading from RAM every iteration. Without volatile, the JIT “doesn’t know” that another thread may change the variable.

// Source code:
while (!flag) {
    doSomething(); // flag doesn't change inside the loop
}

// JIT may optimize to:
if (!flag) {
    while (true) {
        doSomething(); // Infinite loop, even if flag changes!
    }
}

Register Allocation

A variable may be cached in a CPU register, which is not part of the cache coherency system at all.

Ways to Solve

1. volatile

volatile boolean flag = false;

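Guarantees:

  • Every read sees the most recent write (conceptually, it reads from main memory)
  • Every write is published to other threads (conceptually, it writes to main memory)
  • Prevents reordering around the access (memory barriers)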
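
Because of the reordering guarantee, a volatile write also publishes the plain writes made before it. A minimal sketch of that hand-off, assuming hypothetical names (SafeHandoff, publish, awaitPayload) and Java 9+ for Thread.onSpinWait():

```java
final class SafeHandoff {
    private int payload;            // plain field, published through `ready`
    private volatile boolean ready;

    void publish(int value) {
        payload = value; // 1. plain write
        ready = true;    // 2. volatile write: step 1 cannot be reordered after it
    }

    int awaitPayload() {
        while (!ready) { // 3. volatile read
            Thread.onSpinWait();
        }
        return payload;  // 4. sees the value from step 1
    }

    /** Passes a value from the calling thread to a reader thread and returns what the reader saw. */
    static int handOff(int value) {
        SafeHandoff handoff = new SafeHandoff();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = handoff.awaitPayload());
        reader.start();
        handoff.publish(value);
        try {
            reader.join(); // join() makes seen[0] visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(handOff(42));
    }
}
```

If ready were a plain field, the reader could spin forever or observe ready == true while still reading a stale payload.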

2. synchronized

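// Writer and reader must synchronize on the same lock object
void setFlag() {
    synchronized (lock) { flag = true; }
}

boolean isFlagSet() {
    synchronized (lock) { return flag; } // lock is released between checks
}

// Reader loop: while (!isFlagSet()) { ... }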

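On entering synchronized: cached values are discarded, so reads see the latest data.
On exiting: the thread’s writes are published to other threads.
The guarantee holds only between threads that synchronize on the same lock.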

3. final fields

public class Config {
    public final String value; // Visibility guarantee after constructor

    public Config(String value) {
        this.value = value;
    }
}

4. Atomic classes

AtomicBoolean flag = new AtomicBoolean(false);
// Internally: volatile + CAS
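
A short sketch of the CAS part: compareAndSet lets exactly one thread flip the flag, and every thread sees the result. The class OneShotInit and its methods are hypothetical names chosen for this illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class OneShotInit {
    private final AtomicBoolean started = new AtomicBoolean(false);

    /** Returns true for exactly one caller, no matter how many threads race. */
    boolean tryStart() {
        return started.compareAndSet(false, true);
    }

    /** Lets threadCount threads race on tryStart() and counts how many of them won. */
    static int countWinners(int threadCount) {
        OneShotInit init = new OneShotInit();
        AtomicInteger winners = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                if (init.tryStart()) {
                    winners.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return winners.get();
    }

    public static void main(String[] args) {
        System.out.println("Winners: " + countWinners(8));
    }
}
```

A plain volatile boolean with "if (!started) started = true" would not give this guarantee, because the check and the write are two separate steps.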

When volatile does NOT solve the visibility problem

  1. Compound operations: volatile count++ — read and write are not atomic
  2. Groups of variables: volatile on x and y does not guarantee another thread sees a consistent pair (x, y)
  3. Dependent computations: result depends on multiple variables, each volatile — but their combination may be inconsistent
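
Two common workarounds for these cases, shown as a sketch rather than a definitive recipe: an atomic read-modify-write for counters, and a single immutable object for values that must stay consistent with each other. The names CompoundFixes, Point, move, and position are illustrative assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CompoundFixes {
    // Compound operation: one atomic increment instead of volatile count++
    private final AtomicInteger count = new AtomicInteger();

    // Group of variables: both values travel together in one immutable object
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    private volatile Point point = new Point(0, 0);

    void move(int x, int y) {
        point = new Point(x, y); // a single volatile write publishes the pair
    }

    Point position() {
        return point;            // a single volatile read returns a consistent pair
    }

    /** Increments a shared counter from several threads and returns the final value. */
    static int countWithThreads(int threadCount, int incrementsPerThread) {
        CompoundFixes fixes = new CompoundFixes();
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    fixes.count.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return fixes.count.get();
    }
}
```

Readers should call position() once and use the returned object; reading the field twice could return two different pairs.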

Senior Level

Under the Hood: Cache Coherence Protocol

On x86, extended variants of MESI are used: MESIF on Intel CPUs and MOESI on AMD CPUs. The core transitions are:

| Event | Action |
|---|---|
| Write to cache line | Transition to Modified, send Invalidate to all other cores |
| Read cache line in Shared | Can read without restrictions |
| Read cache line in Invalid | Request data from RAM or another cache |
| Write to Modified cache line | Local write, other cores not notified until conflict |

Bus Traffic and Cache Invalidation

When core 1 writes to a volatile variable:

  1. An Invalidate message is sent on the bus (Ring Bus / Mesh)
  2. All other cores check their caches
  3. Cores with that cache line respond with Ack and transition to Invalid
  4. Only after all Acks is the write considered complete

This can take tens to hundreds of CPU cycles, which is why a volatile write is more expensive than a volatile read.

False Sharing — A Performance Problem

public class FalseSharingDemo {
    // Both volatile fields most likely share one cache line (64 bytes)
    public volatile long counter1 = 0;
    public volatile long counter2 = 0;
}

When thread 1 writes to counter1, the cache line is invalidated for thread 2, even if it’s working with counter2.
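
Whether two fields collide is simple arithmetic on their addresses. The sketch below assumes a 64-byte line and uses hypothetical field offsets (16 and 24 bytes from a line-aligned object start); real offsets depend on the JVM and can be inspected with a tool such as JOL.

```java
final class CacheLineMath {
    static final int LINE_SIZE = 64; // bytes; typical for x86, an assumption here

    /** Index of the cache line that contains the given byte address. */
    static long lineIndex(long address) {
        return address / LINE_SIZE;
    }

    /** True if both addresses fall into the same cache line. */
    static boolean sameLine(long addressA, long addressB) {
        return lineIndex(addressA) == lineIndex(addressB);
    }

    public static void main(String[] args) {
        long counter1 = 16;                   // hypothetical offset of counter1
        long counter2 = 24;                   // next long, 8 bytes later
        long paddedCounter2 = counter1 + 8 + 128; // same field after 128 bytes of padding

        System.out.println(sameLine(counter1, counter2));       // same line
        System.out.println(sameLine(counter1, paddedCounter2)); // different lines
    }
}
```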

Solution: @Contended

import jdk.internal.vm.annotation.Contended; // Java 9+; sun.misc.Contended on Java 8

public class ContendedDemo {
    @Contended
    public volatile long counter1 = 0;

    @Contended
    public volatile long counter2 = 0;
}

@Contended adds padding (128 bytes) around the variable so it occupies a separate cache line.

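Requires -XX:-RestrictContended for use in user code (Java 8+). On Java 9+ the annotation lives in an internal package, so compilation also needs --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED.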

Manual Padding (for older Java)

public class PaddedCounter {
    // Padding before the variable
    public long p1, p2, p3, p4, p5, p6, p7;
    public volatile long value = 0;
    // Padding after the variable
    public long q1, q2, q3, q4, q5, q6, q7;
}

Performance and Highload

Benchmark (approximate)

| Operation | Time | Note |
|---|---|---|
| Regular read | ~1 ns | From cache |
| Volatile read | ~5-10 ns | With barriers |
| Regular write | ~1 ns | To cache |
| Volatile write | ~50-100 ns | With Invalidate |
| synchronized (no contention) | ~10-20 ns | Thin lock |
| synchronized (contention) | ~1000+ ns | Context switch |

Write Barriers

Using volatile slows down writes more than reads:

  • Read: only barrier (LoadLoad + LoadStore)
  • Write: barrier + invalidate other caches + wait for Ack

Diagnostics

-XX:+PrintAssembly

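Lets you see the actual CPU instructions. The diagnostic options must be unlocked before the flag is given, and the JVM needs the hsdis disassembler library to print assembly:

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly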

Look for a lock-prefixed instruction right after the write: that is the memory barrier on x86 (the exact instruction and operands vary by JDK version):

mov dword ptr [rax], 1       # volatile write
lock add dword ptr [rsp], 0  # StoreLoad barrier

Flaky Tests

Visibility issues are a common cause of tests that:

  • Pass locally (1-2 cores)
  • Fail on CI server (8+ cores)
  • Fail “sometimes” (race condition timing)

Java Memory Model Stress Testing

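import org.openjdk.jcstress.annotations.*;
import org.openjdk.jcstress.infra.results.I_Result;

@JCStressTest
@Outcome(id = "0", expect = Expect.ACCEPTABLE, desc = "Reader ran before ready was set")
@Outcome(id = "42", expect = Expect.ACCEPTABLE, desc = "Reader saw ready and x")
@State
public class VisibilityStressTest {
    int x = 0;
    volatile boolean ready = false;

    @Actor
    public void writer() {
        x = 42;
        ready = true;
    }

    @Actor
    public void reader(I_Result r) {
        if (ready) {
            r.r1 = x; // Always 42: the volatile write to ready happens-before this read
        }
    }
}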

Best Practices

  1. Always use volatile for flags read from different threads
  2. Avoid storing shared state in regular fields
  3. final fields — a free way to ensure visibility of immutable data
  4. Beware of False Sharing for frequently updated volatile fields
  5. Test on multiprocessor systems, not just locally
  6. Use JCStress for stress-testing multithreaded code

Interview Cheat Sheet

Must know:

  • Visibility problem: one thread changes a variable, another sees the old value from the core’s cache
  • Cause: memory hierarchy (registers → L1/L2/L3 cache → RAM), each core has its own cache
  • MESI protocol (Modified, Exclusive, Shared, Invalid) ensures cache coherency
  • JIT hoisting optimization can hoist a read out of a loop → infinite loop without volatile
  • 4 solutions: volatile, synchronized, final fields, Atomic classes
  • Volatile does NOT solve the problem for compound operations and groups of variables
  • False Sharing: two volatile variables in the same cache line — writing to one invalidates the other

Frequent follow-up questions:

  • Why does a test pass locally but fail on CI? — Locally 1-2 cores (fewer cache issues), CI — 8+ cores
  • What is a Store Buffer? — CPU write buffer; writes go there first, not to RAM — the cause of the visibility problem
  • How does JIT worsen the problem? — Hoisting: hoists variable reads out of loops into registers, ignoring changes from other threads
  • What does @Contended do? — Adds padding (128 bytes) around volatile to avoid false sharing

Red flags (do NOT say):

  • “Volatile solves all multithreading problems” — no, only visibility of a single variable
  • “CPU cache is always synchronized”: no, MESI propagates changes with a delay, and store buffers and registers sit outside it
  • “A simple flag needs synchronized”: no, volatile is enough and cheaper for simple cases
  • “Final fields have nothing to do with visibility”: they do, final fields are guaranteed visible once the constructor has finished

Related topics:

  • [[1. What is the difference between synchronized and volatile]] — volatile as a solution to visibility problem
  • [[2. What is happens-before relationship]] — JMM-level visibility guarantees
  • [[4. What is monitor in Java]] — synchronized and cache invalidation on entry
  • [[8. What are Atomic classes]] — Atomic classes also solve the visibility problem