What is CAS (Compare-And-Swap)?
Imagine an ATM: 4. If yes — withdraws 200, new balance 800 5. If no (someone already withdrew) — cancels the operation, need to retry
Junior Level
Basic Understanding
CAS (Compare-And-Swap) is an atomic CPU instruction that allows updating a value only if it hasn’t changed since it was read. This is the foundation of lock-free algorithms.
Simple Analogy
Imagine an ATM:
- You check the balance: 1000 rub
- You want to withdraw 200 rub (expecting 800 to remain)
- The ATM checks: “is the balance still 1000?”
- If yes — withdraws 200, new balance 800
- If no (someone already withdrew) — cancels the operation, need to retry
Key difference from synchronized: CAS is an optimistic approach (try, if it fails — retry). synchronized is a pessimistic approach (lock and do, no one will interfere).
How CAS Works
// CAS takes 3 parameters:
// 1. Memory address
// 2. Expected value
// 3. New value
boolean CAS(address, expectedValue, newValue) {
if (memory[address] == expectedValue) {
memory[address] = newValue;
return true; // Success!
}
return false; // Failed — value changed
}
Example with AtomicInteger
AtomicInteger counter = new AtomicInteger(0);
counter.incrementAndGet(); // Internally:
// 1. Read current value (current = 0)
// 2. Compute new value (next = 1)
// 3. CAS: if value == 0, set to 1
// If CAS failed — repeat from step 1
Spin-loop
public final int incrementAndGet() {
for (;;) { // Infinite loop
int current = get(); // Read
int next = current + 1; // Compute
if (compareAndSet(current, next)) { // CAS
return next; // Success!
}
// CAS failed — someone else updated the value
// Go to next iteration
}
}
When is CAS faster than locks?
| Scenario | CAS | synchronized |
|---|---|---|
| Low contention | Faster (no locking) | Slower (acquire/release) |
| Medium contention | Faster | Medium |
| High contention | Slower (infinite loops) | Faster (threads are parked) |
Middle Level
Hardware-Level Algorithm
On x86 architecture, CAS is implemented via the LOCK CMPXCHG instruction:
; rax = expected value
; rcx = new value
; [memory] = actual value in memory
lock cmpxchg [memory], rcx
; If [memory] == rax:
; [memory] = rcx ← Write new value
; ZF = 1 ← Success flag
; Else:
; rax = [memory] ← Load actual value
; ZF = 0 ← Failure flag
The lock prefix locks the memory bus for the duration of the operation — other cores cannot access this memory cell.
CAS in Java: Unsafe and VarHandle
Before Java 9: sun.misc.Unsafe
Before Java 9: sun.misc.Unsafe (direct memory access, dangerous)
Java 9+: VarHandle (safe replacement for Unsafe, JEP 193)
Note: Unsafe is still accessible in Java 9+ via --add-opens, but not recommended.
// Direct memory access (not recommended)
Unsafe unsafe = ...;
long offset = ...; // Field offset in memory
int current;
do {
current = unsafe.getIntVolatile(obj, offset);
} while (!unsafe.compareAndSwapInt(obj, offset, current, current + 1));
Java 9+: VarHandle
// Typed and safe API
private static final VarHandle VALUE;
static {
MethodHandles.Lookup l = MethodHandles.lookup();
VALUE = l.findVarHandle(MyClass.class, "value", int.class);
}
int current;
do {
current = (int) VALUE.getVolatile(this);
} while (!VALUE.compareAndSet(this, current, current + 1));
Lock-free vs Wait-free
In an interview, it’s important to distinguish:
This is a progress guarantee hierarchy:
- Wait-free (strongest): EVERY thread completes the operation in a finite number of steps
- Lock-free (middle): AT LEAST ONE thread completes the operation in a finite number of steps
- Obstruction-free (weakest): thread completes if all others are suspended
Wait-free → Lock-free → Obstruction-free: each next one is weaker than the previous.
| Term | Guarantee | Example |
|---|---|---|
| Lock-free | At least one thread completes the operation in finite time | AtomicInteger |
| Wait-free | Every thread completes in a finite number of steps | LongAdder.add() |
| Obstruction-free | Thread completes if others don’t interfere | Some STM implementations |
STM (Software Transactional Memory) — software transactional memory: operations execute in a transaction, then atomically commit. An alternative approach to lock-free programming.
Exponential Back-off
Under high contention, a simple spin cycle wastes CPU. Solution — pauses:
public int incrementWithBackoff() {
int backoff = 1;
for (;;) {
int current = get();
int next = current + 1;
if (compareAndSet(current, next)) {
return next;
}
// CAS failed — pause
Thread.onSpinWait(); // Hint to the processor
// `Thread.onSpinWait()` — a hint to the JVM/CPU: "I'm in a spin cycle, you can optimize". On x86 this is the PAUSE instruction, which reduces power consumption and improves spin cycle performance.
Thread.sleep(ThreadLocalRandom.current().nextInt(0, backoff));
backoff = Math.min(backoff * 2, MAX_BACKOFF); // Exponential growth
}
}
ABA Problem
Time Thread 1 Thread 2 Stack (head)
──────────────────────────────────────────────────
t1 reads A A → B → C
t2 CAS(A, B) B → C
t3 CAS(B, A) A → C (new value!)
t4 CAS(A, D) → SUCCESS!
But the stack changed! B and C are lost!
Solution: AtomicStampedReference
// Stores value + version number
AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);
int[] stampHolder = new int[1];
String value = ref.get(stampHolder);
int stamp = stampHolder[0];
// CAS only if both value AND version match
ref.compareAndSet("A", "D", stamp, stamp + 1);
When CAS is NOT suitable
- Compound operations: need to atomically update two fields — CAS works on one field
- High contention (100+ threads): CAS cycle wastes CPU, better to use LongAdder or Lock
- Long computations: if milliseconds pass between read and CAS — CAS will constantly fail
- ABA problem: value changed A→B→A, CAS “won’t notice” — need AtomicStampedReference
Senior Level
Under the Hood: Memory Ordering
CAS provides a full memory barrier (StoreLoad):
Before CAS: All writes are guaranteed to be completed
CAS: Atomic update
After CAS: All reads see the latest data
On different architectures:
| Architecture | CAS Implementation | Barriers |
|---|---|---|
| x86 | lock cmpxchg |
Full barrier (x86 is strongly ordered) |
| ARM | ldxr/stxr loop |
Requires dmb instruction |
| RISC-V | lr.w/sc.w loop |
Requires fence |
Spin Cycle and CPU Saturation
Under very high contention, threads may spin forever:
// Problem: 100 threads, 1 CAS variable
for (;;) {
int current = get();
int next = current + 1;
if (compareAndSet(current, next)) // All 100 threads constantly fail
return next;
// CPU at 100%, but no progress!
}
Solutions:
- Exponential Back-off — pause between attempts
- LongAdder — distribution across cells
- Fallback to locking — if contention is too high
Performance: CAS vs Mutex
| Metric | CAS | Mutex (synchronized) |
|---|---|---|
| Latency (uncontended) | ~5-10 ns | ~10-20 ns |
| Latency (contended) | ~100+ ns (spin) | ~1000+ ns (park) |
| CPU usage (contended) | 100% (spin-loop) | ~0% (parked) |
| Throughput (low contention) | Higher | Lower |
| Throughput (high contention) | Lower | Higher |
Critical point: Under very high load, a mutex may be better because it “parks” threads rather than burning CPU.
AtomicLong Implementation on 32-bit Systems
// On a 32-bit JVM, writing a long is not atomic!
// long = 64 bits, written in two 32-bit steps
// AtomicLong guarantees atomicity through:
// 1. CMPXCHG8B instruction (x86)
// 2. Or internal locking (on older CPUs)
Bus Traffic and Cache Coherence
On multiprocessor systems, frequent CAS causes:
- Cache Line Invalidation — each CAS sends Invalidate to all cores
- Bus Traffic — intensive inter-core communication
- Memory Bandwidth Saturation — the bus is clogged, regular operations slow down
Solution: Segmented structures (like ConcurrentHashMap) or LongAdder.
Diagnostics
JMC (Java Mission Control)
java -XX:StartFlightRecording=filename=rec.jfr MyApp
Events:
jdk.ExecutionSample— shows where threads spend time- If lots of time in
compareAndSet→ contention
-XX:+PrintAssembly
java -XX:+PrintAssembly -XX:+UnlockDiagnosticVMOptions
Look for:
lock cmpxchg dword ptr [rax], rcx # ← This is CAS
Measuring Contention
// Programmatic contention estimate
AtomicLong counter = new AtomicLong(0);
long start = System.nanoTime();
for (int i = 0; i < 1_000_000; i++) {
counter.incrementAndGet();
}
long elapsed = System.nanoTime() - start;
System.out.println("Average time: " + (elapsed / 1_000_000) + " ns");
// Much > 10 ns → contention
Best Practices
- CAS for simple operations — increment, set, get
- Back-off under contention —
Thread.onSpinWait()as a CPU hint - LongAdder for counters — under high contention
- AtomicStampedReference — when the ABA problem is possible
- Don’t use CAS for long operations — switch to locks
- Monitor CAS failures — via JFR or profiling
- Avoid CAS for multiple fields — use synchronized
Interview Cheat Sheet
Must know:
- CAS = Compare-And-Swap: atomically compares value with expected and writes new if matched
- On x86:
lock cmpxchg(lockprefix locks memory bus); on ARM:ldxr/stxr; on RISC-V:lr.w/sc.w - CAS provides a full memory barrier (StoreLoad) — all writes before CAS completed, all reads after CAS are current
- CAS is an optimistic approach (try/retry), synchronized is pessimistic (lock and do)
- Lock-free (AtomicInteger) vs Wait-free (LongAdder): lock-free = at least one thread completes; wait-free = every thread completes
- ABA problem: value returned A→B→A, CAS won’t notice; solution — AtomicStampedReference (value + version)
- Under high contention, CAS cycle wastes CPU (100%); solution — exponential back-off or switch to LongAdder/Lock
Frequent follow-up questions:
- When is CAS faster than synchronized? — Under low/medium contention (no locking/parking overhead)
- When is synchronized faster than CAS? — Under high contention (100+ threads): CAS spins and burns CPU, mutex parks threads
- What does
Thread.onSpinWait()do? — Hint to the JVM/CPU “I’m in a spin cycle”; on x86 = PAUSE instruction, reduces power consumption - How does CAS differ from volatile? — volatile = visibility only; CAS = visibility + atomic write (compare-and-swap)
Red flags (do NOT say):
- “CAS is always faster than synchronized” — no, under high contention a mutex is more effective
- “CAS has no memory issues” — no, CAS sets a full memory barrier, which is expensive
- “ABA problem is a Java bug” — no, it’s a fundamental problem of CAS algorithms in any language
- “Wait-free and lock-free are the same” — no, wait-free is stronger: EVERY thread completes, lock-free — at least ONE
Related topics:
- [[8. What are Atomic classes]] — all Atomic classes are built on CAS
- [[1. What is the difference between synchronized and volatile]] — CAS vs synchronized approaches
- [[3. What is visibility problem]] — CAS provides a full memory barrier
- [[7. What is reentrant lock]] — AQS uses CAS to acquire state