What are the advantages of Virtual Threads over regular threads?
This note explains why virtual threads are more efficient than platform threads, rather than just stating the fact. Key difference: a Platform Thread is a wrapper over an OS thread (expensive, fixed stack size), while a Virtual Thread is an object in the JVM heap (cheap, dynamically sized stack).
Junior Level
Basic Understanding
Virtual Threads (VT) outperform regular threads (Platform Threads) on several key metrics.
1. Scalability
| Characteristic | Platform Threads | Virtual Threads |
|---|---|---|
| Max threads | ~2,000-5,000 | ~1,000,000+ |
| Limitation | OS memory (1MB per thread) | JVM Heap |
| Example | 1000 threads = ~1GB | 1,000,000 threads = ~256MB |
2. Memory Savings
Platform Thread: ~1MB stack (reserved immediately)
Virtual Thread: ~several KB (dynamic, in heap)
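A quick sketch of how cheap VT creation is in practice (requires Java 21+). The thread count and the no-op task body are arbitrary assumptions; the point is that spawning tens of thousands of virtual threads is routine, while the same count of platform threads at default ~1MB stacks would reserve gigabytes of native memory:

```java
import java.util.concurrent.CountDownLatch;

public class VtSpawnDemo {
    // Starts `count` virtual threads and waits until all have run.
    // Each VT costs only a few KB of heap, so 10,000 is no problem.
    static int runAll(int count) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(count);
        for (int i = 0; i < count; i++) {
            Thread.ofVirtual().start(done::countDown);
        }
        done.await();
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll(10_000) + " virtual threads completed");
    }
}
```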
3. Code Simplicity
Before VT (reactive style):
```java
// Callback hell: hard to read and debug
httpClient.get(url)
    .thenApply(this::parse)
    .thenCompose(data -> db.save(data))
    .thenAccept(result -> respond(result))
    .exceptionally(ex -> handleError(ex));
```
With VT (simple blocking style):
```java
// Simple sequential code
String response = httpClient.get(url); // "blocks" (the VT unmounts)
Data data = parse(response);
Result result = db.save(data);
respond(result);
```
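The blocking style above is usually driven through a per-task executor. A minimal runnable sketch (Java 21+), where the sleep is a stand-in for the blocking HTTP/DB calls in the snippet:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockingStyleDemo {
    // Runs a blocking task on its own virtual thread and returns the result.
    static String fetch() throws Exception {
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> f = exec.submit(() -> {
                Thread.sleep(100);  // "blocks": the VT unmounts from its carrier
                return "response";
            });
            return f.get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch()); // prints "response"
    }
}
```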
Middle Level
Little’s Law
L = λ × W
L = number of active tasks (threads)
λ = throughput (requests per second)
W = processing time per request (latency)
Problem with Platform Threads:
- L is limited to ~2000-5000 threads
- If W (latency) grows → λ (throughput) inevitably drops
Solution with Virtual Threads:
- L can be in the millions
- Even with slow I/O responses, throughput stays high
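The law can be rearranged as λ = L / W to see the throughput ceiling directly. A toy calculation; the numbers (2,000 threads, 200 ms latency) are illustrative assumptions, not measurements:

```java
public class LittlesLaw {
    // L = lambda * W  =>  lambda = L / W
    static double maxThroughput(int maxConcurrentTasks, double latencySeconds) {
        return maxConcurrentTasks / latencySeconds;
    }

    public static void main(String[] args) {
        // 2,000 platform threads, 200ms of (mostly I/O) latency per request:
        System.out.println(maxThroughput(2_000, 0.2));     // ~10,000 RPS ceiling
        // 1,000,000 virtual threads, same latency:
        System.out.println(maxThroughput(1_000_000, 0.2)); // ~5,000,000 RPS ceiling
    }
}
```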
CPU Efficiency
Platform Thread Context Switch
1. Transition to kernel mode (syscall)
2. Save CPU registers to TSS (Task State Segment)
3. Switch page tables (TLB flush)
4. Load new thread
5. Return to user mode
Time: ~1-10 microseconds (~1000-10000 CPU cycles)
Why expensive: each step requires OS and CPU involvement
Virtual Thread Context Switch
1. Save stack to StackChunk (heap) — regular memcpy
2. Switch pointer to another VT in JVM queue
3. Restore stack from heap — regular memcpy
Time: ~10-50 CPU cycles (nanoseconds)
Why cheap: everything in user-space, no system calls
Why VT switching is 100-1000x faster: Platform Thread switch requires entering kernel mode (syscall), which includes TLB flush and page table switching. VT switch is just copying data in the heap, which the CPU does at memory speed.
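A rough way to feel this difference yourself is to time starting and joining threads in both modes (Java 21+). This measures creation plus scheduling rather than the pure context-switch path, so treat it as a sketch under those assumptions, not a benchmark:

```java
public class CreationCostSketch {
    // Starts and joins `n` threads built by the given builder and
    // returns the elapsed time in nanoseconds.
    static long timeStartJoin(Thread.Builder builder, int n) throws InterruptedException {
        long t0 = System.nanoTime();
        Thread[] ts = new Thread[n];
        for (int i = 0; i < n; i++) ts[i] = builder.start(() -> {});
        for (Thread t : ts) t.join();
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) throws InterruptedException {
        long virt = timeStartJoin(Thread.ofVirtual(), 1_000);
        long plat = timeStartJoin(Thread.ofPlatform(), 1_000);
        // Absolute numbers vary by machine; the gap is what matters.
        System.out.println("virtual:  " + virt / 1_000_000 + " ms");
        System.out.println("platform: " + plat / 1_000_000 + " ms");
    }
}
```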
Architecture Simplification
| Approach | Complexity | Debugging | VT Needed? |
|---|---|---|---|
| Thread-per-request | Low | Easy | Yes — VT bring this back |
| Thread Pool | Medium | Medium | No — pools work without VT |
| Reactive (WebFlux) | High | Hard | No — VT replace reactivity |
| CompletableFuture | Medium | Medium | No — but VT simplify it |
When VT Do NOT Provide Benefits
| Scenario | Why | Solution |
|---|---|---|
| CPU-bound tasks | VT don’t unmount, scheduler overhead | FixedThreadPool |
| synchronized-heavy | Pinning — blocks Carrier Threads | ReentrantLock |
| Old JDBC drivers | Use synchronized internally | Update driver |
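For the CPU-bound row, the conventional fix is a fixed pool sized to the core count. A minimal sketch; `sumOfSquares` is just a hypothetical stand-in for real computation:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuBoundDemo {
    // Pure computation: a VT running this never unmounts, so virtual
    // threads add scheduler overhead with no benefit here.
    static long sumOfSquares(int n) {
        long s = 0;
        for (int i = 1; i <= n; i++) s += (long) i * i;
        return s;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService cpuPool = Executors.newFixedThreadPool(cores);
        try {
            Future<Long> f = cpuPool.submit(() -> sumOfSquares(1_000));
            System.out.println(f.get()); // 1000*1001*2001/6 = 333833500
        } finally {
            cpuPool.shutdown();
        }
    }
}
```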
Senior Level
Under the Hood: Minimizing OS Context Switches
Platform Thread Switch
```
Thread A (User Mode)
    ↓ syscall
Kernel Mode:
  - Save A's registers to TSS
  - Update page tables
  - Load B's registers from TSS
  - Update GDT/LDT
    ↓ iret
Thread B (User Mode)
```
~1000+ CPU cycles
Virtual Thread Switch
```
VT A on Carrier Thread 1
    ↓ yield (blocking I/O)
  - Save stack to a StackChunk (heap)
  - Free Carrier Thread 1
    ↓
Carrier Thread 1 takes VT B from the queue
  - Restore VT B's stack from its StackChunk
    ↓
VT B continues execution
```
~10-50 CPU cycles
Memory: Heap vs Native Stack
Platform Threads:
```
┌─────────────────────┐
│ JVM Heap            │
│ (objects, etc.)     │
├─────────────────────┤
│ Native Memory       │
│ Thread Stack: 1MB   │ ← outside the heap (RSS)
│ Thread Stack: 1MB   │
│ Thread Stack: 1MB   │
│ ...                 │
└─────────────────────┘
```
Virtual Threads:
```
┌─────────────────────┐
│ JVM Heap            │
│ StackChunk #1: 4KB  │
│ StackChunk #2: 8KB  │
│ StackChunk #3: 2KB  │
│ ... (dynamic)       │
├─────────────────────┤
│ Native Memory       │
│ 8 Carrier Threads   │ ← only 8 stacks of 1MB
└─────────────────────┘
```
Latency and P99
Platform Threads under load:
```
P50: 10ms
P99: 500ms  ← queuing for a free OS thread
```
Virtual Threads:
```
P50: 10ms
P99: 15ms   ← stable, no queue for threads
```
Pinning — the Main Problem
```java
// A virtual thread pins to its carrier thread when:

// 1. Inside a synchronized block or method
synchronized (lock) {
    blockingIO(); // the carrier thread is blocked for the whole I/O!
}

// 2. Inside native methods (JNI)
nativeMethod(); // cannot unmount
```
Diagnosis:
```shell
java -Djdk.tracePinnedThreads=full MyApp
```
Solution:
```java
// Replace synchronized with ReentrantLock
ReentrantLock lock = new ReentrantLock();
lock.lock();
try {
    blockingIO(); // the VT unmounts, the carrier thread stays free
} finally {
    lock.unlock();
}
```
ThreadLocal → Scoped Values
```java
// ThreadLocal is a problem with VT:
ThreadLocal<User> userHolder = new ThreadLocal<>();
// a million VTs = a million copies of User = a memory problem!

// Scoped Values (preview in Java 21) are the solution
// (note: ScopedValue has no public constructor, use newInstance()):
static final ScopedValue<User> CURRENT_USER = ScopedValue.newInstance();

ScopedValue.runWhere(CURRENT_USER, user, () -> {
    process(); // CURRENT_USER.get() is available inside
});
// Automatically cleared when the scope exits, no leaks
```
Diagnostics
Memory Dumps
Virtual threads look like regular objects in the heap:
```shell
jmap -dump:file=heap.hprof <pid>
# VTs are regular heap objects, so MAT and YourKit work as usual
```
Latency Monitoring
```java
// After switching to VT, P99 stabilizes
long start = System.nanoTime();
handleRequest();
long elapsed = System.nanoTime() - start;
// VT:       P99 stays stable
// Platform: P99 grows under load
```
-XX:+PrintAssembly
```
# Yield points appear in VT code:
call Continuation.yield
# The JVM optimizes these transitions
```
Best Practices
- VT for I/O-bound — web servers, APIs, proxies
- ReentrantLock instead of synchronized — avoid pinning
- Scoped Values instead of ThreadLocal — for context
- Semaphore for resource limiting — instead of pool size limits
- Not for CPU-bound — FixedThreadPool is better
- -Djdk.tracePinnedThreads=full — during development
- Monitor P99 — should stabilize
- Update JDBC drivers — old ones use synchronized
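The Semaphore bullet can be sketched concretely: with VT you cap the scarce resource itself rather than sizing a thread pool. In this sketch the sleep stands in for a real database call, and the permit count (10) is an assumed connection limit:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

public class SemaphoreLimitDemo {
    // At most 10 tasks touch the "database" at once, no matter how
    // many virtual threads exist.
    static final Semaphore DB_PERMITS = new Semaphore(10);

    static String queryDb(int id) throws InterruptedException {
        DB_PERMITS.acquire();
        try {
            Thread.sleep(10); // stand-in for blocking I/O
            return "row-" + id;
        } finally {
            DB_PERMITS.release();
        }
    }

    public static void main(String[] args) throws Exception {
        try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> f = exec.submit(() -> queryDb(42));
            System.out.println(f.get()); // prints "row-42"
        }
    }
}
```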
When NOT to Switch to Virtual Threads
- Your application already works stably with a thread pool — don’t fix what isn’t broken. VT provide benefits for I/O-bound load, not CPU-bound
- Team isn’t ready to audit dependencies for synchronized — old JDBC, HTTP clients, SimpleDateFormat inside synchronized. Without auditing, VT will “stick” and perform worse
- Below Java 21 — VT not available. For Java 17 — use reactive approach
- Low contention (< 100 RPS) — with low load the difference is unnoticeable, but migration risks are real
- ThreadLocal-heavy application without alternatives — million VT = million ThreadLocal copies. Scoped Values are still in preview
VT vs Platform Threads: When the Difference Is Noticeable
| Metric | Platform Threads | Virtual Threads | When the difference is visible |
|---|---|---|---|
| Throughput (I/O) | 100-500 RPS | 10,000-100,000 RPS | Under high I/O load |
| P99 latency | Grows with load | Stable | Under overload (queue for threads) |
| Memory for 10K tasks | ~10 GB | ~50-100 MB | When scaling |
| CPU-bound tasks | Faster | Slower (overhead) | Never use VT for computation |
Interview Cheat Sheet
Must know:
- VT context switch = ~10-50 CPU cycles (user-space, memcpy), Platform Thread = ~1000-10000 cycles (kernel mode, TLB flush)
- Little’s Law: L = λ × W — VT increase L (number of parallel tasks) without dropping throughput
- Memory: Platform Thread ~1MB native stack, VT ~several KB dynamically in heap (StackChunk)
- VT bring back the Thread-per-Request model — simple blocking code instead of Reactive/Callback Hell
- P99 latency: Platform Threads grow with load, VT stay stable
- Pinning — the main problem: synchronized in VT blocks the Carrier Thread
Frequent follow-up questions:
- Why is Platform Thread context switch so expensive? — Requires entering kernel mode (syscall), TLB flush, page table switching
- What is VT’s effect on P99? — Platform Threads under overload: P99 grows due to thread queue; VT: P99 stable, as there’s no queue for thread creation
- Why don’t VT provide gains at low contention? — At < 100 RPS the difference is unnoticeable, but migration risks (auditing synchronized, updating drivers) are real
- Can you use VT and Platform Threads together? — Yes: VT for I/O-bound, Platform Threads (FixedThreadPool) for CPU-bound
Red flags (DO NOT say):
- “VT use less memory because they are simpler” — VT are cheaper because the stack is in the heap (dynamic), not because they are “simpler”
- “VT replace ExecutorService” — VT are a different kind of executor; ExecutorService is still used (Executors.newVirtualThreadPerTaskExecutor())
- “Reactive programming is no longer needed” — VT replace reactivity for I/O-bound work, but the reactive approach is still relevant for streaming and backpressure
- “VT automatically solve all performance problems” — VT only help with I/O-bound; for CPU-bound or pinning they can make things worse
Related topics:
- [[23. What are Virtual Threads in Java 21]]
- [[25. When should you use Virtual Threads]]
- [[26. What is structured concurrency]]
- [[27. What is the difference between Thread and Runnable]]