What are the advantages of Virtual Threads over regular threads?
This note explains why virtual threads are more efficient than platform threads, rather than just stating the fact. Key difference: a Platform Thread is a wrapper over an OS thread (expensive, fixed stack size), while a Virtual Thread is an object in the JVM heap (cheap, dynamically sized stack).
Junior Level
Basic Understanding
Virtual Threads (VT) outperform regular threads (Platform Threads) on several key metrics.
1. Scalability
| Characteristic | Platform Threads | Virtual Threads |
|---|---|---|
| Max threads | ~2,000-5,000 | ~1,000,000+ |
| Limitation | OS memory (1MB per thread) | JVM Heap |
| Example | 1000 threads = ~1GB | 1,000,000 threads = ~256MB |
2. Memory Savings
Platform Thread: ~1MB stack (reserved immediately)
Virtual Thread: ~several KB (dynamic, in heap)
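A quick sketch of how cheap VT creation is in practice (requires Java 21+). The thread count and the no-op task body are arbitrary assumptions; the point is that spawning tens of thousands of virtual threads is routine, while the same count of platform threads at default ~1MB stacks would reserve gigabytes of native memory:

```java
import java.util.concurrent.CountDownLatch;

public class VtSpawnDemo {
    // Starts `count` virtual threads and waits until all have run.
    // Each VT costs only a few KB of heap, so 10,000 is no problem.
    static int runAll(int count) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(count);
        for (int i = 0; i < count; i++) {
            Thread.ofVirtual().start(done::countDown);
        }
        done.await();
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll(10_000) + " virtual threads completed");
    }
}
```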
3. Code Simplicity
Before VT (reactive style):
```java
// Callback hell: hard to read and debug
httpClient.get(url)
    .thenApply(this::parse)
    .thenCompose(data -> db.save(data))
    .thenAccept(result -> respond(result))
    .exceptionally(ex -> handleError(ex));
```
With VT (simple blocking style):
```java
// Simple sequential code
String response = httpClient.get(url); // "blocks" (the VT unmounts)
Data data = parse(response);
Result result = db.save(data);
respond(result);
```
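The blocking style above is usually driven through a per-task executor. A minimal runnable sketch (Java 21+), where the sleep is a stand-in for the blocking HTTP/DB calls in the snippet:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockingStyleDemo {
    // Runs a blocking task on its own virtual thread and returns the result.
    static String fetch() throws Exception {
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> f = exec.submit(() -> {
                Thread.sleep(100);  // "blocks": the VT unmounts from its carrier
                return "response";
            });
            return f.get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch()); // prints "response"
    }
}
```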
Middle Level
Little’s Law
L = λ × W
L = number of active tasks (threads)
λ = throughput (requests per second)
W = processing time per request (latency)
Problem with Platform Threads:
- L is limited to ~2000-5000 threads
- If W (latency) grows → λ (throughput) inevitably drops
Solution with Virtual Threads:
- L can be in the millions
- Even with slow I/O responses, throughput stays high
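The law can be rearranged as λ = L / W to see the throughput ceiling directly. A toy calculation; the numbers (2,000 threads, 200 ms latency) are illustrative assumptions, not measurements:

```java
public class LittlesLaw {
    // L = lambda * W  =>  lambda = L / W
    static double maxThroughput(int maxConcurrentTasks, double latencySeconds) {
        return maxConcurrentTasks / latencySeconds;
    }

    public static void main(String[] args) {
        // 2,000 platform threads, 200ms of (mostly I/O) latency per request:
        System.out.println(maxThroughput(2_000, 0.2));     // ~10,000 RPS ceiling
        // 1,000,000 virtual threads, same latency:
        System.out.println(maxThroughput(1_000_000, 0.2)); // ~5,000,000 RPS ceiling
    }
}
```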
CPU Efficiency
Platform Thread Context Switch
1. Transition to kernel mode (syscall)
2. Save CPU registers to TSS (Task State Segment)
3. Switch page tables (TLB flush)
4. Load new thread
5. Return to user mode
Time: ~1-10 microseconds (~1000-10000 CPU cycles)
Why expensive: each step requires OS and CPU involvement
Virtual Thread Context Switch
1. Save stack to StackChunk (heap) — regular memcpy
2. Switch pointer to another VT in JVM queue
3. Restore stack from heap — regular memcpy
Time: ~10-50 CPU cycles (nanoseconds)
Why cheap: everything in user-space, no system calls
Why VT switching is 100-1000x faster: Platform Thread switch requires entering kernel mode (syscall), which includes TLB flush and page table switching. VT switch is just copying data in the heap, which the CPU does at memory speed.
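A rough way to feel this difference yourself is to time starting and joining threads in both modes (Java 21+). This measures creation plus scheduling rather than the pure context-switch path, so treat it as a sketch under those assumptions, not a benchmark:

```java
public class CreationCostSketch {
    // Starts and joins `n` threads built by the given builder and
    // returns the elapsed time in nanoseconds.
    static long timeStartJoin(Thread.Builder builder, int n) throws InterruptedException {
        long t0 = System.nanoTime();
        Thread[] ts = new Thread[n];
        for (int i = 0; i < n; i++) ts[i] = builder.start(() -> {});
        for (Thread t : ts) t.join();
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) throws InterruptedException {
        long virt = timeStartJoin(Thread.ofVirtual(), 1_000);
        long plat = timeStartJoin(Thread.ofPlatform(), 1_000);
        // Absolute numbers vary by machine; the gap is what matters.
        System.out.println("virtual:  " + virt / 1_000_000 + " ms");
        System.out.println("platform: " + plat / 1_000_000 + " ms");
    }
}
```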
Architecture Simplification
| Approach | Complexity | Debugging | VT Needed? |
|---|---|---|---|
| Thread-per-request | Low | Easy | Yes — VT bring this back |
| Thread Pool | Medium | Medium | No — pools work without VT |
| Reactive (WebFlux) | High | Hard | No — VT replace reactivity |
| CompletableFuture | Medium | Medium | No — but VT simplify it |
When VT Do NOT Provide Benefits
| Scenario | Why | Solution |
|---|---|---|
| CPU-bound tasks | VT don’t unmount, scheduler overhead | FixedThreadPool |
| synchronized-heavy | Pinning — blocks Carrier Threads | ReentrantLock |
| Old JDBC drivers | Use synchronized internally | Update driver |
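For the CPU-bound row, the conventional fix is a fixed pool sized to the core count. A minimal sketch; `sumOfSquares` is just a hypothetical stand-in for real computation:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuBoundDemo {
    // Pure computation: a VT running this never unmounts, so virtual
    // threads add scheduler overhead with no benefit here.
    static long sumOfSquares(int n) {
        long s = 0;
        for (int i = 1; i <= n; i++) s += (long) i * i;
        return s;
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService cpuPool = Executors.newFixedThreadPool(cores);
        try {
            Future<Long> f = cpuPool.submit(() -> sumOfSquares(1_000));
            System.out.println(f.get()); // 1000*1001*2001/6 = 333833500
        } finally {
            cpuPool.shutdown();
        }
    }
}
```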
Senior Level
Under the Hood: Minimizing OS Context Switches
Platform Thread Switch
```
Thread A (User Mode)
    ↓ syscall
Kernel Mode:
  - Save A's registers to TSS
  - Update page tables
  - Load B's registers from TSS
  - Update GDT/LDT
    ↓ iret
Thread B (User Mode)
```
~1000+ CPU cycles
Virtual Thread Switch
```
VT A on Carrier Thread 1
    ↓ yield (blocking I/O)
  - Save stack to a StackChunk (heap)
  - Free Carrier Thread 1
    ↓
Carrier Thread 1 takes VT B from the queue
  - Restore VT B's stack from its StackChunk
    ↓
VT B continues execution
```
~10-50 CPU cycles
Memory: Heap vs Native Stack
Platform Threads:
```
┌─────────────────────┐
│ JVM Heap            │
│ (objects, etc.)     │
├─────────────────────┤
│ Native Memory       │
│ Thread Stack: 1MB   │ ← outside the heap (RSS)
│ Thread Stack: 1MB   │
│ Thread Stack: 1MB   │
│ ...                 │
└─────────────────────┘
```
Virtual Threads:
```
┌─────────────────────┐
│ JVM Heap            │
│ StackChunk #1: 4KB  │
│ StackChunk #2: 8KB  │
│ StackChunk #3: 2KB  │
│ ... (dynamic)       │
├─────────────────────┤
│ Native Memory       │
│ 8 Carrier Threads   │ ← only 8 stacks of 1MB
└─────────────────────┘
```
Latency and P99
Platform Threads under load:
```
P50: 10ms
P99: 500ms  ← queuing for a free OS thread
```
Virtual Threads:
```
P50: 10ms
P99: 15ms   ← stable, no queue for threads
```
Pinning — the Main Problem
```java
// A virtual thread pins to its carrier thread when:

// 1. Inside a synchronized block or method
synchronized (lock) {
    blockingIO(); // the carrier thread is blocked for the whole I/O!
}

// 2. Inside native methods (JNI)
nativeMethod(); // cannot unmount
```
Diagnosis:
```shell
java -Djdk.tracePinnedThreads=full MyApp
```
Solution:
```java
// Replace synchronized with ReentrantLock
ReentrantLock lock = new ReentrantLock();
lock.lock();
try {
    blockingIO(); // the VT unmounts, the carrier thread stays free
} finally {
    lock.unlock();
}
```
ThreadLocal → Scoped Values
```java
// ThreadLocal is a problem with VT:
ThreadLocal<User> userHolder = new ThreadLocal<>();
// a million VTs = a million copies of User = a memory problem!

// Scoped Values (preview in Java 21) are the solution
// (note: ScopedValue has no public constructor, use newInstance()):
static final ScopedValue<User> CURRENT_USER = ScopedValue.newInstance();

ScopedValue.runWhere(CURRENT_USER, user, () -> {
    process(); // CURRENT_USER.get() is available inside
});
// Automatically cleared when the scope exits, no leaks
```
Diagnostics
Memory Dumps
Virtual threads look like regular objects in the heap:
```shell
jmap -dump:file=heap.hprof <pid>
# VTs are regular heap objects, so MAT and YourKit work as usual
```
Latency Monitoring
```java
// After switching to VT, P99 stabilizes
long start = System.nanoTime();
handleRequest();
long elapsed = System.nanoTime() - start;
// VT:       P99 stays stable
// Platform: P99 grows under load
```
-XX:+PrintAssembly
```
# Yield points appear in VT code:
call Continuation.yield
# The JVM optimizes these transitions
```
Best Practices
- VT for I/O-bound — web servers, APIs, proxies
- ReentrantLock instead of synchronized — avoid pinning
- Scoped Values instead of ThreadLocal — for context
- Semaphore for resource limiting — instead of pool size limits
- Not for CPU-bound — FixedThreadPool is better
- -Djdk.tracePinnedThreads=full — during development
- Monitor P99 — should stabilize
- Update JDBC drivers — old ones use synchronized
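The Semaphore bullet can be sketched concretely: with VT you cap the scarce resource itself rather than sizing a thread pool. In this sketch the sleep stands in for a real database call, and the permit count (10) is an assumed connection limit:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

public class SemaphoreLimitDemo {
    // At most 10 tasks touch the "database" at once, no matter how
    // many virtual threads exist.
    static final Semaphore DB_PERMITS = new Semaphore(10);

    static String queryDb(int id) throws InterruptedException {
        DB_PERMITS.acquire();
        try {
            Thread.sleep(10); // stand-in for blocking I/O
            return "row-" + id;
        } finally {
            DB_PERMITS.release();
        }
    }

    public static void main(String[] args) throws Exception {
        try (var exec = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> f = exec.submit(() -> queryDb(42));
            System.out.println(f.get()); // prints "row-42"
        }
    }
}
```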
When NOT to Switch to Virtual Threads
- Your application already works stably with a thread pool — don’t fix what isn’t broken. VT provide benefits for I/O-bound load, not CPU-bound
- Team isn’t ready to audit dependencies for synchronized — old JDBC, HTTP clients, SimpleDateFormat inside synchronized. Without auditing, VT will “stick” and perform worse
- Below Java 21 — VT not available. For Java 17 — use reactive approach
- Low contention (< 100 RPS) — with low load the difference is unnoticeable, but migration risks are real
- ThreadLocal-heavy application without alternatives — million VT = million ThreadLocal copies. Scoped Values are still in preview
VT vs Platform Threads: When the Difference Is Noticeable
| Metric | Platform Threads | Virtual Threads | When the difference is visible |
|---|---|---|---|
| Throughput (I/O) | 100-500 RPS | 10,000-100,000 RPS | Under high I/O load |
| P99 latency | Grows with load | Stable | Under overload (queue for threads) |
| Memory for 10K tasks | ~10 GB | ~50-100 MB | When scaling |
| CPU-bound tasks | Faster | Slower (overhead) | Never use VT for computation |
Interview Cheat Sheet
Must know:
- VT context switch = ~10-50 CPU cycles (user-space, memcpy), Platform Thread = ~1000-10000 cycles (kernel mode, TLB flush)
- Little’s Law: L = λ × W — VT increase L (number of parallel tasks) without dropping throughput
- Memory: Platform Thread ~1MB native stack, VT ~several KB dynamically in heap (StackChunk)
- VT bring back the Thread-per-Request model — simple blocking code instead of Reactive/Callback Hell
- P99 latency: Platform Threads grow with load, VT stay stable
- Pinning — the main problem: synchronized in VT blocks the Carrier Thread
Frequent follow-up questions:
- Why is Platform Thread context switch so expensive? — Requires entering kernel mode (syscall), TLB flush, page table switching
- What is VT’s effect on P99? — Platform Threads under overload: P99 grows due to thread queue; VT: P99 stable, as there’s no queue for thread creation
- Why don’t VT provide gains at low contention? — At < 100 RPS the difference is unnoticeable, but migration risks (auditing synchronized, updating drivers) are real
- Can you use VT and Platform Threads together? — Yes: VT for I/O-bound, Platform Threads (FixedThreadPool) for CPU-bound
Red flags (DO NOT say):
- “VT use less memory because they are simpler” — VT are cheaper because the stack is in the heap (dynamic), not because they are “simpler”
- “VT replace ExecutorService” — VT are a different kind of executor; ExecutorService is still used (Executors.newVirtualThreadPerTaskExecutor())
- “Reactive programming is no longer needed” — VT replace reactivity for I/O-bound work, but the reactive approach is still relevant for streaming and backpressure
- “VT automatically solve all performance problems” — VT only help with I/O-bound; for CPU-bound or pinning they can make things worse
Related topics:
- [[23. What are Virtual Threads in Java 21]]
- [[25. When should you use Virtual Threads]]
- [[26. What is structured concurrency]]
- [[27. What is the difference between Thread and Runnable]]