Question 23 · Section 9

What are the advantages of Virtual Threads over regular threads?

This file explains why virtual threads are more efficient than regular ones, not just that they are. Key difference: Platform Threads are wrappers over OS threads (expensive, fixed stack size), while Virtual Threads are objects in the JVM heap (cheap, dynamically sized).


Junior Level

Basic Understanding

Virtual Threads (VT) outperform regular threads (Platform Threads) on several key metrics.

1. Scalability

| Characteristic | Platform Threads | Virtual Threads |
|---|---|---|
| Max threads | ~2,000-5,000 | ~1,000,000+ |
| Limitation | OS memory (~1 MB per thread) | JVM heap |
| Example | 1,000 threads ≈ ~1 GB | 1,000,000 threads ≈ ~256 MB |

2. Memory Savings

Platform Thread:  ~1MB stack (reserved immediately)
Virtual Thread:   ~several KB (dynamic, in heap)
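To make the savings concrete, here is a minimal sketch (assuming Java 21+; `ManyVirtualThreads` and `runBlockingTasks` are illustrative names, not a real API) that launches tens of thousands of blocking tasks, one virtual thread each — a workload that would need gigabytes of stack with platform threads:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyVirtualThreads {
    // Runs `count` blocking tasks, one virtual thread per task.
    // Each task "blocks" in sleep, which unmounts the VT from its carrier.
    static int runBlockingTasks(int count) throws InterruptedException {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                exec.submit(() -> {
                    Thread.sleep(10); // simulated blocking I/O
                    completed.incrementAndGet();
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Tens of thousands of threads in a few MB of heap.
        System.out.println(runBlockingTasks(50_000));
    }
}
```

The `try`-with-resources on the executor is what waits for completion: `close()` blocks until every submitted task finishes.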

3. Code Simplicity

Before VT (reactive style):

// Callback Hell — hard to read and debug
httpClient.get(url)
    .thenApply(this::parse)
    .thenCompose(data -> db.save(data))
    .thenAccept(result -> respond(result))
    .exceptionally(ex -> handleError(ex));

With VT (simple blocking style):

// Simple sequential code
String response = httpClient.get(url);  // "Blocks" (unmounts)
Data data = parse(response);
Result result = db.save(data);
respond(result);

Middle Level

Little’s Law

L = λ × W

L = number of active tasks (threads)
λ = throughput (requests per second)
W = processing time per request (latency)

Problem with Platform Threads:

  • L is limited to ~2000-5000 threads
  • If W (latency) grows → λ (throughput) inevitably drops

Solution with Virtual Threads:

  • L can be in the millions
  • Even with slow I/O responses, throughput stays high
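The bullets above can be turned into arithmetic. A tiny helper applying L = λ × W (assuming Java; `requiredConcurrency` is an illustrative name, and the numbers are examples, not benchmarks):

```java
public class LittlesLaw {
    // L = λ × W: concurrent in-flight tasks needed to sustain a given
    // throughput (requests/sec) at a given per-request latency (seconds).
    static long requiredConcurrency(double throughputRps, double latencySeconds) {
        return Math.round(throughputRps * latencySeconds);
    }

    public static void main(String[] args) {
        // 10,000 RPS at 200 ms latency needs L = 2,000 concurrent tasks —
        // already near the practical ceiling for platform threads.
        System.out.println(requiredConcurrency(10_000, 0.2));
        // If latency grows to 2 s, L = 20,000: out of reach for platform
        // threads, trivial for virtual threads.
        System.out.println(requiredConcurrency(10_000, 2.0));
    }
}
```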

CPU Efficiency

Platform Thread Context Switch

1. Transition to kernel mode — syscall
2. Save CPU registers (kernel-side task state)
3. Switch page tables if the next thread belongs to another process (TLB flush)
4. Load the new thread's state
5. Return to user mode

Time: ~1-10 microseconds (~1000-10000 CPU cycles)
Why expensive: each step requires OS and CPU involvement

Virtual Thread Context Switch

1. Save stack to StackChunk (heap) — regular memcpy
2. Switch pointer to another VT in JVM queue
3. Restore stack from heap — regular memcpy

Time: ~10-50 CPU cycles (nanoseconds)
Why cheap: everything in user-space, no system calls

Why VT switching is 100-1000x faster: a Platform Thread switch requires entering kernel mode (syscall) and saving/restoring register state; switching between processes additionally flushes the TLB. A VT switch is just copying data within the heap, which the CPU does at memory speed.
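A back-of-the-envelope sketch of what these per-switch costs mean for a busy core (the rates and durations below are illustrative assumptions, not measurements; `coreFraction` is a made-up name):

```java
public class SwitchCost {
    // Fraction of one CPU core consumed purely by context switching,
    // given a switch rate and an assumed cost per switch.
    static double coreFraction(double switchesPerSec, double secondsPerSwitch) {
        return switchesPerSec * secondsPerSwitch;
    }

    public static void main(String[] args) {
        // Platform threads: 100,000 switches/s at an assumed ~2 µs each
        // → 20% of a core gone to switching alone.
        System.out.println(coreFraction(100_000, 2e-6));
        // Virtual threads: same rate at an assumed ~20 ns each
        // → ~0.2% of a core.
        System.out.println(coreFraction(100_000, 20e-9));
    }
}
```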

Architecture Simplification

| Approach | Complexity | Debugging | VT needed? |
|---|---|---|---|
| Thread-per-request | Low | Easy | Yes — VT bring this model back |
| Thread pool | Medium | Medium | No — pools work without VT |
| Reactive (WebFlux) | High | Hard | No — VT replace reactivity |
| CompletableFuture | Medium | Medium | No — but VT simplify it |
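The thread-per-request row can be sketched as follows (assuming Java 21+; `handle` is a hypothetical placeholder for real request work such as parsing and DB calls):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadPerRequest {
    // Placeholder for real request handling (parse, call DB, respond).
    static String handle(String request) {
        return "handled:" + request;
    }

    public static void main(String[] args) throws Exception {
        // One cheap virtual thread per request: the old thread-per-request
        // model, without the pool sizing it used to require.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            Future<String> f = exec.submit(() -> handle("req-1"));
            System.out.println(f.get());
        }
    }
}
```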

When VT Do NOT Provide Benefits

| Scenario | Why | Solution |
|---|---|---|
| CPU-bound tasks | VT don't unmount on computation; scheduler overhead | FixedThreadPool |
| synchronized-heavy code | Pinning — blocks Carrier Threads | ReentrantLock |
| Old JDBC drivers | Use synchronized internally | Update the driver |
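For the CPU-bound row, the recommended shape looks like this (a sketch assuming Java 19+, where ExecutorService is AutoCloseable; `sumOfSquares` is an illustrative stand-in for real computation):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

public class CpuBound {
    // Pure computation: a virtual thread never unmounts here,
    // so VT add scheduler overhead with zero benefit.
    static long sumOfSquares(long n) {
        return LongStream.rangeClosed(1, n).map(x -> x * x).sum();
    }

    public static void main(String[] args) throws Exception {
        // A fixed pool sized to the core count is the right tool.
        int cores = Runtime.getRuntime().availableProcessors();
        try (ExecutorService pool = Executors.newFixedThreadPool(cores)) {
            Future<Long> f = pool.submit(() -> sumOfSquares(1_000));
            System.out.println(f.get());
        }
    }
}
```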

Senior Level

Under the Hood: Minimizing OS Context Switches

Platform Thread Switch

Thread A (User Mode)
    ↓ syscall
Kernel Mode:
    - Save A's registers to TSS
    - Update page tables
    - Load B's registers from TSS
    - Update GDT/LDT
    ↓ iret
Thread B (User Mode)

~1000+ CPU cycles

Virtual Thread Switch

VT A on Carrier Thread 1
    ↓ yield (I/O blocking)
    - Save stack to StackChunk (heap)
    - Free Carrier Thread 1
    ↓
Carrier Thread 1 takes VT B from queue
    - Restore VT B's stack from StackChunk
    ↓
VT B continues execution

~10-50 CPU cycles

Memory: Heap vs Native Stack

Platform Threads:
┌─────────────────────┐
│  JVM Heap           │
│  (objects, etc.)    │
├─────────────────────┤
│  Native Memory      │
│  Thread Stack: 1MB  │ ← Outside heap (RSS)
│  Thread Stack: 1MB  │
│  Thread Stack: 1MB  │
│  ...                │
└─────────────────────┘

Virtual Threads:
┌─────────────────────┐
│  JVM Heap           │
│  StackChunk #1: 4KB │
│  StackChunk #2: 8KB │
│  StackChunk #3: 2KB │
│  ... (dynamic)      │
├─────────────────────┤
│  Native Memory      │
│  8 Carrier Threads  │ ← Only 8 stacks of 1MB
└─────────────────────┘

Latency and P99

Platform Threads under load:
  P50: 10ms
  P99: 500ms  ← Queue for acquiring system threads

Virtual Threads:
  P50: 10ms
  P99: 15ms   ← Stable, no queue for threads

Pinning — the Main Problem

// Virtual thread "sticks" to Carrier Thread on:

// 1. synchronized block/method
synchronized(lock) {
    blockingIO(); // Carrier Thread blocked for the duration of I/O!
}

// 2. Native methods (JNI)
nativeMethod(); // Cannot unmount

Diagnosis:

java -Djdk.tracePinnedThreads=full MyApp

Solution:

// Replace synchronized with ReentrantLock
ReentrantLock lock = new ReentrantLock();
lock.lock();
try {
    blockingIO(); // Unmounts — Carrier Thread is free
} finally {
    lock.unlock();
}

ThreadLocal → Scoped Values

// ThreadLocal — problem with VT
ThreadLocal<User> user = new ThreadLocal<>();
// Million VT = million copies of User = Memory Problem!

// Scoped Values (preview in Java 21, JEP 446) — the solution
static final ScopedValue<User> CURRENT_USER = ScopedValue.newInstance();

ScopedValue.where(CURRENT_USER, user).run(() -> {
    process(); // CURRENT_USER.get() is available here
});
// Automatically cleared on scope exit — no leaks

Diagnostics

Memory Dumps

Virtual threads look like regular objects in the heap:

jmap -dump:file=heap.hprof <pid>
# VT = regular objects — MAT, YourKit work

Latency Monitoring

// After switching to VT — P99 stabilizes
long start = System.nanoTime();
handleRequest();
long elapsed = System.nanoTime() - start;

// VT: P99 ~ stable
// Platform: P99 ~ grows under load

-XX:+PrintAssembly

# Full invocation also needs -XX:+UnlockDiagnosticVMOptions and the hsdis plugin
# Yield points in VT:
call Continuation.yield
# JVM optimizes these transitions

Best Practices

  1. VT for I/O-bound — web servers, APIs, proxies
  2. ReentrantLock instead of synchronized — avoid pinning
  3. Scoped Values instead of ThreadLocal — for context
  4. Semaphore for resource limiting — instead of pool size limits
  5. Not for CPU-bound — FixedThreadPool is better
  6. -Djdk.tracePinnedThreads=full — during development
  7. Monitor P99 — should stabilize
  8. Update JDBC drivers — old ones use synchronized
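Practice #4 can be sketched like this (assuming Java 21+; the "DB call" is simulated with a sleep, and `LimitedConcurrency`, `queryWithLimit` are illustrative names). The idea: do not cap the number of threads — cap access to the scarce resource itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

public class LimitedConcurrency {
    // Cap concurrent "DB calls" at 10, even with thousands of virtual threads.
    static final Semaphore DB_PERMITS = new Semaphore(10);

    static int queryWithLimit(int id) throws InterruptedException {
        DB_PERMITS.acquire(); // a VT blocked here unmounts — carriers stay free
        try {
            Thread.sleep(5); // simulated DB call
            return id * 2;
        } finally {
            DB_PERMITS.release();
        }
    }

    public static void main(String[] args) throws Exception {
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                int id = i;
                futures.add(exec.submit(() -> queryWithLimit(id)));
            }
            int sum = 0;
            for (Future<Integer> f : futures) sum += f.get();
            System.out.println(sum);
        }
    }
}
```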

When NOT to Switch to Virtual Threads

  • Your application already works stably with a thread pool — don’t fix what isn’t broken. VT provide benefits for I/O-bound load, not CPU-bound
  • Team isn’t ready to audit dependencies for synchronized — old JDBC, HTTP clients, SimpleDateFormat inside synchronized. Without auditing, VT will “stick” and perform worse
  • Below Java 21 — VT not available. For Java 17 — use reactive approach
  • Low load (< 100 RPS) — under light traffic the difference is unnoticeable, but migration risks are real
  • ThreadLocal-heavy application without alternatives — million VT = million ThreadLocal copies. Scoped Values are still in preview

VT vs Platform Threads: When the Difference Is Noticeable

| Metric | Platform Threads | Virtual Threads | When the difference is visible |
|---|---|---|---|
| Throughput (I/O) | 100-500 RPS | 10,000-100,000 RPS | Under high I/O load |
| P99 latency | Grows with load | Stable | Under overload (queue for threads) |
| Memory for 10K tasks | ~10 GB | ~50-100 MB | When scaling |
| CPU-bound tasks | Faster | Slower (scheduler overhead) | Never use VT for pure computation |

Interview Cheat Sheet

Must know:

  • VT context switch = ~10-50 CPU cycles (user-space, memcpy), Platform Thread = ~1000-10000 cycles (kernel mode, TLB flush)
  • Little’s Law: L = λ × W — VT increase L (number of parallel tasks) without dropping throughput
  • Memory: Platform Thread ~1MB native stack, VT ~several KB dynamically in heap (StackChunk)
  • VT bring back the Thread-per-Request model — simple blocking code instead of Reactive/Callback Hell
  • P99 latency: Platform Threads grow with load, VT stay stable
  • Pinning — the main problem: synchronized in VT blocks the Carrier Thread

Frequent follow-up questions:

  • Why is Platform Thread context switch so expensive? — It requires entering kernel mode (syscall) and saving/restoring register state; switches between processes add page-table switching and TLB flushes
  • What is VT’s effect on P99? — Platform Threads under overload: P99 grows due to thread queue; VT: P99 stable, as there’s no queue for thread creation
  • Why don’t VT provide gains under low load? — At < 100 RPS the difference is unnoticeable, but migration risks (auditing synchronized, updating drivers) are real
  • Can you use VT and Platform Threads together? — Yes: VT for I/O-bound, Platform Threads (FixedThreadPool) for CPU-bound

Red flags (DO NOT say):

  • “VT use less memory because they are simpler” — VT are cheaper because the stack is in the heap (dynamic), not because they are “simpler”
  • “VT replace ExecutorService” — they don’t; ExecutorService is still used, with VT plugged in via Executors.newVirtualThreadPerTaskExecutor()
  • “Reactive programming is no longer needed” — VT replace reactivity for I/O-bound, but the reactive approach is still relevant for streaming, backpressure
  • “VT automatically solve all performance problems” — VT only help with I/O-bound; for CPU-bound or pinning they can make things worse

Related topics:

  • [[23. What are Virtual Threads in Java 21]]
  • [[25. When should you use Virtual Threads]]
  • [[26. What is structured concurrency]]
  • [[27. What is the difference between Thread and Runnable]]