What is stop-the-world?

Junior Level

Stop-The-World (STW): a pause in which the JVM stops all application threads so that it can run GC or another VM operation safely.

Simple analogy: Imagine you’re working in an office, and suddenly an evacuation is announced. Everyone stops and waits until the check is complete.

Why STW is needed:

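GC: needs a consistent memory state (references must not change and objects must not move while the collector traverses them)
Non-GC operations: thread dumps, biased lock revocation (biased locking is disabled by default since JDK 15), class redefinition (JVM TI)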

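How long (rough orders of magnitude; real values depend on heap size, live data and tuning):

ZGC: < 1 ms (JDK 16+; earlier releases targeted < 10 ms)
G1: 20-200 ms (the default pause target is -XX:MaxGCPauseMillis=200)
Full GC: seconds on large heaps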
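
To see what the running JVM itself reports, the standard management beans can be queried. The sketch below is only an approximation of pause time: getCollectionTime() is accumulated collection time, and for concurrent collectors it is not guaranteed to equal pure STW time. The class name GcTimeReport is made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Prints cumulative collection counts and times reported by the running JVM. */
class GcTimeReport {

    /** Sums getCollectionTime() over all collectors, skipping the -1 "undefined" value. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.gc(); // only a hint; the JVM may ignore it
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total: " + totalCollectionMillis() + " ms");
    }
}
```

The bean names differ per collector, so the output is best read as a per-JVM summary rather than compared across collectors.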

Middle Level

Safepoints

Threads cannot stop instantly!
→ Must reach a Safepoint (stop point)

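Where polls are placed (HotSpot JIT-compiled code):
  - At method return
  - At loop back-edges (the end of each iteration)
  - Call sites also count as Safepoints: the callee polls, and the caller's frame is describable at the call

Mechanism: Safepoint Poll
  → JVM arms the poll (classic scheme: marks the polling page as inaccessible)
  → Thread reads the page → SIGSEGV → signal handler parks the thread until the operation finishes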

TTSP (Time To Safepoint)

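Time from the "stop" request until the last thread has stopped

Problems:
  - Counted Loops (int index, known bounds) → JIT may leave no Safepoint poll inside the loop
  - Page faults in Java code (e.g. memory-mapped file access) → the OS stalls the thread before it reaches a poll; a thread blocked in an ordinary native I/O call is already safepoint-safe and does not add to TTSP
  - Swapping → a thread (or the Safepoint page itself) waits on swap before it can reach the next poll

Solution for Counted Loops: -XX:+UseCountedLoopSafepoints (on JDK 10+, C2 loop strip mining keeps a poll in an outer loop)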
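
The effect of poll spacing on TTSP can be modelled in plain Java. This is a user-level analogy, not HotSpot's mechanism: a volatile-style flag stands in for the poll, and the hypothetical parameter workPerPoll stands in for the amount of code between two polls. The class and method names are invented for the sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Toy model: workers poll a flag between chunks of work; the coordinator measures time-to-stop. */
class SafepointModel {

    /** Returns the nanoseconds between the stop request and the arrival of the last worker. */
    static long measureTtspNanos(int workers, int workPerPoll) {
        AtomicBoolean stopRequested = new AtomicBoolean(false);
        CountDownLatch started = new CountDownLatch(workers);
        CountDownLatch arrived = new CountDownLatch(workers);
        CountDownLatch resume = new CountDownLatch(1);

        Runnable worker = () -> {
            started.countDown();
            long sink = 0;
            while (!stopRequested.get()) {               // the "poll"
                for (int k = 0; k < workPerPoll; k++) {  // work done between two polls
                    sink += k;
                }
            }
            arrived.countDown();                         // "reached the safepoint"
            try {
                resume.await();                          // parked until the operation ends
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            if (sink == -1) {
                System.out.println(sink);                // never true; keeps sink in use
            }
        };

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(worker, "worker-" + i);
            threads[i].start();
        }
        try {
            started.await();
            long t0 = System.nanoTime();
            stopRequested.set(true);                     // the "stop" request
            arrived.await();                             // wait for the slowest worker
            long ttsp = System.nanoTime() - t0;
            resume.countDown();
            for (Thread t : threads) {
                t.join();
            }
            return ttsp;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while coordinating", e);
        } finally {
            stopRequested.set(true);                     // make sure workers can always exit
            resume.countDown();
        }
    }

    public static void main(String[] args) {
        System.out.println("short gaps: " + measureTtspNanos(4, 1_000) + " ns");
        System.out.println("long gaps:  " + measureTtspNanos(4, 50_000_000) + " ns");
    }
}
```

The measured numbers are noisy and depend on the JIT, but the model shows why the slowest thread, not the average one, determines TTSP.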

Thread-Local Handshakes (JDK 10+, JEP 312; first LTS release with it: Java 11)

Before: stop ALL threads to run an operation on one of them
Now: Handshake with the specific thread
  → Others keep working!
  → Examples: biased lock revocation, sampling the stack of a single thread

Diagnostics

# Safepoint logs
-Xlog:safepoint
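-Xlog:safepoint+stats=debug   # per-operation statistics (emitted at debug level)

# Look for:
# "Reaching safepoint" → time threads needed to stop (TTSP); older JDKs print "Stopping threads took"
# If > 100 ms → problem!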
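
Scanning a log for slow safepoints is easy to automate. The sketch below assumes a line shape like the one printed by recent JDKs with -Xlog:safepoint, containing "Reaching safepoint: <n> ns"; the exact format varies between JDK versions, so the sample line and the regular expression are assumptions to adjust against a real log. SafepointLogScan is a hypothetical helper name.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts the "Reaching safepoint" time from a log line and flags slow ones. */
class SafepointLogScan {

    // Assumed field format: "Reaching safepoint: 12345 ns"
    private static final Pattern REACHING =
            Pattern.compile("Reaching safepoint:\\s*(\\d+)\\s*ns");

    /** 100 ms, the rule-of-thumb threshold from the notes above. */
    static final long THRESHOLD_NANOS = 100_000_000L;

    static OptionalLong reachingNanos(String line) {
        Matcher m = REACHING.matcher(line);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    static boolean isSlow(String line) {
        OptionalLong nanos = reachingNanos(line);
        return nanos.isPresent() && nanos.getAsLong() > THRESHOLD_NANOS;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from a real run.
        String sample = "[2.345s][info][safepoint] Safepoint \"G1CollectForAllocation\", "
                + "Time since last: 1234567 ns, Reaching safepoint: 150000000 ns, "
                + "At safepoint: 2000000 ns, Total: 152000000 ns";
        System.out.println("reaching = " + reachingNanos(sample) + ", slow = " + isSlow(sample));
    }
}
```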

Senior Level

Safepoint Polling Mechanics

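Safepoint Page:
  Normal: page readable → the poll is a plain load
  STW: Guard Page → load faults (SIGSEGV) → signal handler blocks the thread

JIT inserts a poll at every Safepoint, roughly (x86):
  test eax, [polling_page]  // Fast Path: one load, result ignored
  → If page inaccessible → trap

Classic HotSpot polls a memory page instead of checking a flag because the fast path is then a single load with no compare-and-branch. The page is almost always in L1, so the “fast path” has near-zero overhead, and the rare slow path is paid for in the signal handler. Since Thread-Local Handshakes (JDK 10), each thread has its own poll address so that it can be stopped individually; the exact instruction sequence varies by JDK version and CPU.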

Counted Loops Problem

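// ❌ C2 traditionally omits the Safepoint poll in counted loops
for (int i = 0; i < 1_000_000_000; i++) {
    // Up to a billion iterations without a poll!
    // A pending GC has to wait → TTSP can reach seconds
}

// Solution (JVM flag, not Java code):
// -XX:+UseCountedLoopSafepoints
// → JIT keeps a poll in the loop (with loop strip mining: once every N iterations)
// → overhead ~1-2% (workload dependent)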
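
A commonly cited source-level workaround on older HotSpot versions is to change the loop shape so that C2 does not classify it as counted, for example by using a long induction variable. The sketch below only shows the two shapes side by side and checks that they compute the same result; whether a poll is actually emitted depends on the JDK version (newer releases can also treat long loops as counted), so this is an illustration under that assumption, not a guarantee.

```java
/** Two loop shapes with identical results but historically different Safepoint behaviour. */
class LoopShapes {

    /** int induction variable: a counted loop, which C2 may compile without a poll per iteration. */
    static long sumCounted(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    /** long induction variable: on older HotSpot versions not counted, so the back-edge poll stays. */
    static long sumWithLongIndex(int n) {
        long sum = 0;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumCounted(1000));        // prints 499500
        System.out.println(sumWithLongIndex(1000));  // prints 499500
    }
}
```

Preferring the JVM flag over rewriting loops keeps the source readable; the loop-shape trick mainly matters when the flag cannot be changed.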

JNI and STW

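JNI threads: while a thread runs native code, it does not touch the Java heap directly, so the JVM treats it as already safepoint-safe.
The JVM does NOT wait for such a thread: the STW operation starts while the native code keeps running.
The thread is blocked only when it tries to return to Java code or calls back into the JVM through a JNI function while the operation is still in progress.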

JFR Safepoint Events

Java Flight Recorder:
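  → Event 'Safepoint Begin' (jdk.SafepointBegin), followed by the state-synchronization, cleanup and end events
  → Time: entering + waiting + cleanup
  → Visualization in JMC

Finding problems:
  → If entering (time for threads to reach the Safepoint) > 50 ms → TTSP problem
  → If waiting > entering → a single straggler thread is holding up the rest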
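
The same events can be captured programmatically with the jdk.jfr API (JDK 11+). The sketch below enables only jdk.SafepointBegin, requests a GC as a likely Safepoint trigger, and reads the events back; System.gc() is just a hint, so the list may be empty, and the class name SafepointJfrDemo is invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

/** Records jdk.SafepointBegin events around a System.gc() call and returns them. */
class SafepointJfrDemo {

    static List<RecordedEvent> recordSafepoints() {
        try {
            Path dir = Files.createTempDirectory("jfr-demo");
            Path file = dir.resolve("safepoints.jfr");
            try (Recording recording = new Recording()) {
                // No threshold, so short Safepoints are recorded as well.
                recording.enable("jdk.SafepointBegin").withoutThreshold();
                recording.start();
                System.gc();                 // usually triggers at least one Safepoint
                recording.stop();
                recording.dump(file);
                return RecordingFile.readAllEvents(file);
            } finally {
                Files.deleteIfExists(file);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (RecordedEvent event : recordSafepoints()) {
            System.out.println(event.getEventType().getName()
                    + " at " + event.getStartTime()
                    + " took " + event.getDuration());
        }
    }
}
```

For long-running services, a continuous recording opened in JMC gives the same data without code changes.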

Best Practices

Monitor TTSP via safepoint logs
Counted Loops → UseCountedLoopSafepoints
Handshakes (JDK 10+) → fewer global STW
JFR for visualization
Avoid long loops without Safepoints
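JNI → time in native code does not delay Safepoints, but keep JNI critical sections short and watch how long threads block when returning to Java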

Senior Summary

STW = not only GC, but other JVM operations too
Safepoints = thread coordination mechanism
TTSP = time to enter STW (can be longer than GC itself!)
Counted Loops = common cause of long TTSP
Handshakes (JDK 10+) = reduced global pauses
JFR = best analysis tool

Interview Cheat Sheet

Must know:

STW: the JVM stops ALL application threads; not only for GC, but also for thread dumps, biased lock revocation, class redefinition
Safepoints: points in code (method return, loop back-edge) where a thread can stop; mechanism: Safepoint Poll (Guard Page → SIGSEGV)
TTSP (Time To Safepoint): time from the “stop” request to the last stopped thread; can be longer than the GC work itself!
Counted Loops: C2 may omit Safepoint polls in for (int i...) loops → TTSP can reach seconds; solution: -XX:+UseCountedLoopSafepoints
Thread-Local Handshakes (JDK 10+, JEP 312): run an operation on a specific thread without stopping all the others
JNI threads: a thread in native code already counts as safepoint-safe, so the JVM does not wait for it; the native code keeps running during STW, and the thread blocks only when it tries to return to Java or call into the JVM
Polling page: the fast path is a single load with no compare-and-branch, and the page is almost always in L1; near-zero overhead

Common follow-up questions:

Why can TTSP be longer than GC itself? Counted Loops without Safepoint polls, page faults in Java code (e.g. memory-mapped files), swapping
How to diagnose long TTSP? -Xlog:safepoint (look for “Reaching safepoint” > 100 ms); JFR: jdk.SafepointBegin and related events → entering/waiting/cleanup
What are Thread-Local Handshakes? JDK 10+ (JEP 312): handshake with a specific thread instead of stopping all; examples: biased lock revocation, sampling one thread's stack
Why is Safepoint Polling cheaper than a flag check? One load from a page that is almost always in L1, with no compare-and-branch; fast path = nearly 0 overhead

Red flags (DO NOT say):

“STW is only for GC”: STW is also used for thread dumps, biased lock revocation, class redefinition
“The JVM waits for JNI threads to reach a Safepoint”: a thread in native code is already safepoint-safe and keeps running; it blocks only when it returns to Java or calls into the JVM
“Counted Loops don’t affect performance”: without Safepoint polls, TTSP can be seconds

Related topics:

[[4. What is Garbage Collection]]
[[17. Which GCs minimize stop-the-world pauses]]
[[14. What is ZGC]]
[[13. What is G1 GC]]
[[12. What GC algorithms exist]]