What is ForkJoinPool?
ForkJoinPool is a specialized thread pool optimized for executing tasks based on the "Divide and Conquer" principle.
Junior Level
Basic Understanding
Why: a regular pool with a single queue is inefficient for recursive tasks — threads sit idle while some tasks are still running. ForkJoinPool solves this with the work-stealing algorithm: a free thread “steals” tasks from the tail of a busy thread’s queue.
Analogy
Imagine you need to count 1,000,000 coins:
- Regular pool: One person takes a task, counts, passes to the next
- ForkJoinPool: You split coins into piles, hand them out. If someone finishes — they help others
Two Key Methods
| Method | Description |
|---|---|
| fork() | Submit a subtask for asynchronous execution (pushes it onto the current worker's queue) |
| join() | Wait for the subtask's result |
Simple Example: Array Sum
```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Task with a result
class SumTask extends RecursiveTask<Long> {
    private final long[] numbers;
    private final int start, end;

    public SumTask(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        // If the task is small enough — compute directly
        if (end - start <= 1000) {
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += numbers[i];
            }
            return sum;
        }
        // Otherwise — split in half
        int mid = (start + end) / 2;
        SumTask left = new SumTask(numbers, start, mid);
        SumTask right = new SumTask(numbers, mid, end);
        left.fork();                        // Send the left part to the queue
        long rightResult = right.compute(); // Compute the right part ourselves
        long leftResult = left.join();      // Wait for the left result
        return leftResult + rightResult;
    }
}
```

```java
// Usage
long[] numbers = ...;
ForkJoinPool pool = ForkJoinPool.commonPool();
long total = pool.invoke(new SumTask(numbers, 0, numbers.length));
```
Where ForkJoinPool Is Used
| Place | Description |
|---|---|
| Parallel Streams | list.parallelStream().map(...) |
| CompletableFuture | Uses commonPool by default |
| Recursive tasks | Tree processing, graph traversal |
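The parallel-stream row is easy to verify: a parallel stream splits the work across commonPool workers but must produce exactly the same result as the sequential version. A minimal sketch (class name `ParallelStreamDemo` is just for illustration):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class ParallelStreamDemo {
    public static void main(String[] args) {
        List<Long> numbers = LongStream.rangeClosed(1, 10_000)
                                       .boxed()
                                       .collect(Collectors.toList());

        // Same computation, sequential vs parallel; the parallel version
        // is decomposed into subtasks executed on commonPool workers
        long sequential = numbers.stream().mapToLong(Long::longValue).sum();
        long parallel = numbers.parallelStream().mapToLong(Long::longValue).sum();

        System.out.println(sequential == parallel); // true
        System.out.println(parallel);               // 50005000
    }
}
```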
Middle Level
Work-Stealing Algorithm
Unlike regular pools (one shared queue), in ForkJoinPool each worker has its own deque:

```
Regular pool:
[Shared queue: T1, T2, T3, T4, T5]
Thread 1 → T1
Thread 2 → T2
Thread 3 → (waiting)

ForkJoinPool:
Thread 1: [T1a, T1b, T1c] ← own deque
Thread 2: [T2a, T2b]      ← own deque
Thread 3: []              ← empty! Steals from the tail of Thread 1's deque
```
Rules
| Rule | Description |
|---|---|
| Own Deque | Each worker has its own double-ended queue |
| LIFO (own) | Worker takes tasks from the HEAD of its own queue (Last-In-First-Out) |
| FIFO (others’) | When stealing, takes from the TAIL of another worker’s queue |
| Locality | LIFO preserves locality of reference — the freshest subtask works with data that is still in the CPU cache. Cache access ≈ 1 ns, RAM ≈ 100 ns (about 100x slower). |
| Minimum contention | Owner takes from the head, thief steals from the tail — they don’t interfere with each other |
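These rules can be observed from the outside through the pool's steal counter. A minimal sketch (hypothetical `CountTask` class, a dedicated 4-thread pool): each `fork()` pushes a subtask onto the current worker's deque, and idle workers pull from the other end:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class StealDemo {
    // Tiny recursive task: splits in two until depth 0, counting leaves.
    // Each fork() puts a subtask on the current worker's deque.
    static class CountTask extends RecursiveTask<Integer> {
        final int depth;
        CountTask(int depth) { this.depth = depth; }

        @Override
        protected Integer compute() {
            if (depth == 0) return 1;
            CountTask left = new CountTask(depth - 1);
            left.fork();                                   // onto own deque (head)
            int right = new CountTask(depth - 1).compute(); // run in current thread
            return left.join() + right;
        }
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(4);
        int leaves = pool.invoke(new CountTask(10)); // 2^10 leaf tasks
        System.out.println(leaves);                  // 1024
        // A positive steal count means idle workers took tasks
        // from the tail of busy workers' deques
        System.out.println(pool.getStealCount() >= 0);
        pool.shutdown();
    }
}
```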
RecursiveTask vs RecursiveAction
| Class | Returns Result? | Example |
|---|---|---|
| RecursiveTask<V> | Yes | Sum, finding maximum |
| RecursiveAction | No | Sorting, in-place processing |
```java
// RecursiveTask — with result
class MaxTask extends RecursiveTask<Integer> {
    @Override
    protected Integer compute() {
        // ...
        return maxValue;
    }
}

// RecursiveAction — no result
class SortTask extends RecursiveAction {
    @Override
    protected void compute() {
        // Sort in place, no result returned
    }
}
```
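The RecursiveAction skeleton above can be made runnable. A minimal sketch (hypothetical `FillAction` class) that fills an array in place, using `invokeAll` as a convenient shorthand for the fork/compute/join sequence when no result is returned:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class FillAction extends RecursiveAction {
    private static final int THRESHOLD = 1_000;
    private final long[] data;
    private final int start, end;

    FillAction(long[] data, int start, int end) {
        this.data = data;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end - start <= THRESHOLD) {
            // Small enough: mutate the slice directly, return nothing
            for (int i = start; i < end; i++) data[i] = i * 2L;
            return;
        }
        int mid = (start + end) >>> 1;
        // invokeAll forks one half and computes the other, then joins both
        invokeAll(new FillAction(data, start, mid), new FillAction(data, mid, end));
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        ForkJoinPool.commonPool().invoke(new FillAction(data, 0, data.length));
        System.out.println(data[0]);     // 0
        System.out.println(data[9_999]); // 19998
    }
}
```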
Common Pool
```java
// Static instance — always available
ForkJoinPool commonPool = ForkJoinPool.commonPool();

// Parallelism = number of CPU cores - 1
int parallelism = commonPool.getParallelism();
```
Why -1? One logical core is left for other work (the main thread, GC, background operations). If that default does not suit your deployment, you can override the size with the system property -Djava.util.concurrent.ForkJoinPool.common.parallelism=N.
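A quick way to check this on your own machine, as a sketch (class name `CommonPoolInfo` is illustrative):

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolInfo {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int parallelism = ForkJoinPool.getCommonPoolParallelism();

        System.out.println("cores = " + cores);
        System.out.println("common pool parallelism = " + parallelism);
        // Reported parallelism is always at least 1, even on a single-core
        // machine, and is cores - 1 by default unless overridden
        System.out.println(parallelism >= 1); // true
    }
}
```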
ForkJoinPool vs ThreadPoolExecutor
| Characteristic | ThreadPoolExecutor | ForkJoinPool |
|---|---|---|
| Queue | One shared (FIFO) | One per thread (LIFO/FIFO) |
| Task type | Independent, blocking | Recursive, computational |
| Efficiency | Average with varied task times | High (work-stealing) |
| Context Switching | Frequent | Minimal |
| Best for | I/O-bound tasks | CPU-bound tasks |
Senior Level
When NOT to Use ForkJoinPool
- Simple independent tasks — regular ThreadPoolExecutor is simpler and more efficient
- I/O-bound tasks — blocking a thread blocks work-stealing, use virtual threads (Java 21+)
- Small data volume — fork/join overhead exceeds the parallelism benefit
- Do NOT use commonPool for I/O — you’ll block all tasks that use it
Under the Hood: Work-Queue Structure
```java
// Simplified structure
class ForkJoinPool {
    // Array of queues — one per worker thread
    WorkQueue[] queues;

    static final class WorkQueue {
        ForkJoinTask<?>[] array;    // Circular array of tasks
        ForkJoinWorkerThread owner; // Worker thread that owns this queue
        int base;                   // Tail (thieves steal here — FIFO)
        int top;                    // Head (owner pushes/pops here — LIFO)
    }
}
```
Optimized Fork/Join Pattern
```java
@Override
protected Long compute() {
    if (size < THRESHOLD) {
        return computeSequentially();
    }
    MyTask left = new MyTask(subList1);
    MyTask right = new MyTask(subList2);

    // OPTIMIZATION: fork() one half, compute() the other.
    // right.compute() runs in the CURRENT thread and consumes no extra
    // pool thread; left.fork() goes onto this worker's deque.
    // fork() + fork() + join() + join() would leave the current thread
    // waiting in join() instead of doing useful work.
    left.fork();                        // Asynchronously — to the queue
    long rightResult = right.compute(); // In the current thread — saves a thread!
    long leftResult = left.join();      // Wait for the left result
    return leftResult + rightResult;
}
```
Always do fork() for one part and compute() for the other. If you fork() both and then join(), you waste an extra thread.
ManagedBlocker — For Blocking Operations
```java
// ForkJoinPool can compensate with an additional worker if a task blocks
ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker() {
    private volatile Object result;
    private volatile boolean done = false;

    @Override
    public boolean block() throws InterruptedException {
        result = blockingCall(); // the blocking operation
        done = true;
        return true;             // true = no further blocking needed
    }

    @Override
    public boolean isReleasable() {
        return done;             // true once block() is no longer necessary
    }
});
```
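A complete, runnable version of this pattern, as a sketch (hypothetical `QueueBlocker` class wrapping a BlockingQueue, modeled on the QueueTaker example in the JDK docs):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ForkJoinPool;

public class BlockerDemo {
    // Wraps a potentially blocking queue.take() so the pool can compensate
    static class QueueBlocker implements ForkJoinPool.ManagedBlocker {
        final BlockingQueue<String> queue;
        volatile String item;

        QueueBlocker(BlockingQueue<String> queue) { this.queue = queue; }

        @Override
        public boolean block() throws InterruptedException {
            if (item == null) item = queue.take(); // the actual blocking call
            return true;
        }

        @Override
        public boolean isReleasable() {
            // Try a non-blocking poll first; if it succeeds, no blocking needed
            return item != null || (item = queue.poll()) != null;
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.put("hello");

        String result = ForkJoinPool.commonPool().submit(() -> {
            QueueBlocker blocker = new QueueBlocker(queue);
            ForkJoinPool.managedBlock(blocker); // tell the pool we may block
            return blocker.item;
        }).get();

        System.out.println(result); // hello
    }
}
```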
Performance: THRESHOLD
```java
// Too small — many tasks, GC pressure
// Too large — little parallelism
// Heuristic: pick THRESHOLD so that one leaf's
// sequential processing takes ~100-500 microseconds
private static final int THRESHOLD = 1000;
```
Problem: Common Pool Starvation
```java
// BAD: blocking operation in commonPool
CompletableFuture.supplyAsync(() -> {
    return httpClient.get(url); // Long blocking I/O!
}).thenApply(...);
// Blocks one of the commonPool threads → all parallel streams slow down!

// GOOD: separate pool for I/O
ExecutorService ioPool = Executors.newFixedThreadPool(20);
CompletableFuture.supplyAsync(() -> httpClient.get(url), ioPool);
```
Diagnostics
ForkJoinPool Metrics
```java
ForkJoinPool pool = ForkJoinPool.commonPool();
pool.getStealCount();        // Tasks "stolen" — high = uneven task sizes
pool.getQueuedTaskCount();   // Tasks waiting in worker queues
pool.getActiveThreadCount(); // Threads currently active
pool.getParallelism();       // Target parallelism
```
jstack
```
"ForkJoinPool.commonPool-worker-1" #10 daemon prio=5
"ForkJoinPool.commonPool-worker-2" #11 daemon prio=5
```
ForkJoinPool worker threads are easy to spot in a thread dump by these names.
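The naming convention can also be checked from code. A minimal sketch (class name `WorkerNameDemo` is illustrative); a dedicated pool names its workers `ForkJoinPool-N-worker-M`, the common pool uses `ForkJoinPool.commonPool-worker-M`:

```java
import java.util.concurrent.ForkJoinPool;

public class WorkerNameDemo {
    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(2);
        // Ask a worker thread for its own name
        String name = pool.submit(() -> Thread.currentThread().getName()).join();
        System.out.println(name.startsWith("ForkJoinPool")); // true
        pool.shutdown();
    }
}
```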
Best Practices
- Use ForkJoinPool for CPU-bound tasks (computations)
- Do not use commonPool for I/O — blocking paralyzes all parallel streams
- Optimal THRESHOLD — sequential processing should take ~100-500 microseconds
- fork() + compute() — always optimize, don’t do fork()+fork()+join()+join()
- Monitor stealCount — high = tasks are distributed unevenly
- ManagedBlocker — for blocking operations inside ForkJoinPool
- Separate pool for I/O — don’t use commonPool
Interview Cheat Sheet
Must know:
- ForkJoinPool — pool for recursive tasks using “divide and conquer”, uses work-stealing algorithm
- Regular pool = one shared queue (FIFO), ForkJoinPool = each thread has its own deque (owner: LIFO from head, thief: FIFO from tail)
- LIFO for the owner preserves locality of reference — the freshest subtask works with data from the CPU cache line
- RecursiveTask<V> returns a result, RecursiveAction does not
- commonPool size = number of CPU cores - 1 (leaves 1 logical core for background tasks)
- Optimization: fork() for one part, compute() for the other — saves a pool thread
- parallelStream() and CompletableFuture use commonPool by default
- NEVER use commonPool for I/O — you’ll block all parallel streams in the application
Frequent follow-up questions:
- Why does the thief steal from the tail (FIFO) and the owner takes from the head (LIFO)? — To minimize contention: owner and thief work from opposite ends of the deque
- Why is fork()+compute() better than fork()+fork()+join()+join()? — fork()+compute() executes one part in the current thread, saving a pool thread
- Why is commonPool bad for I/O? — Blocking one work-stealing thread paralyzes all tasks using commonPool (including parallel streams)
- What is ManagedBlocker? — A mechanism that allows ForkJoinPool to create an additional thread if a task blocks
Red flags (DO NOT say):
- “ForkJoinPool is good for I/O-bound tasks” — no, blocking paralyzes work-stealing
- “I use commonPool for HTTP requests” — you’ll block all parallel streams in the JVM
- “ForkJoinPool = regular pool with several queues” — no, the key difference is the work-stealing algorithm
- “I fork() both subtasks” — anti-pattern, fork()+compute() saves a thread
Related topics:
- [[12. What is a Thread Pool]]
- [[13. What types of Thread Pool exist in Java]]
- [[15. What does ExecutorService do]]
- [[16. What is the difference between Executors.newFixedThreadPool() and newCachedThreadPool()]]