What are parallel streams?
Junior Level
Parallel streams are a way to process data on multiple threads simultaneously, using ForkJoinPool.commonPool(), whose default parallelism is cores - 1 (the thread that submits the work also participates, so in total roughly one thread runs per core).
Created in two ways:
// From a collection
list.parallelStream().forEach(System.out::println);
// From a regular stream
list.stream().parallel().forEach(System.out::println);
Processing uses all available CPU cores: the commonPool workers plus the calling thread. For a list of 1000 elements, the speedup could in theory approach the number of workers, but in practice fork/join and merge overhead usually limits it to 2-4x.
Important: Element order is not guaranteed in forEach.
When NOT to use parallel streams
- I/O operations — block ForkJoinPool workers, other tasks wait
- A few thousand elements — overhead > benefit
- Stateful operations with ThreadLocal — workers are reused, data “leaks”
- When order matters — parallelStream does not guarantee order (except for ordered sources)
Middle Level
Mechanism: ForkJoin and Spliterator
Parallel streams use the ForkJoin framework:
- Data is split into parts via Spliterator.trySplit()
- Each part is processed by a separate worker thread
- Partial results are combined (the combiner)
Efficiency depends on the source:
- ArrayList, arrays — split ideally by index
- HashSet, TreeSet — split decently
- LinkedList — terrible (half the list must be traversed to find the split point)
- Stream.iterate — impossible to parallelize
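The index-based splitting of ArrayList can be observed directly. A minimal sketch (class name is illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<>(List.of(1, 2, 3, 4, 5, 6, 7, 8));

        Spliterator<Integer> right = list.spliterator();
        // trySplit() hands off the first half [0, mid) to a new Spliterator
        // in O(1) — ArrayList splits by index, no traversal needed
        Spliterator<Integer> left = right.trySplit();

        System.out.println(left.estimateSize());  // 4
        System.out.println(right.estimateSize()); // 4
    }
}
```

A LinkedList spliterator has no such index, which is exactly why it splits so poorly.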
ForkJoinPool.commonPool()
By default all parallel streams use one shared pool:
- Size = number_of_cores - 1
- Risk: if one stream performs blocking I/O, it occupies threads of the shared pool — all other parallel streams in the JVM wait
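A quick way to see the shared pool in action (a minimal sketch; the exact thread names and parallelism value vary by machine):

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;

public class CommonPoolDemo {
    public static void main(String[] args) {
        // Default parallelism is availableProcessors() - 1: the thread that
        // submits the work participates too, so ~one active thread per core
        System.out.println("parallelism: " + ForkJoinPool.commonPool().getParallelism());

        List.of("a", "b", "c", "d").parallelStream()
            .forEach(s -> System.out.println(
                s + " on " + Thread.currentThread().getName()));
        // Typically a mix of "main" and "ForkJoinPool.commonPool-worker-N"
    }
}
```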
Execution order
// Random order
parallelStream().forEach(System.out::println);
// Guaranteed order (slower)
parallelStream().forEachOrdered(System.out::println);
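The guarantee is strong: forEachOrdered invokes the action for one element at a time, in encounter order, so even a non-thread-safe StringBuilder is safe in this sketch:

```java
import java.util.List;

public class OrderDemo {
    public static void main(String[] args) {
        List<Integer> nums = List.of(1, 2, 3, 4, 5, 6, 7, 8);

        StringBuilder sb = new StringBuilder();
        // forEachOrdered processes elements one at a time in encounter
        // order — no data race on sb, and the order is deterministic
        nums.parallelStream().forEachOrdered(sb::append);

        System.out.println(sb); // always "12345678"
    }
}
```

With plain forEach the same code would be a data race and the order would vary from run to run.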
Senior Level
N*Q Model
Empirical rule: parallelism pays off when N * Q > 10,000:
- N — number of elements
- Q — cost of computation per element
- 10,000 — approximate threshold where the fork/join overhead pays off
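The heuristic is trivial to encode. A hypothetical helper (the name and the unit of Q — a rough relative per-element cost — are illustrative choices, not part of any API):

```java
public class NqRule {
    // N * Q heuristic: parallelize only when total work exceeds the
    // approximate fork/join overhead threshold of 10,000 cost units
    static boolean worthParallelizing(long n, long q) {
        return n * q > 10_000;
    }

    public static void main(String[] args) {
        System.out.println(worthParallelizing(1_000_000, 1)); // true: many cheap elements
        System.out.println(worthParallelizing(100, 1));       // false: overhead dominates
        System.out.println(worthParallelizing(100, 500));     // true: few but expensive elements
    }
}
```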
When parallelism HURTS:
- Small collections (overhead on split/merge)
- Cheap operations (faster than context switching)
- Locks (synchronization kills parallelism)
- I/O operations (block commonPool)
Stateful Operations in parallelism
sorted(), distinct(), limit() in a parallel stream require full synchronization (“barrier”), which often makes them slower than sequential mode.
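Correctness is preserved; it is the barrier that costs. A small sketch (uses Stream.toList(), Java 16+):

```java
import java.util.List;

public class StatefulDemo {
    public static void main(String[] args) {
        List<Integer> result = List.of(5, 3, 5, 1, 3).parallelStream()
            .distinct() // workers must coordinate on a shared "seen" set
            .sorted()   // full barrier: every element must arrive before sorting completes
            .toList();

        System.out.println(result); // [1, 3, 5] — correct, but often slower than sequential
    }
}
```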
ThreadLocal Danger
In the general case, do not rely on ThreadLocal — ForkJoinPool workers are reused between tasks. If you control a custom ForkJoinPool and clean ThreadLocal in finally — acceptable.
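A sketch of the cleanup pattern (the scratch-buffer use case and names are illustrative):

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

public class ThreadLocalCleanup {
    // Per-thread scratch buffer; workers are reused between tasks,
    // so it MUST be removed after each task
    static final ThreadLocal<StringBuilder> BUF =
        ThreadLocal.withInitial(StringBuilder::new);

    public static void main(String[] args) {
        LongAdder processed = new LongAdder();

        List.of("a", "b", "c", "d").parallelStream().forEach(item -> {
            try {
                BUF.get().append(item); // per-task scratch work
                processed.increment();
            } finally {
                BUF.remove(); // otherwise state leaks into the worker's next task
            }
        });

        System.out.println(processed.sum()); // 4
    }
}
```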
Custom ForkJoinPool
For load isolation, use a custom pool:
ForkJoinPool customPool = new ForkJoinPool(4);
try {
    long result = customPool.submit(() ->
        list.parallelStream().mapToInt(this::doWork).sum()
    ).get(); // get() can throw InterruptedException / ExecutionException
} finally {
    customPool.shutdown();
}
The stream runs in the pool it was submitted from, not commonPool — a long-standing but undocumented ForkJoinTask behavior.
Diagnostics
- Thread names: inside a lambda, print Thread.currentThread().getName() — you will see ForkJoinPool.commonPool-worker-N
- JFR (Java Flight Recorder): shows thread activity in the ForkJoinPool
- -Djava.util.concurrent.ForkJoinPool.common.parallelism=N: JVM system property to adjust the common pool size
Interview Cheat Sheet
Must know:
- Parallel streams use ForkJoinPool.commonPool() (size = cores - 1)
- Two creation methods: collection.parallelStream() and stream().parallel()
- Mechanism: Spliterator.trySplit() splits the data, each worker processes its part
- Splitting efficiency: ArrayList/arrays > HashSet/TreeSet > LinkedList > Stream.iterate
- Empirical rule: N * Q > 10,000 — parallelism pays off
- Order is not guaranteed in forEach, but guaranteed in forEachOrdered
Common follow-up questions:
- Why are I/O operations dangerous in parallelStream? — They block commonPool workers, all streams in the application stall
- When does parallelism HURT? — Small collections, cheap operations, locks, I/O, stateful operations
- How to isolate load? — Use a custom ForkJoinPool: customPool.submit(() -> list.parallelStream()...)
- Why is LinkedList terrible for parallelism? — Half the list must be traversed to find the split point
Red flags (DO NOT say):
- “parallelStream is always faster” — no, fork/join/merge overhead can slow it down
- “parallelStream creates new threads” — no, it uses the shared ForkJoinPool.commonPool()
- “forEach in parallelStream guarantees order” — no, only forEachOrdered
- “ThreadLocal is safe in ForkJoinPool” — no, workers are reused between tasks
Related topics:
- [[10. When to use parallel streams]]
- [[1. What advantages does Stream API provide]]
- [[2. What is the difference between intermediate and terminal operations]]
- [[6. What is Collector and what built-in Collectors exist]]