Question 10 · Section 8

When to use parallel streams?

Junior Level

Use parallel streams when you need to process a lot of data and the task is CPU-intensive.

When YES:

  • Processing large collections (thousands of elements)
  • Complex computations (math, hashing)
  • Each element is processed independently

When NO:

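  • Small lists (fewer than ~100 elements): the overhead of creating and scheduling ForkJoinTasks (roughly 50 µs) exceeds the sequential processing time (roughly 1 µs). Treat both figures as order-of-magnitude estimates.
  • Simple operations (x + 1): too little work per element to amortize that overhead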
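  • Database queries or HTTP calls: threads block on I/O instead of computing

// GOOD: CPU-intensive task, each element processed independently
bigList.parallelStream()
    .map(this::heavyComputation)
    .collect(Collectors.toList());

// BAD: blocking I/O operation
bigList.parallelStream()
    .map(this::saveToDatabase)  // blocks the shared common-pool threads!
    .collect(Collectors.toList());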

Middle Level

N*Q Model

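A rule of thumb from Oracle experts:

  • N: number of elements
  • Q: amount of work per element (a rough operation count, not an exact cycle measurement)
  • If N * Q > 10,000, parallelism is likely to give a gain; below that, split/merge overhead usually dominates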

Examples:

  • Summing 100 numbers (Q is small) — parallel is slower
  • Hashing 100 large documents (Q is large) — parallel is faster
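
The arithmetic behind the two examples above can be sketched as a tiny helper. This is only an illustration of the rule of thumb: the class name, the method name, and the per-element cost figures passed in are assumptions, and Q is always an estimate rather than a measured value.

```java
final class NqEstimate {
    // Threshold from the N * Q rule of thumb; a rough guide, not a guarantee.
    static final long THRESHOLD = 10_000L;

    /**
     * @param n number of elements
     * @param q estimated cost per element (hypothetical unit: rough operation count)
     * @return true if the rule of thumb suggests parallelism may pay off
     */
    static boolean worthParallelizing(long n, long q) {
        return Math.multiplyExact(n, q) > THRESHOLD;
    }

    public static void main(String[] args) {
        // Summing 100 numbers: about one operation per element, N * Q = 100.
        System.out.println(worthParallelizing(100, 1));
        // Hashing 100 large documents: assume ~50,000 operations each, N * Q = 5,000,000.
        System.out.println(worthParallelizing(100, 50_000));
    }
}
```

A positive answer here is only a reason to benchmark the parallel version, not proof that it will be faster.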

Source characteristics

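Source              Splittability  Why
ArrayList, arrays   Excellent      Split by index in O(1), halves are even
IntStream.range     Excellent      Start and end are known, so halves are computed arithmetically
HashSet, TreeSet    Average        Hash buckets or tree nodes split less evenly
LinkedList          Poor           No random access: nodes must be walked to carve off a chunk
Stream.iterate      Worst          Element N depends on element N-1, so generation is sequential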
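
Splittability can be inspected directly through the Spliterator each source exposes. The sketch below (class and method names are illustrative) splits a source once and reports the estimated sizes of the two parts, and checks whether a stream's source knows its size up front. Exact chunk sizes for sources other than ArrayList are JDK implementation details, so the code prints them rather than assuming them.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class SplitDemo {
    /** Splits the collection's spliterator once; returns {prefix size, remaining size}. */
    static long[] splitOnce(Collection<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // may be null if the source cannot split
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    /** True if the stream's source reports an exact element count before traversal. */
    static boolean isSized(Stream<Integer> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 1_000).boxed().collect(Collectors.toList());
        long[] fromArrayList = splitOnce(new ArrayList<>(data));
        long[] fromLinkedList = splitOnce(new LinkedList<>(data));
        System.out.println("ArrayList:  " + fromArrayList[0] + " / " + fromArrayList[1]);
        System.out.println("LinkedList: " + fromLinkedList[0] + " / " + fromLinkedList[1]);
        System.out.println("Stream.iterate sized? " + isSized(Stream.iterate(0, x -> x + 1)));
    }
}
```

An even split with known sizes lets the framework balance work across threads; an unsized, sequentially generated source gives it very little to work with.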

When you SHOULD use

  1. CPU-intensive tasks: math, cryptography, image processing
  2. Independent operations: elements do not affect each other
  3. Simple reduction: sum, min, max — associative operations
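
Why associativity matters for point 3 can be shown with a short sketch (names are illustrative). Addition gives the same answer however the range is chunked, while subtraction does not, so the subtracting reduction is included only as a counter-example whose parallel result is unspecified.

```java
import java.util.stream.IntStream;

final class ReductionDemo {
    /** Associative reduction: safe to run in parallel. */
    static int parallelSum(int n) {
        return IntStream.rangeClosed(1, n).parallel().reduce(0, Integer::sum);
    }

    /** Same reduction run sequentially, used as the reference result. */
    static int sequentialSum(int n) {
        return IntStream.rangeClosed(1, n).reduce(0, Integer::sum);
    }

    /**
     * Counter-example: subtraction is not associative, so the result depends on
     * how the framework happens to split and combine the range.
     */
    static int parallelSubtract(int n) {
        return IntStream.rangeClosed(1, n).parallel().reduce(0, (a, b) -> a - b);
    }

    public static void main(String[] args) {
        System.out.println("sequential sum:    " + sequentialSum(1_000));
        System.out.println("parallel sum:      " + parallelSum(1_000));
        System.out.println("parallel subtract: " + parallelSubtract(1_000) + " (unspecified)");
    }
}
```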

When you should NOT

  1. I/O operations: DB queries and HTTP calls block the shared commonPool threads
  2. Stateful operations: limit(), sorted(), distinct() require coordination between threads
  3. Small data: split/merge overhead exceeds the computation itself
  4. Side effects: modifying external variables requires synchronization
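
The side-effect problem from point 4 is easiest to see next to its fix. In this sketch (class and method names are illustrative), the first method mutates a shared ArrayList from several threads and is shown only as an anti-pattern; the second lets collect() do the accumulation, which needs no external synchronization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class SideEffectDemo {
    /**
     * Anti-pattern, deliberately not called from main: ArrayList is not thread-safe,
     * so concurrent add() calls may lose elements or throw.
     */
    static List<Integer> squaresUnsafe(int n) {
        List<Integer> out = new ArrayList<>();
        IntStream.range(0, n).parallel().forEach(i -> out.add(i * i));
        return out;
    }

    /** Safe variant: each thread fills its own container and collect() merges them. */
    static List<Integer> squaresSafe(int n) {
        return IntStream.range(0, n)
                .parallel()
                .mapToObj(i -> i * i)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("safe result size: " + squaresSafe(10_000).size());
    }
}
```

The safe variant also preserves encounter order, which the forEach version would not do even if it were synchronized.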

ParallelStream vs alternatives

  • ExecutorService.invokeAll() — more control, but more boilerplate
  • CompletableFuture.allOf() — better for I/O-bound tasks with non-blocking wait
  • Parallel Arrays (libraries like fastutil) — optimized for primitives
  • parallelStream — best choice for CPU-bound operations on collections
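
For the I/O-bound case, a minimal sketch of the CompletableFuture alternative might look like the following. The fetch method is a hypothetical stand-in for a blocking DB or HTTP call, and the pool size cap of 16 is an arbitrary assumption. The calls still block threads here, but they are threads of a dedicated executor rather than the shared common pool; a truly non-blocking wait would additionally need an asynchronous client.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

final class IoBoundDemo {
    /** Hypothetical stand-in for a blocking call (DB query, HTTP request). */
    static String fetch(int id) {
        try {
            Thread.sleep(20); // simulated I/O latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "item-" + id;
    }

    /** Runs all fetches on a dedicated pool and returns results in input order. */
    static List<String> fetchAll(List<Integer> ids) {
        int poolSize = Math.max(1, Math.min(ids.size(), 16)); // assumed cap
        ExecutorService ioPool = Executors.newFixedThreadPool(poolSize);
        try {
            List<CompletableFuture<String>> futures = ids.stream()
                    .map(id -> CompletableFuture.supplyAsync(() -> fetch(id), ioPool))
                    .collect(Collectors.toList());
            // Wait for every future, then read the already-completed results.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
        } finally {
            ioPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(Arrays.asList(1, 2, 3)));
    }
}
```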

Senior Level

False Sharing

When processing arrays of primitives in parallel, threads can conflict over processor cache lines (L1/L2 cache) if they update data that is too close together.
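
The memory layout behind false sharing can be sketched as follows. Each thread increments only its own slot of a long[]; with a stride of 1 the slots are adjacent, and with a stride of 8 they are 64 bytes apart. The 64-byte cache line is an assumption about the hardware, and the class and method names are illustrative. The code only demonstrates the two layouts and that both are correct; any speed difference depends on the CPU and JIT and should be measured with a proper benchmark.

```java
final class PaddedCounters {
    // Assumption: 64-byte cache lines, i.e. 8 longs per line.
    static final int STRIDE = 8;

    /**
     * Starts `threads` workers; worker t increments slots[t * stride] `iterations` times.
     * stride = 1 packs the counters together, stride = STRIDE spreads them out.
     */
    static long run(int threads, int stride, long iterations) {
        long[] slots = new long[threads * stride];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int index = t * stride;
            workers[t] = new Thread(() -> {
                for (long i = 0; i < iterations; i++) {
                    slots[index]++; // each thread touches only its own slot
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // join() makes each worker's writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long total = 0;
        for (int t = 0; t < threads; t++) {
            total += slots[t * stride];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("adjacent slots: " + run(4, 1, 1_000_000));
        System.out.println("padded slots:   " + run(4, STRIDE, 1_000_000));
    }
}
```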

GC Pressure

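Parallel streams split work into many small, short-lived fork/join task objects (ForkJoinTask subclasses; the JDK's stream implementation uses CountedCompleter rather than RecursiveTask). In high-load systems this extra allocation can increase Minor GC frequency.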

Common Pool Poisoning

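One stream with blocking operations can occupy all threads of commonPool, and every other parallel stream in the application (plus anything else scheduled on the common pool, such as CompletableFuture async tasks started without an explicit executor) stalls behind it.

ForkJoinPool.commonPool() is also version-dependent: its implementation has changed across JDK releases, Java 21 included. Always test parallelStream on the JVM version you deploy.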
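
One commonly used way to keep a heavy stream off the common pool is to start it from inside a dedicated ForkJoinPool. The sketch below shows the pattern; the class name, method name, and parallelism value are illustrative. It relies on observed JDK behavior (a parallel stream started from a ForkJoinPool worker runs its tasks in that pool) that is not a documented guarantee of the Stream API, so treat it as an assumption to verify on your JVM.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;

final class IsolatedPoolDemo {
    /** Runs the parallel stream inside its own pool instead of the common pool. */
    static long sumOfSquares(List<Integer> data, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // The stream is started from a worker of `pool`, so its subtasks stay there.
            return pool.submit(() ->
                    data.parallelStream().mapToLong(x -> (long) x * x).sum()
            ).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // always release the pool's threads
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4), 2));
    }
}
```

For genuinely blocking work, an ExecutorService or CompletableFuture with its own executor is usually the clearer choice; this pattern mainly limits the blast radius of a long CPU-bound stream.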

Diagnostics

JMH (Java Microbenchmark Harness): Never introduce parallelStream without benchmarking via JMH. Intuition fails on multithreading questions.

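Pool configuration: -Djava.util.concurrent.ForkJoinPool.common.parallelism=N sets the common pool size at JVM startup and affects the ENTIRE application, not one stream. Without it, the default is the number of available processors minus one (minimum 1), because the thread that invokes the terminal operation also takes part in the work.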
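
To see which value is actually in effect, the pool can be queried at runtime. This is a small diagnostic sketch; the class and method names are illustrative, while the calls themselves are standard JDK API.

```java
import java.util.concurrent.ForkJoinPool;

final class CommonPoolInfo {
    /** Parallelism the common pool is actually running with. */
    static int commonParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + commonParallelism());
        // null unless the JVM was started with the -D flag described above
        System.out.println("Override property:       "
                + System.getProperty("java.util.concurrent.ForkJoinPool.common.parallelism"));
    }
}
```

Running it once with and once without the -D flag is a quick way to confirm that the setting reached the JVM.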

Verification: Always compare performance of stream() vs parallelStream() on real data.


Interview Cheat Sheet

Must know:

  • Rule N * Q > 10,000: N — number of elements, Q — cost of computation per element
  • YES: CPU-intensive tasks (math, hashing, image processing), independent elements, simple reductions
  • NO: I/O operations, small data (< ~100 elements), stateful operations (limit, sorted, distinct), side effects
  • Excellent splittability: ArrayList, arrays, IntStream.range. Poor: LinkedList, Stream.iterate
  • parallelStream vs alternatives: CompletableFuture for I/O-bound, ExecutorService for control
  • Always benchmark with JMH — intuition fails on multithreading

Common follow-up questions:

  • Why don't small collections benefit? ForkJoinTask overhead (~50 µs) exceeds sequential processing time (~1 µs)
  • What is False Sharing? — Threads conflict over CPU cache lines when data is located too close together
  • Common Pool Poisoning — what is it? — One stream with blocking I/O occupies all threads, other streams wait
  • Java 21 and parallelStream, what changed? ForkJoinPool.commonPool() behavior differs between JDK releases, so test on the JVM version you deploy

Red flags (DO NOT say):

  • “parallelStream will speed up DB queries” — no, I/O blocks threads and slows down the entire application
  • “No need to test — parallelism is always faster” — always JMH on real data
  • “-Djava.util.concurrent.ForkJoinPool.common.parallelism affects only my stream”: no, it affects the ENTIRE application
  • “parallelStream is good for everything” — only for CPU-bound operations on collections

Related topics:

  • [[9. What are parallel streams]]
  • [[1. What advantages does Stream API provide]]
  • [[5. What does collect() operation do]]
  • [[2. What is the difference between intermediate and terminal operations]]