Question 1 · Section 8

What advantages does Stream API provide?

Junior Level

Stream API is a way to describe a sequence of computations over data without actually storing it.

Unlike a collection (which stores elements in memory), a stream is a lazily computed pipeline of operations. You describe WHAT to do (filter, map, collect), and Stream decides HOW to execute it.

Key advantages:

  • Readability: Code becomes declarative — you describe “what” to do, not “how”
  • Fewer bugs: Reduces the likelihood of loop errors (off-by-one, wrong indices)
  • Operation chaining: Complex transformations can be expressed in a single chain

// Old approach
List<String> result = new ArrayList<>();
for (String s : list) {
    if (s.startsWith("A")) {
        result.add(s.toUpperCase());
    }
}

// Stream API
List<String> result = list.stream()
    .filter(s -> s.startsWith("A"))
    .map(String::toUpperCase)
    .collect(Collectors.toList());

When NOT to use Stream API

  1. Simple operations on small collections (< 100 elements) — regular for/for-each is simpler and faster
  2. Latency-critical code — pipeline overhead is ~microseconds, but noticeable in hot paths
  3. Working with primitives without boxing — use IntStream/LongStream instead of Stream&lt;Integer&gt;/Stream&lt;Long&gt;

Middle Level

How Stream API works internally

Stream does not store data — it describes computations over data. The key component is the Spliterator (splittable iterator):

  • Can split data into parts for parallel processing
  • Reports source characteristics: ORDERED, DISTINCT, SORTED, SIZED

Spliterator characteristics:

  • ORDERED — elements have a defined order
  • DISTINCT — all elements are unique
  • SORTED — elements are sorted
  • SIZED — exact size is known (ArrayList, arrays). Critical for parallelism!
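
The characteristics and the splitting behavior can be observed directly on an ArrayList spliterator. A minimal sketch (class and variable names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class SpliteratorDemo {
    public static void main(String[] args) {
        Spliterator<Integer> sp = new ArrayList<>(List.of(1, 2, 3, 4)).spliterator();

        // ArrayList knows its exact size, so SIZED is reported
        System.out.println(sp.hasCharacteristics(Spliterator.SIZED)); // true
        System.out.println(sp.estimateSize());                        // 4

        // trySplit() hands off a prefix of the data; this is the basis of parallel streams
        Spliterator<Integer> prefix = sp.trySplit();
        System.out.println(prefix != null);                           // true
    }
}
```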

Operation types

Intermediate — lazy:

  • filter, map, flatMap, peek — stateless
  • sorted, distinct, limit, skip — stateful (require buffering)

Terminal — trigger execution:

  • collect, reduce, forEach, count, findFirst, anyMatch

Lazy Evaluation

Intermediate operations are not executed until a terminal operation is called. This allows optimization and working with infinite streams.
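
A small runnable sketch makes the laziness visible: the peek() callback does not fire until count() triggers execution (class name is illustrative; .toList() requires Java 16+):

```java
import java.util.stream.Stream;

public class LazyDemo {
    public static void main(String[] args) {
        Stream<Integer> pipeline = Stream.of(1, 2, 3)
            .peek(x -> System.out.println("peek " + x)); // intermediate: not executed yet

        System.out.println("nothing printed yet");

        long odd = pipeline.filter(x -> x % 2 == 1).count(); // terminal: triggers execution
        System.out.println("count = " + odd);

        // Laziness is also what makes infinite streams usable:
        System.out.println(Stream.iterate(1, i -> i + 1).limit(3).toList()); // [1, 2, 3]
    }
}
```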

Senior Level

Internal vs External Iteration

External Iteration (for-each, Iterator):

  • You control “how” to iterate, one element at a time
  • The library cannot reorder, batch, or parallelize the traversal for you

Internal Iteration (Stream API):

  • The library decides the traversal strategy itself, including splitting the source for parallel execution
  • Uniform pipeline shapes give the JIT opportunities for Loop Unrolling and, mainly for primitive streams, Vectorization (SIMD)

Performance and Highload

Primitive Streams:

// BAD — creates millions of Integer objects
list.stream().map(String::length).reduce(0, Integer::sum)

// GOOD — works with primitives
list.stream().mapToInt(String::length).sum()
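
Put together as a runnable comparison (the word list is made up for illustration):

```java
import java.util.List;

public class PrimitiveSumDemo {
    public static void main(String[] args) {
        List<String> words = List.of("stream", "api", "java");

        // Boxed path: every length is wrapped in an Integer object
        int boxed = words.stream().map(String::length).reduce(0, Integer::sum);

        // Primitive path: IntStream keeps ints unboxed
        int unboxed = words.stream().mapToInt(String::length).sum();

        System.out.println(boxed + " " + unboxed); // 13 13 (same result, less garbage)
    }
}
```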

Fusion: Modern implementations combine several intermediate operations into a single pass.
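
Fusion can be observed with peek(): the stages interleave per element instead of each stage making its own full pass over the data (a sketch; the stage labels are mine):

```java
import java.util.stream.Stream;

public class FusionDemo {
    public static void main(String[] args) {
        Stream.of(1, 2, 3)
            .peek(x -> System.out.println("map: " + x))      // before the map stage
            .map(x -> x * 10)
            .peek(x -> System.out.println("filter: " + x))   // before the filter stage
            .filter(x -> x > 10)
            .forEach(x -> System.out.println("sink: " + x)); // only 20 and 30 survive
    }
}
```

Note the per-element interleaving in the output: each element flows through the whole fused pipeline before the next one starts.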

When NOT to use:

  • Simple loops on small collections — for-i is faster since there is no overhead of creating a pipeline and a Spliterator; the break-even point depends on the workload (commonly quoted anywhere from ~100 to ~1000 elements), and on large datasets the difference evens out.
  • Complex side effects (code becomes unreadable)
  • I/O-bound work in parallel streams (blocks ForkJoinPool threads)

Parallelism — pitfalls

.parallel() uses the shared ForkJoinPool.commonPool() — uncontrolled use can slow down the entire application. Empirical rule: parallelism pays off when N * Q > 10,000, where N is the number of elements and Q is the cost of processing one element.
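
One common mitigation is to submit the parallel stream from a dedicated ForkJoinPool, so that blocking or heavy work cannot starve commonPool(). This relies on a well-known but undocumented behavior (a parallel stream runs in the pool of the submitting thread); a sketch:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class DedicatedPoolDemo {
    public static void main(String[] args) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(4); // isolated from commonPool()
        try {
            long sum = pool.submit(
                () -> LongStream.rangeClosed(1, 1_000_000).parallel().sum()
            ).get();
            System.out.println(sum); // 500000500000
        } finally {
            pool.shutdown();
        }
    }
}
```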

Diagnostics

  • Use peek() only for debugging
  • IntelliJ Stream Debugger visualizes data flow
  • JMH for parallel stream benchmarks

Interview Cheat Sheet

Must know:

  • Stream API — a declarative way to describe computations over data, does not store data in memory
  • Key advantages: readability, fewer loop errors, convenient operation chaining
  • Intermediate operations are lazy, terminal — trigger execution
  • Spliterator — key component for data splitting and parallelism
  • Primitive Streams (IntStream, LongStream) are more efficient than boxed versions
  • .parallel() uses the shared ForkJoinPool.commonPool()
  • Do not use for trivial operations on small collections or for blocking I/O inside parallel streams

Common follow-up questions:

  • When is Stream API worse than for-each? — On collections < 100 elements, in hot paths with critical latency, for simple operations
  • What is Fusion? — The stream implementation combines several intermediate operations into a single pass over the data
  • Why is parallelStream dangerous? — Shared ForkJoinPool, blocking threads kills the performance of the entire application
  • How to diagnose problems? — JMH for benchmarks, IntelliJ Stream Debugger, peek() for debugging

Red flags (DO NOT say):

  • “Stream API is always faster than for-each” — no, pipeline overhead is noticeable on small data
  • “ParallelStream will speed up any task” — only CPU-bound with N*Q > 10,000
  • “Stream stores data” — no, it is a computation pipeline, data is in the source
  • “Stream can be reused” — no, IllegalStateException on reuse
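
The last red flag is easy to demonstrate (class name is illustrative):

```java
import java.util.stream.Stream;

public class ReuseDemo {
    public static void main(String[] args) {
        Stream<String> s = Stream.of("a", "b");
        System.out.println(s.count()); // 2

        try {
            s.count(); // second terminal operation on the same stream
        } catch (IllegalStateException e) {
            System.out.println("IllegalStateException");
        }
    }
}
```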

Related topics:

  • [[2. What is the difference between intermediate and terminal operations]]
  • [[5. What does collect() operation do]]
  • [[9. What are parallel streams]]
  • [[10. When to use parallel streams]]
  • [[4. What does map() operation do]]