What advantages does Stream API provide?
Junior Level
Stream API is a way to describe a sequence of computations over data without actually storing it.
Unlike a collection (which stores elements in memory), a stream is a lazily computed pipeline of operations. You describe WHAT to do (filter, map, collect), and Stream decides HOW to execute it.
Key advantages:
- Readability: Code becomes declarative — you describe “what” to do, not “how”
- Fewer bugs: Reduces the likelihood of loop errors (off-by-one, wrong indices)
- Operation chaining: Complex transformations can be expressed in a single chain
// Old approach: explicit loop
List<String> result = new ArrayList<>();
for (String s : list) {
    if (s.startsWith("A")) {
        result.add(s.toUpperCase());
    }
}
// Stream API: the same result, written declaratively
List<String> result = list.stream()
    .filter(s -> s.startsWith("A"))
    .map(String::toUpperCase)
    .collect(Collectors.toList());
When NOT to use Stream API
- Simple operations on small collections (< 100 elements) — regular for/for-each is simpler and faster
- Latency-critical code — pipeline overhead is ~microseconds, but noticeable in hot paths
- Boxed streams over primitives: for numeric data use IntStream/LongStream/DoubleStream rather than Stream<Integer>, which boxes every element
Middle Level
How Stream API works internally
Stream does not store data — it describes computations over data. The key component is the Spliterator (Splitable Iterator):
- Can split data into parts for parallel processing
- Reports source characteristics: ORDERED, DISTINCT, SORTED, SIZED
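Spliterator characteristics:
- ORDERED: elements have a defined encounter order
- DISTINCT: all elements are unique
- SORTED: elements follow a defined sort order
- SIZED: exact size is known before traversal (ArrayList, arrays). Important for parallelism, because a known size lets the source be split into balanced chunks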
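The characteristics above can be inspected directly on a collection's Spliterator. The following is a minimal sketch; the class name `SpliteratorCharacteristicsDemo` and the helper `describe` are illustrative names, not part of the JDK. It only checks the four flags discussed here and ignores the others (such as SUBSIZED or NONNULL).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;

class SpliteratorCharacteristicsDemo {

    // Hypothetical helper: lists which of the four flags discussed above a spliterator reports.
    static List<String> describe(Spliterator<?> spliterator) {
        List<String> names = new ArrayList<>();
        if (spliterator.hasCharacteristics(Spliterator.ORDERED)) names.add("ORDERED");
        if (spliterator.hasCharacteristics(Spliterator.DISTINCT)) names.add("DISTINCT");
        if (spliterator.hasCharacteristics(Spliterator.SORTED)) names.add("SORTED");
        if (spliterator.hasCharacteristics(Spliterator.SIZED)) names.add("SIZED");
        return names;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(3, 1, 2);

        // ArrayList: ordered and sized, but neither distinct nor sorted
        System.out.println(describe(new ArrayList<>(data).spliterator())); // [ORDERED, SIZED]

        // HashSet: unique elements and known size, no encounter order
        System.out.println(describe(new HashSet<>(data).spliterator()));   // [DISTINCT, SIZED]

        // TreeSet: reports all four flags
        System.out.println(describe(new TreeSet<>(data).spliterator()));   // [ORDERED, DISTINCT, SORTED, SIZED]
    }
}
```

The stream implementation can use these flags to skip work, for example when distinct() is applied to a source that already reports DISTINCT.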
Operation types
Intermediate — lazy:
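- filter, map, flatMap, peek: stateless
- sorted, distinct, limit, skip: stateful (sorted must buffer all elements, distinct remembers the elements already seen, limit and skip keep a counter)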
Terminal — trigger execution:
- collect, reduce, forEach, count, findFirst, anyMatch
Lazy Evaluation
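Intermediate operations are not executed until a terminal operation is called. This lets the pipeline skip unnecessary work (for example, stop at the first match) and makes infinite streams usable, as long as a short-circuiting operation such as limit or findFirst bounds them.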
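Laziness is easy to observe by counting how often a lambda runs. The sketch below is illustrative only: the class `LazyEvaluationDemo` and its method names are made up for this example, and the counters exist purely to make the behavior visible.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyEvaluationDemo {

    // Builds a pipeline but never calls a terminal operation; returns how often the filter ran.
    static int filterCallsWithoutTerminalOp() {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> pipeline = Stream.of(1, 2, 3)
                .filter(n -> { calls.incrementAndGet(); return n % 2 == 0; });
        return calls.get(); // the pipeline is only a description so far
    }

    // Takes the first `count` even squares from an infinite stream 1, 2, 3, ...
    static List<Integer> firstEvenSquares(int count, AtomicInteger inspected) {
        return Stream.iterate(1, n -> n + 1)
                .filter(n -> { inspected.incrementAndGet(); return n % 2 == 0; })
                .map(n -> n * n)
                .limit(count) // short-circuits, so the infinite source is never exhausted
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(filterCallsWithoutTerminalOp()); // 0

        AtomicInteger inspected = new AtomicInteger();
        System.out.println(firstEvenSquares(3, inspected)); // [4, 16, 36]
        System.out.println(inspected.get());                // 6
    }
}
```

Only the six elements needed to produce three results are pulled from the source; without limit the same pipeline would never terminate.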
Senior Level
Internal vs External Iteration
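External Iteration (for-each, Iterator):
- You control “how” to iterate: order, exit conditions, mutable state
- The traversal is fixed in your code, so a library cannot reorder, fuse, or parallelize it for you
Internal Iteration (Stream API):
- The library controls the traversal, which enables laziness, short-circuiting, fusion, and parallel splitting
- JIT optimizations such as loop unrolling and vectorization (SIMD) are possible once the pipeline is inlined, but they are not guaranteed; a plain loop over an array is often optimized at least as well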
Performance and Highload
Primitive Streams:
// BAD: boxes every length into an Integer
list.stream().map(String::length).reduce(0, Integer::sum);
// GOOD: works with primitive int values
list.stream().mapToInt(String::length).sum();
Fusion: the stream pipeline chains stateless intermediate operations so that each element flows through all of them in a single pass over the source, without intermediate collections.
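The single-pass behavior can be made visible by logging each step. This is a small sketch with an illustrative class name (`FusionDemo`); the lambdas write to a log only for demonstration, which is not something to do in production pipelines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FusionDemo {

    // Records the order in which filter, map and forEach see each element of a sequential stream.
    static List<String> trace(List<String> words) {
        List<String> log = new ArrayList<>();
        words.stream()
                .filter(w -> { log.add("filter " + w); return w.startsWith("a"); })
                .map(w -> { log.add("map " + w); return w.toUpperCase(); })
                .forEach(w -> log.add("forEach " + w));
        return log;
    }

    public static void main(String[] args) {
        trace(Arrays.asList("apple", "bob", "avocado")).forEach(System.out::println);
        // filter apple
        // map apple
        // forEach APPLE
        // filter bob
        // filter avocado
        // map avocado
        // forEach AVOCADO
    }
}
```

Each element passes through the whole chain before the next one is touched, so filter does not first produce a complete intermediate list that map then walks again.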
When NOT to use:
- Simple loops on small collections (up to roughly 100 to 1,000 elements, depending on the work per element): an indexed for loop is faster because there is no cost of building a pipeline and a Spliterator. On large datasets the difference evens out.
- Complex side effects (code becomes unreadable)
- Blocking I/O inside parallel streams (it ties up the shared ForkJoinPool worker threads)
Parallelism — pitfalls
.parallel() uses the shared ForkJoinPool.commonPool(), so uncontrolled use can slow down the entire application. Empirical rule: parallelism pays off when N * Q > 10,000, where N is the number of elements and Q is the approximate cost of processing one element.
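One way to apply the N * Q rule in code is to decide between stream() and parallelStream() at the call site. The sketch below is only an illustration: `worthParallelizing` and `streamFor` are hypothetical helpers, Q is a rough estimate supplied by the caller, and the 10,000 threshold is the rule of thumb quoted above rather than a measured value. Real decisions should be confirmed with a benchmark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

class ParallelHeuristicDemo {

    // Rule-of-thumb threshold from the text; an assumption, not a tuned constant.
    static final long THRESHOLD = 10_000;

    // n: number of elements, q: estimated cost of processing one element (caller's guess).
    static boolean worthParallelizing(long n, long q) {
        return n * q > THRESHOLD;
    }

    // Hypothetical helper: picks a sequential or parallel stream based on the heuristic.
    static <T> Stream<T> streamFor(List<T> source, long costPerElement) {
        return worthParallelizing(source.size(), costPerElement)
                ? source.parallelStream()
                : source.stream();
    }

    public static void main(String[] args) {
        List<Integer> small = new ArrayList<>(Collections.nCopies(100, 1));
        List<Integer> large = new ArrayList<>(Collections.nCopies(20_000, 1));

        System.out.println(streamFor(small, 10).isParallel()); // false: 100 * 10 = 1,000
        System.out.println(streamFor(large, 1).isParallel());  // true: 20,000 * 1 = 20,000
    }
}
```

Keeping the decision in one place also makes it easy to switch everything back to sequential streams if the common pool turns out to be contended.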
Diagnostics
- Use peek() only for debugging
- IntelliJ Stream Debugger visualizes data flow
- JMH for parallel stream benchmarks
Interview Cheat Sheet
Must know:
- Stream API — a declarative way to describe computations over data, does not store data in memory
- Key advantages: readability, fewer loop errors, convenient operation chaining
- Intermediate operations are lazy; terminal operations trigger execution
- Spliterator — key component for data splitting and parallelism
- Primitive Streams (IntStream, LongStream) are more efficient than boxed versions
- .parallel() uses the shared ForkJoinPool.commonPool()
- Do not use on small collections and in I/O operations
Common follow-up questions:
- When is Stream API worse than for-each? — On collections < 100 elements, in hot paths with critical latency, for simple operations
- What is Fusion? The stream pipeline (not the JVM itself) combines several intermediate operations into a single pass over the data
- Why is parallelStream dangerous? — Shared ForkJoinPool, blocking threads kills the performance of the entire application
- How to diagnose problems? — JMH for benchmarks, IntelliJ Stream Debugger, peek() for debugging
Red flags (DO NOT say):
- “Stream API is always faster than for-each” — no, pipeline overhead is noticeable on small data
- “ParallelStream will speed up any task” — only CPU-bound with N*Q > 10,000
- “Stream stores data” — no, it is a computation pipeline, data is in the source
- “Stream can be reused” — no, IllegalStateException on reuse
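The last red flag is simple to reproduce. The following sketch uses illustrative names (`StreamReuseDemo`, `secondUseFails`, `countTwice`) and shows both the failure and a common workaround: create a fresh stream from the source each time, here through a Supplier.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

class StreamReuseDemo {

    // Returns true if a second terminal operation on the same Stream object is rejected.
    static boolean secondUseFails() {
        Stream<String> stream = Stream.of("a", "b", "c");
        stream.count();          // first terminal operation consumes the stream
        try {
            stream.count();      // second use of the same object
            return false;
        } catch (IllegalStateException e) {
            return true;         // the standard implementation rejects the reuse
        }
    }

    // Workaround: ask the source for a new stream for every terminal operation.
    static long countTwice(List<String> source) {
        Supplier<Stream<String>> streams = source::stream;
        long all = streams.get().count();
        long nonEmpty = streams.get().filter(s -> !s.isEmpty()).count();
        return all + nonEmpty;
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails());                      // true
        System.out.println(countTwice(Arrays.asList("a", "", "c"))); // 5
    }
}
```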
Related topics:
- [[2. What is the difference between intermediate and terminal operations]]
- [[5. What does collect() operation do]]
- [[9. What are parallel streams]]
- [[10. When to use parallel streams]]
- [[4. What does map() operation do]]