What is the difference between reduce() and collect()?
Both operations collapse a stream into a result, but they do it differently:
🟢 Junior Level
reduce() — creates a new object at each step:
// Summation — each step creates a new number
int sum = numbers.stream().reduce(0, (a, b) -> a + b);
collect() — mutates the same object:
// Adding to the same list
List<String> list = strings.stream().collect(Collectors.toList());
Simple rule:
- Numbers and other immutable values → `reduce()`
- Collections (List, Set, Map) and joined strings → `collect()`
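The two Junior-level snippets can be put into a small runnable sketch. The class name `JuniorExamples`, the method names, and the sample data are assumptions made for illustration only.

```java
import java.util.List;
import java.util.stream.Collectors;

class JuniorExamples {

    // reduce(): combines immutable values; each step yields a new Integer
    static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, (a, b) -> a + b);
    }

    // collect(): fills one mutable container (a List) element by element
    static List<String> copy(List<String> strings) {
        return strings.stream().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 3, 4)));      // prints 10
        System.out.println(copy(List.of("a", "b", "c")));  // prints [a, b, c]
    }
}
```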
🟡 Middle Level
Type of reduction
reduce() — Immutable Reduction:
- A new object is created at each step
- Example: `(String a, String b) -> a + b` creates a new string, copying the contents of both
collect() — Mutable Reduction:
- An existing container is modified at each step
- Example: `StringBuilder.append()` appends to an already allocated buffer
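To make the immutable/mutable distinction concrete, here is a minimal sketch that builds the same string both ways. The class and method names are hypothetical; the three-argument `collect` with `StringBuilder` follows the form shown in the `Stream` Javadoc.

```java
import java.util.List;

class StringReductionDemo {

    // Immutable reduction: every a + b allocates a brand-new String
    static String joinWithReduce(List<String> parts) {
        return parts.stream().reduce("", (a, b) -> a + b);
    }

    // Mutable reduction: elements are appended to a StringBuilder;
    // the third argument merges two builders when the stream runs in parallel
    static String joinWithCollect(List<String> parts) {
        return parts.stream()
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c");
        System.out.println(joinWithReduce(parts));   // prints abc
        System.out.println(joinWithCollect(parts));  // prints abc
    }
}
```

Both methods return the same result; they differ only in how many intermediate objects are allocated along the way.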
Algorithmic complexity
The difference shows up when the result grows with the input (a string or a collection):
- `reduce`: O(n²) for string concatenation. Each concatenation `a + b` copies all characters from both strings. For n strings of length 1, step 1 copies 1 character, step 2 copies 2, and so on, for a total of 1+2+3+…+n = n(n+1)/2 = O(n²).
- `reduce` can technically collect a List: `reduce(new ArrayList<>(), (list, el) -> { list.add(el); return list; }, ...)`. But this breaks parallel mode, because the same mutable identity object is used by all branches simultaneously. For collections, use only `collect`, which creates a separate container for each branch.
- `collect`: O(n). Each step simply adds a reference to the ArrayList.
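The O(n²) count can be checked with a simple cost model. This is not a benchmark: it only counts how many characters a left-to-right `a + b` chain would copy for n one-character strings, and the class and method names are assumptions for illustration.

```java
class ConcatCostDemo {

    // Characters copied when n one-char strings are concatenated left to right:
    // step k produces a string of length k, so it copies k characters.
    static long charsCopiedByReduce(int n) {
        long copied = 0;
        int currentLength = 0;
        for (int step = 1; step <= n; step++) {
            currentLength += 1;       // the result grows by one character
            copied += currentLength;  // the new String copies all of them
        }
        return copied;
    }

    // Closed form from the text: n(n+1)/2
    static long closedForm(int n) {
        return (long) n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        // prints 500500 500500
        System.out.println(charsCopiedByReduce(1000) + " " + closedForm(1000));
    }
}
```

Doubling n roughly quadruples the copied characters in this model, which is the quadratic growth the text describes.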
Decision Matrix
| Task | Operation |
|---|---|
| Numbers, primitives | reduce() or sum(), count() |
| String concatenation | collect(joining()) |
| List, Set, Map | Only collect() |
| Complex DTOs | collect() |
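A short sketch of three rows of the matrix, using only standard `java.util.stream` calls. The class name, method names, and sample words are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DecisionMatrixDemo {

    // Numbers, primitives: a primitive stream with sum()
    static int totalLength(List<String> words) {
        return words.stream().mapToInt(String::length).sum();
    }

    // String concatenation: collect(joining())
    static String csv(List<String> words) {
        return words.stream().collect(Collectors.joining(","));
    }

    // Map: only collect(), here with groupingBy
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "bb", "ccc");
        System.out.println(totalLength(words));  // prints 6
        System.out.println(csv(words));          // prints a,bb,ccc
        System.out.println(byLength(words).get(2)); // prints [bb]
    }
}
```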
🔴 Senior Level
Concurrency and Combiner
| Characteristic | reduce() | collect() |
|---|---|---|
| Memory | High consumption (intermediate objects) | Low (containers are reused) |
| Combiner | Combines two values | Combines two containers |
| Optimization | Difficult due to allocations | Characteristics.CONCURRENT allows writing to one container |
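The Combiner row of the table can be illustrated with the three-argument forms of both operations. This is a sketch under the assumption of a simple word list; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class CombinerDemo {

    // reduce: the combiner (Integer::sum) merges two partial VALUES
    static int totalLength(List<String> words) {
        return words.parallelStream()
                .reduce(0, (sum, w) -> sum + w.length(), Integer::sum);
    }

    // collect: the combiner (ArrayList::addAll) merges two partial CONTAINERS
    static List<Integer> lengths(List<String> words) {
        return words.parallelStream()
                .map(String::length)
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "bb", "ccc");
        System.out.println(totalLength(words));  // prints 6
        System.out.println(lengths(words));      // prints [1, 2, 3]
    }
}
```

In both cases the combiner only runs when the stream is actually split across branches; on a sequential stream it is never called.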
Identity Value difference
- In `reduce`: the initial value is reused across parallel branches
- In `collect`: each branch gets its own container via `supplier()`
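One way to observe the supplier behaviour is to count how many containers it hands out. The sketch below is an illustration only: the class, the counter, and the helper `sample` are hypothetical, and the parallel count depends on how the runtime splits the work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class SupplierPerBranchDemo {

    // Hypothetical helper: the list [0, 1, ..., n-1]
    static List<Integer> sample(int n) {
        return IntStream.range(0, n).boxed().collect(Collectors.toList());
    }

    // Counts how many containers the supplier creates during one collect()
    static int containersCreated(List<Integer> source, boolean parallel) {
        AtomicInteger created = new AtomicInteger();
        Supplier<ArrayList<Integer>> supplier = () -> {
            created.incrementAndGet();
            return new ArrayList<>();
        };
        Stream<Integer> stream = parallel ? source.parallelStream() : source.stream();
        List<Integer> result = stream.collect(supplier, ArrayList::add, ArrayList::addAll);
        if (!result.equals(source)) {
            throw new IllegalStateException("elements lost or reordered");
        }
        return created.get();
    }

    public static void main(String[] args) {
        List<Integer> data = sample(10_000);
        System.out.println(containersCreated(data, false)); // prints 1
        System.out.println(containersCreated(data, true));  // one per branch; varies by machine
    }
}
```

Because no container is shared between branches, the plain `ArrayList` needs no synchronization here, and the merged result still matches the source order.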
Side Effects
- In `collect()`, container mutation is the basis of operation
- In `reduce()`, mutating accumulator input parameters is a serious error that breaks parallel logic
Diagnostics
Profiling: if you see huge `char[]` allocations (`byte[]` on Java 9+, where strings are stored compactly) when working with strings, you are probably using `reduce` where `collect(joining())` is needed.
Unit Testing: Always run custom collectors on parallelStream() with a small batch to verify the combiner.
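The unit-testing advice could look like the following sketch: a hypothetical custom collector (`averaging`, built with `Collector.of`) is run on both a sequential and a parallel stream, and the results are compared. A combiner that forgets to merge one of the fields would typically make the parallel result differ whenever the stream is actually split.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CustomCollectorCheck {

    // Hypothetical custom collector: average of integers, state is [count, sum]
    static Collector<Integer, long[], Double> averaging() {
        return Collector.of(
                () -> new long[2],
                (acc, x) -> { acc[0]++; acc[1] += x; },
                (left, right) -> {           // combiner: merge BOTH fields
                    left[0] += right[0];
                    left[1] += right[1];
                    return left;
                },
                acc -> acc[0] == 0 ? 0.0 : (double) acc[1] / acc[0]);
    }

    // The check itself: sequential and parallel runs must agree
    static boolean sequentialAndParallelAgree(List<Integer> data) {
        double sequential = data.stream().collect(averaging());
        double parallel = data.parallelStream().collect(averaging());
        return sequential == parallel;
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.rangeClosed(1, 1000).boxed().collect(Collectors.toList());
        System.out.println(data.stream().collect(averaging()));  // prints 500.5
        System.out.println(sequentialAndParallelAgree(data));    // prints true
    }
}
```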
🎯 Interview Cheat Sheet
Must know:
- `reduce()`: immutable reduction that creates a new object at each step. `collect()`: mutable reduction that modifies a single container
- Algorithmic complexity: `reduce` for strings is O(n²) (copying characters), `collect` is O(n) (adding a reference)
- Golden rule: numbers and other immutable values → `reduce()`, collections (List, Set, Map) → `collect()`
- In `reduce`, identity is reused across parallel branches; in `collect`, each branch gets its own container via `supplier()`
- `collect` supports `Characteristics.CONCURRENT` (multiple threads write to one container); `reduce` never does
- Mutating accumulator input parameters in `reduce` is a serious error that breaks parallel mode
- Decision Matrix: primitives → `reduce`/`sum`/`count`, String → `collect(joining())`, List/Set/Map → only `collect()`
Frequent follow-up questions:
- Why is `reduce` for string concatenation O(n²)? Each concatenation `a + b` copies all characters from both strings; the sum is 1+2+3+…+n = n(n+1)/2
- Can reduce collect a List? Technically yes, but it will break parallel mode (mutable identity in branches). Use `collect`
- What is the difference between identity in reduce vs supplier in collect? Identity in reduce is a single object reused across all branches. The supplier in collect creates a separate container for each branch
- How to diagnose reduce/collect misuse? Profiling: huge `char[]` allocation when working with strings means you need `collect(joining())` instead of `reduce`
Red flags (DO NOT say):
- “reduce and collect are the same thing, just different names” — fundamentally different approach: immutable vs mutable reduction
- “You can mutate identity in reduce for optimization” — this will break parallel mode and give wrong results
- “collect does not support parallelism” — it does via combiner and CONCURRENT characteristic
- “reduce is always slower than collect” — for numbers and immutable objects the difference is negligible; collect is faster for collections
Related topics:
- [[What does reduce() operation do]]
- [[What does collect() operation do]]
- [[What is Collector and what built-in Collectors exist]]
- [[What potential problems can occur with parallel streams]]