What is the difference between reduce() and collect()?

🟢 Junior Level

Both operations collapse a stream into a result, but they do it differently:

reduce() — creates a new object at each step:

// Summation — each step creates a new number
int sum = numbers.stream().reduce(0, (a, b) -> a + b);

collect() — mutates the same object:

// Adding to the same list
List<String> list = strings.stream().collect(Collectors.toList());

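Simple rule: use reduce() for numbers and other immutable values, and collect() for collections and for joining strings.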

reduce() — Immutable Reduction:

A new object is created at each step
Example: (String a, String b) -> a + b creates a new string, copying the contents of both
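A minimal sketch of immutable reduction, assuming a hypothetical list of prices (the class and method names are illustrative). BigDecimal.add returns a new BigDecimal and leaves both operands untouched, so nothing is mutated between steps:

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

class ImmutableReduceDemo {
    // Sums the prices with reduce(); each add() call returns a new BigDecimal.
    static BigDecimal total(List<BigDecimal> prices) {
        return prices.stream().reduce(BigDecimal.ZERO, BigDecimal::add);
    }

    public static void main(String[] args) {
        List<BigDecimal> prices = Arrays.asList(
                new BigDecimal("1.10"), new BigDecimal("2.20"), new BigDecimal("3.30"));
        System.out.println(total(prices)); // prints 6.60
        System.out.println(prices.get(0)); // prints 1.10, the operand is unchanged
    }
}
```

Because the identity (BigDecimal.ZERO) can never be modified, handing the same object to every parallel branch is harmless here.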

collect() — Mutable Reduction:

When collecting into a collection:

reduce: O(n²) for string concatenation. Each concatenation a + b copies all characters from both strings. For n strings of length 1, step 1 copies 1 char, step 2 copies 2, and so on, for a total of 1+2+3+…+n = n(n+1)/2 = O(n²).
reduce can technically collect a List: reduce(new ArrayList<>(), (list, el) -> { list.add(el); return list; }, ...). But this breaks parallel mode, because the single mutable identity object is shared by all branches simultaneously. For collections, use only collect, which creates a separate container for each branch.
collect: O(n), because each step simply adds a reference to the ArrayList.
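A minimal sketch comparing the approaches on strings; the class and method names are illustrative. All three variants return the same text, but only the reduce() variant allocates a new String at every step:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JoinDemo {
    // Immutable reduction: every a + b builds a new String and copies both operands.
    static String concatWithReduce(List<String> parts) {
        return parts.stream().reduce("", (a, b) -> a + b);
    }

    // Mutable reduction: joining() appends into a mutable buffer.
    static String concatWithJoining(List<String> parts) {
        return parts.stream().collect(Collectors.joining());
    }

    // The same idea spelled out: supplier, accumulator, combiner.
    static String concatWithBuilder(List<String> parts) {
        return parts.stream()
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c");
        System.out.println(concatWithReduce(parts));  // prints abc
        System.out.println(concatWithJoining(parts)); // prints abc
        System.out.println(concatWithBuilder(parts)); // prints abc
    }
}
```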

Characteristic	reduce()	collect()
Memory	High consumption (intermediate objects)	Low (containers are reused)
Combiner	Combines two values	Combines two containers
Optimization	Difficult due to allocations	Characteristics.CONCURRENT lets multiple threads write to one shared container
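A minimal sketch of a CONCURRENT collector, using the built-in Collectors.groupingByConcurrent; the class and method names are illustrative. Instead of merging per-branch maps, parallel threads accumulate into one ConcurrentMap:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ConcurrentCollectDemo {
    // Counts words by length on a parallel stream using a concurrent collector.
    static ConcurrentMap<Integer, Long> countByLength(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.groupingByConcurrent(String::length, Collectors.counting()));
    }

    // Checks that the collector really advertises the CONCURRENT characteristic.
    static boolean isConcurrent() {
        Collector<String, ?, ConcurrentMap<Integer, List<String>>> collector =
                Collectors.groupingByConcurrent(String::length);
        return collector.characteristics().contains(Collector.Characteristics.CONCURRENT);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cc", "ddd");
        System.out.println(countByLength(words).get(2)); // prints 2
        System.out.println(isConcurrent());              // prints true
    }
}
```

The trade-off is ordering: a concurrent collector is also unordered, so it fits cases where encounter order does not matter.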

In collect(), container mutation is the basis of operation
In reduce(), mutating accumulator input parameters is a serious error that breaks parallel logic
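A minimal sketch of why a mutable identity is dangerous, with hypothetical helper names (brokenReduce, safeCollect). The broken variant is run sequentially only, so the outcome is deterministic: the list passed as identity comes back filled, meaning it is no longer a neutral starting value and would be shared by every branch of a parallel run:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MutableIdentityDemo {
    // Anti-pattern: the accumulator mutates its input, so the identity list itself is modified.
    static List<Integer> brokenReduce(List<Integer> source, List<Integer> identity) {
        return source.stream().reduce(
                identity,
                (list, el) -> { list.add(el); return list; },
                (left, right) -> { left.addAll(right); return left; });
    }

    // Correct: collect() asks the supplier for a fresh container per branch, then merges them.
    static List<Integer> safeCollect(List<Integer> source) {
        return source.parallelStream()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, 2, 3);
        List<Integer> identity = new ArrayList<>();

        List<Integer> result = brokenReduce(source, identity);
        System.out.println(result == identity); // prints true: the "result" is the identity object
        System.out.println(identity);           // prints [1, 2, 3]: the identity is no longer empty

        System.out.println(safeCollect(source)); // prints [1, 2, 3]
    }
}
```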

Profiling: If you see huge char[] allocation (byte[] on Java 9+) when working with strings, you are probably using reduce where collect(Collectors.joining()) is needed.

Unit Testing: Always run custom collectors on parallelStream() with a small batch and compare the result with the sequential run to verify the combiner.
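A minimal sketch of such a check, written as a plain main method rather than with any particular test framework. The collector totalLength() is a hypothetical example that sums string lengths in a one-element long[]; the check passes only if the combiner merges partial results correctly:

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CollectorCombinerCheck {
    // Hypothetical custom collector: sums the lengths of all strings.
    static Collector<String, long[], Long> totalLength() {
        return Collector.of(
                () -> new long[1],                                      // supplier: fresh container per branch
                (acc, s) -> acc[0] += s.length(),                       // accumulator: mutate the container
                (left, right) -> { left[0] += right[0]; return left; }, // combiner: merge two containers
                acc -> acc[0]);                                         // finisher: extract the result
    }

    // Generated sample data: "item0", "item1", ...
    static List<String> sampleData(int size) {
        return IntStream.range(0, size)
                .mapToObj(i -> "item" + i)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> data = sampleData(1000);
        long sequential = data.stream().collect(totalLength());
        long parallel = data.parallelStream().collect(totalLength());
        if (sequential != parallel) {
            throw new AssertionError("combiner is broken: " + sequential + " != " + parallel);
        }
        System.out.println("sequential and parallel results match");
    }
}
```

A combiner that drops one side (for example, returning left without adding right) would still pass on a sequential stream, which is why the parallel run matters.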

Must know:

reduce() — immutable reduction: creates a new object at each step. collect() — mutable reduction: modifies a single container
Algorithmic complexity: reduce for strings O(n²) (copying characters), collect O(n) (adding reference)
Golden rule: numbers and other immutable values → reduce(), strings → collect(joining()), collections (List, Set, Map) → collect()
In reduce, identity is reused across parallel branches; in collect, each branch gets its own container via supplier()
collect supports Characteristics.CONCURRENT: multiple threads write to one shared, thread-safe container. reduce has no equivalent
Mutating accumulator input parameters in reduce is a serious error that breaks parallel mode
Decision Matrix: primitives → reduce/sum/count, String → collect(joining()), List/Set/Map → only collect()
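The decision matrix can be sketched as one method per row; the class and method names below are illustrative, not a prescribed API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class DecisionMatrixDemo {
    // Primitives: use the specialized reductions sum() and count().
    static int totalLength(List<String> words) {
        return words.stream().mapToInt(String::length).sum();
    }

    static long countAtLeast(List<String> words, int minLength) {
        return words.stream().filter(w -> w.length() >= minLength).count();
    }

    // String: collect with joining().
    static String joined(List<String> words) {
        return words.stream().collect(Collectors.joining(", "));
    }

    // Collections: collect with a collection collector.
    static Set<String> unique(List<String> words) {
        return words.stream().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "a");
        System.out.println(totalLength(words));     // prints 4
        System.out.println(countAtLeast(words, 2)); // prints 1
        System.out.println(joined(words));          // prints a, bb, a
        System.out.println(unique(words).size());   // prints 2
    }
}
```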

Frequent follow-up questions:

Why is reduce for string concatenation O(n²)? — Each concatenation a + b copies all characters from both strings; sum 1+2+3+…+n = n(n+1)/2
Can reduce collect a List? — Technically yes, but it will break parallel mode (mutable identity in branches). Use collect
What is the difference between identity in reduce vs supplier in collect? — identity in reduce is a single object reused across all branches. supplier in collect creates a separate container for each branch
How to diagnose reduce/collect misuse? — Profiling: huge char[] allocation (byte[] on Java 9+) when working with strings = need collect(joining()) instead of reduce

Red flags (DO NOT say):

“reduce and collect are the same thing, just different names” — fundamentally different approach: immutable vs mutable reduction
“You can mutate identity in reduce for optimization” — this will break parallel mode and give wrong results
“collect does not support parallelism” — it does via combiner and CONCURRENT characteristic
“reduce is always slower than collect” — for numbers and immutable objects the difference is negligible; collect is faster for collections

Related topics: