Question 18 · Section 8

What is the difference between reduce() and collect()?

Both operations collapse a stream into a result, but they do it differently:

🟢 Junior Level

reduce() — creates a new object at each step:

// Summation — each step creates a new number
int sum = numbers.stream().reduce(0, (a, b) -> a + b);

collect() — mutates the same object:

// Adding to the same list
List<String> list = strings.stream().collect(Collectors.toList());

Simple rule:

  • Numbers and other immutable values → reduce()
  • Strings → collect(Collectors.joining())
  • Collections (List, Set, Map) → collect()
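The rule above can be sketched as a small runnable class. The class name, method names, and sample data are illustrative assumptions, not part of any library.

```java
import java.util.List;
import java.util.stream.Collectors;

final class SimpleRuleDemo {
    // Immutable values (numbers): reduce() with an identity and an associative function.
    static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    // Containers: collect() fills a mutable result container.
    static List<String> upperCased(List<String> words) {
        return words.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 3, 4)));      // prints 10
        System.out.println(upperCased(List.of("a", "b"))); // prints [A, B]
    }
}
```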

🟡 Middle Level

Type of reduction

reduce() — Immutable Reduction:

  • A new object is created at each step
  • Example: (String a, String b) -> a + b creates a new string, copying the contents of both

collect() — Mutable Reduction:

  • An existing container is modified at each step
  • Example: StringBuilder.append() appends to an already allocated buffer
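To make the two reduction styles concrete, here is a minimal sketch that concatenates the same list both ways. The three-argument collect(supplier, accumulator, combiner) call is the standard Stream API; the class and method names are made up for illustration.

```java
import java.util.List;

final class MutableReductionDemo {
    // Immutable reduction: every step allocates a brand-new String.
    static String concatWithReduce(List<String> parts) {
        return parts.stream().reduce("", (a, b) -> a + b);
    }

    // Mutable reduction: the supplier creates the container, the accumulator
    // appends to it, and the combiner merges two containers (parallel streams only).
    static String concatWithCollect(List<String> parts) {
        return parts.stream()
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c");
        System.out.println(concatWithReduce(parts));  // prints abc
        System.out.println(concatWithCollect(parts)); // prints abc
    }
}
```

Both methods return the same string; they differ only in how many intermediate objects are allocated along the way.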

Algorithmic complexity

When building a string or a collection:

  • reduce: O(n²) for string concatenation. Each a + b copies all characters of both strings. For n strings of length 1, step 1 copies 1 character, step 2 copies 2, and so on: 1+2+3+…+n = n(n+1)/2 = O(n²).
  • reduce can technically collect a List: reduce(new ArrayList<>(), (list, el) -> { list.add(el); return list; }, ...). This breaks in parallel mode, because the same mutable identity object is shared by all branches at once. Use collect instead, which creates a separate container for each branch.
  • collect: O(n) in total, because each step only appends a reference to an ArrayList (amortized O(1) per element).
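The quadratic cost can be checked with a simple model. The sketch below does not measure the JVM; it counts characters under the assumption that each a + b copies the whole result, which is the model used in the text. All names are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class ConcatCostDemo {
    // Models sequential reduce("", (a, b) -> a + b): each step builds a new
    // String, so every character accumulated so far is copied again.
    static long charsCopiedByConcat(List<String> parts) {
        long copied = 0;
        String acc = "";
        for (String part : parts) {
            acc = acc + part;       // new String of length acc.length()
            copied += acc.length(); // all of its characters were copied
        }
        return copied;
    }

    // Linear alternative: joining() appends into one internal builder.
    static String join(List<String> parts) {
        return parts.stream().collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> parts = Collections.nCopies(1000, "x");
        System.out.println(charsCopiedByConcat(parts)); // prints 500500, i.e. 1000 * 1001 / 2
        System.out.println(join(parts).length());       // prints 1000
    }
}
```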

Decision Matrix

| Task | Operation |
|---|---|
| Numbers, primitives | reduce() or sum(), count() |
| String concatenation | collect(joining()) |
| List, Set, Map | Only collect() |
| Complex DTOs | collect() |
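A short sketch of how these choices might look in code. The Order class and its sample data are hypothetical and exist only for this illustration; the stream and Collectors calls are standard.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class DecisionMatrixDemo {
    // Hypothetical DTO used only for this illustration.
    static final class Order {
        private final String customer;
        private final int amount;
        Order(String customer, int amount) { this.customer = customer; this.amount = amount; }
        String customer() { return customer; }
        int amount() { return amount; }
    }

    static List<Order> sample() {
        return List.of(new Order("ann", 10), new Order("bob", 20), new Order("ann", 30));
    }

    // Primitives: a specialised terminal operation instead of a hand-written reduce().
    static int totalAmount(List<Order> orders) {
        return orders.stream().mapToInt(Order::amount).sum();
    }

    // Strings: collect(joining()) rather than reduce with a + b.
    static String customers(List<Order> orders) {
        return orders.stream().map(Order::customer).distinct()
                .collect(Collectors.joining(", "));
    }

    // Map result: collect() with a grouping collector (TreeMap keeps keys sorted).
    static Map<String, Integer> totalByCustomer(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                Order::customer, TreeMap::new, Collectors.summingInt(Order::amount)));
    }

    public static void main(String[] args) {
        System.out.println(totalAmount(sample()));     // prints 60
        System.out.println(customers(sample()));       // prints ann, bob
        System.out.println(totalByCustomer(sample())); // prints {ann=40, bob=20}
    }
}
```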

🔴 Senior Level

Concurrency and Combiner

| Characteristic | reduce() | collect() |
|---|---|---|
| Memory | High consumption (intermediate objects) | Low (containers are reused) |
| Combiner | Combines two values | Combines two containers |
| Optimization | Difficult due to allocations | Characteristics.CONCURRENT allows writing to one container |
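As one example of a CONCURRENT collector, the standard Collectors.groupingByConcurrent is documented as concurrent and unordered. The sketch below assumes a word-length count as the task; the class and method names are illustrative.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

final class ConcurrentCollectDemo {
    // With a CONCURRENT (and UNORDERED) collector on a parallel stream, threads
    // accumulate into one shared ConcurrentMap instead of merging per-branch maps.
    static ConcurrentMap<Integer, Long> countByLength(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.groupingByConcurrent(String::length, Collectors.counting()));
    }

    public static void main(String[] args) {
        ConcurrentMap<Integer, Long> counts = countByLength(List.of("a", "bb", "cc", "ddd"));
        System.out.println(counts.get(1)); // prints 1
        System.out.println(counts.get(2)); // prints 2
        System.out.println(counts.get(3)); // prints 1
    }
}
```

The trade-off is that encounter order is not preserved, so this style fits tasks such as counting or grouping where order does not matter.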

Identity Value difference

  • In reduce: the same identity value is passed to every parallel branch, so it must never be mutated
  • In collect: Each branch gets its own container via supplier()
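The supplier behaviour can be observed by counting how often it is called. This is a sketch under the assumption that the stream actually splits; the exact count depends on the source size and the available parallelism, so no expected number is given. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class SupplierPerBranchDemo {
    // Collects in parallel and records every supplier call in the given counter.
    static List<Integer> collectCounting(List<Integer> source, AtomicInteger suppliersCalled) {
        return source.parallelStream().collect(
                () -> {
                    suppliersCalled.incrementAndGet(); // one call per fresh container
                    return new ArrayList<Integer>();
                },
                ArrayList::add,      // accumulator: mutate this branch's own container
                ArrayList::addAll);  // combiner: merge the right container into the left
    }

    public static void main(String[] args) {
        List<Integer> source = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());
        AtomicInteger suppliersCalled = new AtomicInteger();
        List<Integer> result = collectCounting(source, suppliersCalled);
        System.out.println("result matches source: " + result.equals(source));
        System.out.println("containers created: " + suppliersCalled.get());
    }
}
```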

Side Effects

  • In collect(), container mutation is the basis of operation
  • In reduce(), mutating accumulator input parameters is a serious error that breaks parallel logic
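The following sketch contrasts the mistake with two safe alternatives. The mutating variant is deliberately run only on a sequential stream, where it happens to work; on a parallel stream all branches would write to the same unsynchronized ArrayList. Class and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class ReduceSideEffectDemo {
    // WRONG pattern: the accumulator mutates the identity object.
    // Sequential only; do not switch this to parallelStream().
    static List<Integer> mutatingReduce(List<Integer> source) {
        List<Integer> identity = new ArrayList<>();
        return source.stream().reduce(
                identity,
                (list, el) -> { list.add(el); return list; },
                (left, right) -> left); // never called on a sequential stream
    }

    // Correct but slow: a true immutable reduction copies the list at every step.
    static List<Integer> copyingReduce(List<Integer> source) {
        List<Integer> empty = List.of();
        return source.parallelStream().reduce(
                empty,
                (list, el) -> { List<Integer> copy = new ArrayList<>(list); copy.add(el); return copy; },
                (left, right) -> { List<Integer> copy = new ArrayList<>(left); copy.addAll(right); return copy; });
    }

    // Correct and efficient: mutable reduction with per-branch containers.
    static List<Integer> collected(List<Integer> source) {
        return source.parallelStream().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 3, 4, 5);
        System.out.println(mutatingReduce(source)); // prints [1, 2, 3, 4, 5]
        System.out.println(copyingReduce(source));  // prints [1, 2, 3, 4, 5]
        System.out.println(collected(source));      // prints [1, 2, 3, 4, 5]
    }
}
```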

Diagnostics

Profiling: If a profiler shows heavy char[] allocation (byte[] since Java 9) in string-building code, reduce is probably being used where collect(joining()) is needed.

Unit Testing: Always run custom collectors on parallelStream() with a small batch to verify the combiner.
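A possible shape for such a check, assuming a hypothetical custom averaging collector built with the standard Collector.of factory. The helper compares the sequential and parallel results; in a real project this comparison would live in a unit test.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class CollectorCombinerCheck {
    // Hypothetical custom collector: accumulates {sum, count} in a long[2].
    static Collector<Integer, long[], Double> averaging() {
        return Collector.of(
                () -> new long[2],
                (acc, value) -> { acc[0] += value; acc[1]++; },
                (left, right) -> { left[0] += right[0]; left[1] += right[1]; return left; },
                acc -> acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1]);
    }

    // A broken combiner (for example one that drops the right side) would make
    // the parallel result differ from the sequential one whenever the stream splits.
    static boolean parallelMatchesSequential(List<Integer> data) {
        Double sequential = data.stream().collect(averaging());
        Double parallel = data.parallelStream().collect(averaging());
        return sequential.equals(parallel);
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.rangeClosed(1, 100).boxed().collect(Collectors.toList());
        System.out.println(data.stream().collect(averaging())); // prints 50.5
        System.out.println(parallelMatchesSequential(data));    // prints true
    }
}
```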


🎯 Interview Cheat Sheet

Must know:

  • reduce() — immutable reduction: creates a new object at each step. collect() — mutable reduction: modifies a single container
  • Algorithmic complexity: reduce for strings O(n²) (copying characters), collect O(n) (adding reference)
  • Golden rule: numbers and other immutable values → reduce(), strings → collect(joining()), collections (List, Set, Map) → collect()
  • In reduce, identity is reused across parallel branches; in collect, each branch gets its own container via supplier()
  • collect supports Characteristics.CONCURRENT, which lets multiple threads write to one shared container; reduce has no equivalent
  • Mutating accumulator input parameters in reduce is a serious error that breaks parallel mode
  • Decision Matrix: primitives → reduce/sum/count, String → collect(joining()), List/Set/Map → only collect()

Frequent follow-up questions:

  • Why is reduce for string concatenation O(n²)? — Each concatenation a + b copies all characters from both strings; sum 1+2+3+…+n = n(n+1)/2
  • Can reduce collect a List? — Technically yes, but it will break parallel mode (mutable identity in branches). Use collect
  • What is the difference between identity in reduce vs supplier in collect? — identity in reduce is a single object reused across all branches. supplier in collect creates a separate container for each branch
  • How to diagnose reduce/collect misuse? — Profiling: huge char[] allocation when working with strings = need collect(joining()) instead of reduce

Red flags (DO NOT say):

  • “reduce and collect are the same thing, just different names” — fundamentally different approach: immutable vs mutable reduction
  • “You can mutate identity in reduce for optimization” — this will break parallel mode and give wrong results
  • “collect does not support parallelism” — it does via combiner and CONCURRENT characteristic
  • “reduce is always slower than collect” — for numbers and immutable objects the difference is negligible; collect is faster for collections

Related topics:

  • [[What does reduce() operation do]]
  • [[What does collect() operation do]]
  • [[What is Collector and what built-in Collectors exist]]
  • [[What potential problems can occur with parallel streams]]