Question 5 · Section 8

What does the collect() operation do?

Junior Level

collect() is a terminal (final) operation that triggers the entire stream pipeline and packs the results into the structure you need: List, Map, String, etc.

“Terminal” means: after collect() the stream is exhausted, it cannot be reused. “Collects” = puts elements into a container according to a given rule.
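To make "exhausted" concrete, here is a minimal sketch (the class and method names are illustrative, not from the original) that calls collect() twice on the same Stream object. The second call is expected to fail with IllegalStateException.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamReuseDemo {
    // Returns true when the second terminal operation on the same stream is rejected.
    static boolean secondCollectIsRejected() {
        Stream<String> stream = Stream.of("a", "b", "c");
        List<String> first = stream.collect(Collectors.toList()); // consumes the stream
        try {
            stream.collect(Collectors.toList()); // same Stream object, second terminal call
            return false;
        } catch (IllegalStateException e) {
            return first.size() == 3; // the first result is intact, the second call failed
        }
    }

    public static void main(String[] args) {
        System.out.println(secondCollectIsRejected());
    }
}
```

If the data is needed twice, create a new stream from the source collection for each terminal operation.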

The most common case — collecting into a List:

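List<String> result = stream.collect(Collectors.toList());

Other frequently used collectors (each call needs its own fresh stream, because a stream can be consumed only once):

// Into a Set (removes duplicates)
Set<String> unique = stream.collect(Collectors.toSet());

// Into a string with delimiter
String joined = stream.collect(Collectors.joining(", "));

// Counting elements (for plain counting, stream.count() is simpler)
long count = stream.collect(Collectors.counting());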

Important: collect() triggers execution of all intermediate operations.
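The laziness behind this can be observed directly. The sketch below (names are illustrative) logs events in order. The map() step should only run once collect() is invoked, after the "pipeline built" entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineDemo {
    // Records the order of events to show that map() runs only when collect() is called.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        Stream<String> pipeline = Stream.of("a", "b")
                .map(s -> { log.add("map " + s); return s.toUpperCase(); });
        log.add("pipeline built"); // nothing has been mapped yet
        List<String> result = pipeline.collect(Collectors.toList());
        log.add("collected " + result);
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```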

Middle Level

Anatomy of Collector<T, A, R>

A Collector consists of 4 functions:

  1. supplier(): creates a new mutable container of type A (ArrayList::new)
  2. accumulator(): adds an element of type T to the container (List::add)
  3. combiner(): merges two containers into one (needed for parallelStream)
  4. finisher(): converts the container A into the final result type R
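The four functions can be sketched with Collector.of. This is a hand-rolled illustration, not how the JDK implements its own collectors. The name toUnmodifiableListSketch and the helper collect are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class FourFunctionsDemo {
    // T = String, A = List<String> (mutable container), R = List<String> (read-only view).
    static Collector<String, List<String>, List<String>> toUnmodifiableListSketch() {
        return Collector.of(
                ArrayList::new,                                        // supplier
                List::add,                                             // accumulator
                (left, right) -> { left.addAll(right); return left; }, // combiner
                Collections::unmodifiableList);                        // finisher
    }

    // Small helper so the collector can be tried out in one call.
    static List<String> collect(String... items) {
        return Stream.of(items).collect(toUnmodifiableListSketch());
    }

    public static void main(String[] args) {
        System.out.println(collect("a", "b"));
    }
}
```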

Mutable reduction

Mutable reduction accumulates the result into a mutable container (ArrayList, StringBuilder). This is what collect() does: it modifies an existing container in place. reduce(), by contrast, is designed for immutable reduction, where each step produces a new value. For building collections and strings, the mutable approach is much more efficient because it avoids creating and copying an object at every step.
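A small side-by-side sketch of the two styles, using string concatenation as the example (class and method names are illustrative). The three-argument collect(supplier, accumulator, combiner) form is part of the Stream API.

```java
import java.util.List;

class MutableReductionDemo {
    // Immutable reduction: every step builds a brand-new String.
    static String viaReduce(List<String> words) {
        return words.stream().reduce("", (acc, w) -> acc + w);
    }

    // Mutable reduction: elements are appended in place to a StringBuilder.
    static String viaCollect(List<String> words) {
        return words.stream()
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "c");
        System.out.println(viaReduce(words) + " " + viaCollect(words));
    }
}
```

Both produce the same text. The difference is the number of intermediate objects created along the way.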

GroupingBy — “SQL inside Java”

// Group by city
Map<City, List<Person>> byCity = persons.stream()
    .collect(groupingBy(Person::getCity));

// Grouping with counting
Map<City, Long> countByCity = persons.stream()
    .collect(groupingBy(Person::getCity, counting()));

// Grouping with a nested collector
Map<City, Map<String, List<Person>>> complex = persons.stream()
    .collect(groupingBy(Person::getCity, groupingBy(Person::getGender)));
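For a version that can be run as-is, the sketch below assumes a hypothetical Person record with String fields (the original does not show its Person or City types). It requires Java 16+ for records.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingDemo {
    // Hypothetical model for illustration only.
    record Person(String name, String city, String gender) {}

    // Like SQL: SELECT city, COUNT(*) ... GROUP BY city
    static Map<String, Long> countByCity(List<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(Person::city, Collectors.counting()));
    }

    // Two-level grouping, keeping only the names in the innermost lists.
    static Map<String, Map<String, List<String>>> namesByCityAndGender(List<Person> people) {
        return people.stream().collect(
                Collectors.groupingBy(Person::city,
                        Collectors.groupingBy(Person::gender,
                                Collectors.mapping(Person::name, Collectors.toList()))));
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("Anna", "Kyiv", "F"),
                new Person("Oleh", "Kyiv", "M"),
                new Person("Ira", "Lviv", "F"));
        System.out.println(countByCity(people));
        System.out.println(namesByCityAndGender(people));
    }
}
```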

ToMap with conflict resolution

Prefer the three-argument version whenever keys may repeat:

.collect(toMap(User::getId, u -> u, (existing, replacement) -> existing));

Without a merge function, a duplicate key throws an IllegalStateException.
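Both behaviours can be checked with a short sketch. The User record here is a hypothetical stand-in for the original's User class (Java 16+ for records).

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapDemo {
    // Hypothetical model for illustration only.
    record User(int id, String name) {}

    // Three-argument toMap: on a duplicate id, keep the first user seen.
    static Map<Integer, User> keepFirst(List<User> users) {
        return users.stream()
                .collect(Collectors.toMap(User::id, u -> u, (existing, replacement) -> existing));
    }

    // Two-argument toMap: returns true if duplicate ids cause an IllegalStateException.
    static boolean twoArgVersionFails(List<User> users) {
        try {
            users.stream().collect(Collectors.toMap(User::id, u -> u));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<User> users = List.of(new User(1, "first"), new User(1, "second"));
        System.out.println(keepFirst(users));
        System.out.println(twoArgVersionFails(users));
    }
}
```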

When NOT to use collect

  1. Simple counting: use count() instead of collect(toList()).size()
  2. Existence check: use anyMatch() instead of collect(toList()) and checking isEmpty()
  3. Finding one element: use findFirst() / findAny() instead of collect + get(0)
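The three alternatives from the list above, sketched on a list of words with an arbitrary "longer than 3 characters" filter (names and the filter are illustrative). None of them builds an intermediate list, and anyMatch() and findFirst() can stop at the first matching element.

```java
import java.util.List;
import java.util.Optional;

class CheaperTerminalOpsDemo {
    // Counting without materializing a list.
    static long countLong(List<String> words) {
        return words.stream().filter(w -> w.length() > 3).count();
    }

    // Existence check: short-circuits on the first match.
    static boolean anyLong(List<String> words) {
        return words.stream().anyMatch(w -> w.length() > 3);
    }

    // First match, wrapped in Optional because there may be none.
    static Optional<String> firstLong(List<String> words) {
        return words.stream().filter(w -> w.length() > 3).findFirst();
    }

    public static void main(String[] args) {
        List<String> words = List.of("map", "filter", "collect");
        System.out.println(countLong(words) + " " + anyLong(words) + " " + firstLong(words));
    }
}
```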

Senior Level

Collector Characteristics

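  • IDENTITY_FINISH: the finisher is the identity function, so it can be skipped
  • UNORDERED: element order does not matter (faster in parallel)
  • CONCURRENT: the container is thread-safe (e.g. ConcurrentHashMap), so multiple threads can write to the same container instead of merging per-thread containers with combiner(). The stream performs such a concurrent reduction only if the collector is also UNORDERED or the stream itself is unordered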
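The flags of any collector can be inspected through characteristics(). The sketch below (class and method names are illustrative) looks at two built-in collectors whose Javadoc describes them as unordered and as concurrent respectively. Flags of other built-in collectors are implementation details and may differ between JDKs.

```java
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CharacteristicsDemo {
    // toSet() is documented as an unordered collector.
    static Set<Collector.Characteristics> ofToSet() {
        return Collectors.toSet().characteristics();
    }

    // groupingByConcurrent() is documented as a concurrent and unordered collector.
    static Set<Collector.Characteristics> ofGroupingByConcurrent() {
        return Collectors.groupingByConcurrent((String s) -> s.length()).characteristics();
    }

    public static void main(String[] args) {
        System.out.println("toSet: " + ofToSet());
        System.out.println("groupingByConcurrent: " + ofGroupingByConcurrent());
    }
}
```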

Parallel Stream Combiner

When writing a custom collector, never ignore combiner(). Even if you are not using parallelStream() today, a later caller might, and a missing or incorrect combiner will then produce wrong results or exceptions.
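As a hedged illustration, here is a small custom collector (totalLength is a made-up name) with a real combiner. Because partial results are merged correctly, sequential and parallel runs should agree.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;

class CombinerDemo {
    // Sums string lengths in a one-element int[] used as the mutable container.
    static Collector<String, int[], Integer> totalLength() {
        return Collector.of(
                () -> new int[1],                                        // one container per branch
                (acc, s) -> acc[0] += s.length(),                        // accumulate in place
                (left, right) -> { left[0] += right[0]; return left; },  // merge two branches
                acc -> acc[0]);                                          // unwrap the result
    }

    static int sequential(List<String> words) {
        return words.stream().collect(totalLength());
    }

    static int parallel(List<String> words) {
        return words.parallelStream().collect(totalLength());
    }

    public static void main(String[] args) {
        List<String> words = Collections.nCopies(1000, "abc");
        System.out.println(sequential(words) + " " + parallel(words));
    }
}
```

A combiner that returned only one side, or threw an exception, would go unnoticed in the sequential case and fail or undercount in the parallel one.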

Performance

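Do not use reduce() to build collections. reduce() expects an immutable reduction: the identity value must never be modified. Passing a mutable container (new ArrayList) and adding to it breaks parallel mode, because the same list instance is shared by all branches. The contract-respecting alternative, copying the list at each step, is O(n^2). collect() avoids both problems: it creates a separate container for each branch and adds to it in place.

// BAD: mutates the identity list, which is shared by all branches in parallel mode
stream.reduce(new ArrayList<>(), (list, e) -> { list.add(e); return list; }, ...)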

// GOOD — O(n), adds to the existing container
stream.collect(toList())
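For comparison, here is a sketch of what a contract-respecting reduce() would have to look like (names are illustrative). It never mutates its arguments, so every step copies the list, which is where the quadratic cost comes from. The collect() variant is shown on a parallel stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class ReduceVsCollectDemo {
    // Immutable reduction: correct, but copies the accumulated list on every step.
    static List<Integer> viaCopyingReduce(List<Integer> input) {
        List<Integer> identity = List.of(); // immutable, so it cannot be modified by accident
        return input.stream().reduce(
                identity,
                (list, e) -> { List<Integer> copy = new ArrayList<>(list); copy.add(e); return copy; },
                (a, b) -> { List<Integer> copy = new ArrayList<>(a); copy.addAll(b); return copy; });
    }

    // Mutable reduction: each parallel branch fills its own container, then they are merged.
    static List<Integer> viaCollect(List<Integer> input) {
        return input.parallelStream().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5);
        System.out.println(viaCopyingReduce(input) + " " + viaCollect(input));
    }
}
```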

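Immutable Collections: Java 16+ has a toList() method directly on Stream. It is more concise than collect(Collectors.toList()), is often more efficient, and returns an unmodifiable list, whereas Collectors.toList() makes no guarantee about the mutability of its result.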
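A quick way to see the unmodifiable behaviour is to try to add an element. This sketch needs Java 16+ because it calls Stream.toList(). The helper names are illustrative.

```java
import java.util.List;
import java.util.stream.Stream;

class StreamToListDemo {
    static List<String> viaStreamToList() {
        return Stream.of("a", "b").toList(); // Java 16+
    }

    // Returns true if the list rejects modification.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsAdd(viaStreamToList()));
    }
}
```

When a mutable result is required, Collectors.toCollection(ArrayList::new) states that intent explicitly.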

Edge Cases

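  • Null Values: some collectors throw NullPointerException on nulls: toMap rejects null values, groupingBy rejects null keys, and a TreeMap-backed result with natural ordering rejects null keys
  • Memory Consumption: collect() materializes the whole result at once, so collecting, say, 1 million objects is the moment of peak memory consumption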
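The null-related failures are easy to reproduce. The sketch below (illustrative names) also shows one possible workaround for null values, under the assumption that a plain HashMap is acceptable: the three-argument collect() with Map.put, which does not reject nulls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NullHandlingDemo {
    // toMap throws NullPointerException when the value mapper returns null.
    static boolean toMapRejectsNullValue() {
        try {
            Stream.of("a", "b").collect(Collectors.toMap(s -> s, s -> (String) null));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // groupingBy throws NullPointerException when the classifier returns null.
    static boolean groupingByRejectsNullKey() {
        try {
            Stream.of("a", "b").collect(Collectors.groupingBy(s -> (String) null));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Possible workaround: fill a HashMap directly, since HashMap.put accepts null values.
    static Map<String, String> withNullValues() {
        return Stream.of("a", "b")
                .<Map<String, String>>collect(HashMap::new, (m, s) -> m.put(s, null), Map::putAll);
    }

    public static void main(String[] args) {
        System.out.println(toMapRejectsNullValue() + " " + groupingByRejectsNullKey());
        System.out.println(withNullValues());
    }
}
```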

Diagnostics

Profile the accumulator method in custom collectors — this is the most frequently called part (Hot Path).


Interview Cheat Sheet

Must know:

  • collect() — terminal operation, triggers the entire pipeline and packs the result into a data structure
  • Collector consists of 4 functions: supplier, accumulator, combiner, finisher
  • Mutable reduction is more efficient than immutable — modifies a container rather than creating a new object
  • groupingBy — a powerful tool: grouping, counting, nested aggregations
  • Always use toMap with a merge function — otherwise IllegalStateException on duplicates
  • Java 16+: stream.toList() is preferred — more concise, returns an unmodifiable list

Common follow-up questions:

  • collect vs reduce for collections? — collect is more efficient: reduce with a mutable container breaks parallel mode
  • What are Characteristics? — Optimization flags: IDENTITY_FINISH, UNORDERED, CONCURRENT
  • When NOT to use collect()? — For simple counting (count()), existence check (anyMatch()), finding one element (findFirst())
  • What does combiner do? — Merges two containers, critical for parallelStream

Red flags (DO NOT say):

  • “collect can be called multiple times on the same stream” — no, the stream is exhausted after a terminal operation
  • “reduce with new ArrayList is just as good as collect” — no, O(n^2) and breaks parallelism
  • “toMap without a merge function is ok” — no, crash on key duplicates
  • “combiner is not needed if I don’t use parallelStream” — someone will call it later, the code will break

Related topics:

  • [[6. What is Collector and what built-in Collectors exist]]
  • [[2. What is the difference between intermediate and terminal operations]]
  • [[9. What are parallel streams]]
  • [[3. What does filter() operation do]]