What does the collect() operation do?
Junior Level
collect() is a terminal (final) operation that triggers the entire stream pipeline and packs the results into the structure you need: List, Map, String, etc.
“Terminal” means that collect() ends the pipeline: after it runs, the stream is consumed and cannot be reused. “Collect” means putting the elements into a container according to a given rule (a Collector).
The most common case — collecting into a List:
List<String> result = stream.collect(Collectors.toList());
// Into a Set (removes duplicates)
Set<String> unique = stream.collect(Collectors.toSet());
// Into a string with delimiter
String joined = stream.collect(Collectors.joining(", "));
// Counting elements
long count = stream.collect(Collectors.counting());
Important: collect() triggers execution of all intermediate operations.
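A minimal sketch of both points: nothing in the pipeline runs until collect() is called, and a second terminal call on the same stream fails. The class name LazyPipelineDemo and the trace list are illustrative only; map() with a side effect is used here purely to make execution visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineDemo {
    // Records which elements the intermediate map() step actually processed.
    static final List<String> trace = new ArrayList<>();

    static Stream<String> pipeline() {
        return Stream.of("a", "b", "c")
                .map(s -> { trace.add(s); return s.toUpperCase(); }); // runs only when a terminal op pulls data
    }

    static boolean reuseFails() {
        Stream<String> s = Stream.of("x");
        s.collect(Collectors.toList());          // first terminal operation consumes the stream
        try {
            s.collect(Collectors.toList());      // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;                         // the stream was already operated upon
        }
    }

    public static void main(String[] args) {
        Stream<String> s = pipeline();
        System.out.println(trace);               // [] : nothing has executed yet
        List<String> result = s.collect(Collectors.toList());
        System.out.println(trace);               // [a, b, c] : collect() pulled every element through map()
        System.out.println(result);              // [A, B, C]
        System.out.println(reuseFails());        // true
    }
}
```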
Middle Level
Anatomy of Collector<T, A, R>
A Collector consists of 4 functions:
- supplier(): creates a new container (ArrayList::new)
- accumulator(): adds an element to the container (List::add)
- combiner(): merges two containers (needed for parallelStream)
- finisher(): final transformation of the container into the result
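To see the four functions side by side, here is a sketch of a hand-built collector using the real factory Collector.of(supplier, accumulator, combiner, finisher). The class FourFunctionCollector and the method toReadOnlyList are hypothetical names; the collector roughly mimics "collect into a list, then make it read-only".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class FourFunctionCollector {
    // Each argument of Collector.of corresponds to one of the four functions.
    static <T> Collector<T, List<T>, List<T>> toReadOnlyList() {
        return Collector.of(
                ArrayList::new,                                        // supplier: a new empty container per branch
                List::add,                                             // accumulator: add one element to the container
                (left, right) -> { left.addAll(right); return left; }, // combiner: merge two partial containers
                Collections::unmodifiableList                          // finisher: wrap the container as the final result
        );
    }

    public static void main(String[] args) {
        List<String> result = Stream.of("a", "b", "c").collect(toReadOnlyList());
        System.out.println(result); // [a, b, c]
    }
}
```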
Mutable reduction
Mutable reduction means accumulating a result into a mutable container (ArrayList, StringBuilder). In immutable reduction (reduce()), each step creates a new object; collect() instead modifies an existing container. This saves allocations and copying, which makes it much more efficient for building collections.
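A small sketch contrasting the two styles on string concatenation. The three-argument Stream.collect(supplier, accumulator, combiner) with StringBuilder is the idiom from the Stream.collect Javadoc; the class and method names are illustrative.

```java
import java.util.List;

class MutableVsImmutableReduction {
    // Immutable reduction: every step builds a brand-new String.
    static String concatWithReduce(List<String> words) {
        return words.stream().reduce("", (acc, w) -> acc + w);
    }

    // Mutable reduction: the supplier creates a StringBuilder that is filled in place.
    static String concatWithCollect(List<String> words) {
        return words.stream()
                .collect(StringBuilder::new,    // supplier: the mutable container
                         StringBuilder::append, // accumulator: modifies the container in place
                         StringBuilder::append) // combiner: merges two builders (used by parallel streams)
                .toString();
    }

    public static void main(String[] args) {
        List<String> words = List.of("col", "lect", "()");
        System.out.println(concatWithReduce(words));  // collect()
        System.out.println(concatWithCollect(words)); // collect()
    }
}
```

Both return the same string; the difference is that reduce() allocates a new intermediate String per element, while collect() appends into one buffer per branch.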
groupingBy: “SQL GROUP BY inside Java” (examples assume static imports from java.util.stream.Collectors)
// Group by city
Map<City, List<Person>> byCity = persons.stream()
.collect(groupingBy(Person::getCity));
// Grouping with counting
Map<City, Long> countByCity = persons.stream()
.collect(groupingBy(Person::getCity, counting()));
// Grouping with a nested collector
Map<City, Map<String, List<Person>>> complex = persons.stream()
.collect(groupingBy(Person::getCity, groupingBy(Person::getGender)));
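A self-contained version of the grouping examples, as a sketch. The Person record is a hypothetical stand-in for the Person/City types above (city and gender are plain Strings here), and the TreeMap::new map factory is added only so the grouped keys come out sorted.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

class GroupingByDemo {
    // Hypothetical domain type: city and gender are Strings instead of dedicated classes.
    record Person(String name, String city, String gender) {}

    static final List<Person> PEOPLE = List.of(
            new Person("Ann", "Oslo", "F"),
            new Person("Bob", "Oslo", "M"),
            new Person("Eve", "Rome", "F"));

    // Like SQL: SELECT city, COUNT(*) FROM people GROUP BY city
    static Map<String, Long> countByCity(List<Person> people) {
        return people.stream()
                .collect(groupingBy(Person::city, TreeMap::new, counting())); // TreeMap keeps keys sorted
    }

    // Nested grouping: city -> gender -> people
    static Map<String, Map<String, List<Person>>> byCityThenGender(List<Person> people) {
        return people.stream()
                .collect(groupingBy(Person::city, groupingBy(Person::gender)));
    }

    public static void main(String[] args) {
        System.out.println(countByCity(PEOPLE)); // {Oslo=2, Rome=1}
        System.out.println(byCityThenGender(PEOPLE).get("Oslo").get("M"));
    }
}
```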
ToMap with conflict resolution
Always use the three-argument version:
.collect(toMap(User::getId, u -> u, (existing, replacement) -> existing));
Without a merge function, a duplicate key causes an IllegalStateException.
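A short sketch of both behaviours, assuming a hypothetical User record in which two entries deliberately share id 1: the two-argument toMap throws, while the three-argument version lets the merge function decide (here, the first value wins).

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapMergeDemo {
    // Hypothetical User type; two entries share id 1.
    record User(int id, String name) {}

    static final List<User> USERS = List.of(new User(1, "ann"), new User(2, "bob"), new User(1, "ann-dup"));

    // Two-argument toMap: a duplicate key makes the collector throw IllegalStateException.
    static boolean twoArgThrows(List<User> users) {
        try {
            users.stream().collect(Collectors.toMap(User::id, Function.identity()));
            return false;
        } catch (IllegalStateException e) {
            return true; // duplicate key detected
        }
    }

    // Three-argument toMap: the merge function resolves duplicates; the first value seen wins.
    static Map<Integer, User> keepFirst(List<User> users) {
        return users.stream()
                .collect(Collectors.toMap(User::id, u -> u, (existing, replacement) -> existing));
    }

    public static void main(String[] args) {
        System.out.println(twoArgThrows(USERS));            // true
        System.out.println(keepFirst(USERS).get(1).name()); // ann
    }
}
```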
When NOT to use collect
- Simple counting: use count() instead of collect(toList()).size()
- Existence check: use anyMatch() instead of collect(toList()) and checking isEmpty()
- Finding one element: use findFirst()/findAny() instead of collect + get(0)
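A sketch of the three replacements on a small hypothetical word list. anyMatch() and findFirst() are short-circuiting, so they can stop early instead of materializing a whole list first.

```java
import java.util.List;
import java.util.stream.Collectors;

class WhenNotToCollect {
    static final List<String> WORDS = List.of("map", "filter", "collect", "count");

    // Wasteful: builds a whole list only to read its size
    static long countViaCollect() {
        return WORDS.stream().filter(w -> w.startsWith("c")).collect(Collectors.toList()).size();
    }

    // Better: count() needs no intermediate list
    static long countViaCount() {
        return WORDS.stream().filter(w -> w.startsWith("c")).count();
    }

    // anyMatch() short-circuits: it stops at the first matching element
    static boolean hasLongWord() {
        return WORDS.stream().anyMatch(w -> w.length() > 6);
    }

    // findFirst() also short-circuits instead of collecting everything and calling get(0)
    static String firstStartingWithC() {
        return WORDS.stream().filter(w -> w.startsWith("c")).findFirst().orElse("none");
    }

    public static void main(String[] args) {
        System.out.println(countViaCollect() + " " + countViaCount()); // 2 2
        System.out.println(hasLongWord());                              // true
        System.out.println(firstStartingWithC());                       // collect
    }
}
```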
Senior Level
Collector Characteristics
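- IDENTITY_FINISH: the finisher is the identity function and can be skipped; the accumulation container itself is returned as the result
- UNORDERED: the result does not depend on encounter order (this can make parallel execution faster)
- CONCURRENT: the accumulator may be called from multiple threads on one shared, thread-safe container (e.g. ConcurrentHashMap), so parallel execution does not need to merge containers with combiner(); the stream uses this mode only if the collector is also UNORDERED or the source is unordered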
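The flags of the built-in collectors can be inspected through Collector.characteristics(). A sketch (the class name is illustrative) comparing toList(), toSet(), toConcurrentMap() and joining(); joining() has a real finisher (StringBuilder to String), so it should not report IDENTITY_FINISH.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CharacteristicsDemo {
    static Set<Collector.Characteristics> toListFlags() {
        return Collectors.toList().characteristics();
    }

    static Set<Collector.Characteristics> toSetFlags() {
        return Collectors.toSet().characteristics();
    }

    static Set<Collector.Characteristics> concurrentMapFlags() {
        Collector<String, ?, ConcurrentMap<String, Integer>> c =
                Collectors.toConcurrentMap(s -> s, String::length); // backed by a ConcurrentHashMap
        return c.characteristics();
    }

    static Set<Collector.Characteristics> joiningFlags() {
        return Collectors.joining(", ").characteristics();
    }

    public static void main(String[] args) {
        System.out.println("toList:          " + toListFlags());
        System.out.println("toSet:           " + toSetFlags());
        System.out.println("toConcurrentMap: " + concurrentMapFlags());
        System.out.println("joining:         " + joiningFlags());
    }
}
```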
Parallel Stream Combiner
When writing a custom collector, always implement combiner() correctly. Even if your code does not use parallelStream() today, someone may later run the collector on a parallel stream, and a missing or wrong combiner will then fail or silently lose data.
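A minimal sketch of a custom collector whose combiner adds both partial results, checked against sequential and parallel streams. The collector countEvens and its one-slot long[] container are hypothetical; the point is that the parallel run only matches the sequential one because the combiner merges both halves.

```java
import java.util.stream.Collector;
import java.util.stream.IntStream;

class CombinerDemo {
    // Hypothetical collector: counts even numbers using a one-slot long[] as the mutable container.
    static Collector<Integer, long[], Long> countEvens() {
        return Collector.of(
                () -> new long[1],                                      // supplier: fresh counter per branch
                (acc, n) -> { if (n % 2 == 0) acc[0]++; },              // accumulator: update the counter
                (left, right) -> { left[0] += right[0]; return left; }, // combiner: must add BOTH partial counts
                acc -> acc[0]);                                         // finisher: unwrap the count
    }

    static long sequential(int n) {
        return IntStream.range(0, n).boxed().collect(countEvens());
    }

    static long parallel(int n) {
        return IntStream.range(0, n).boxed().parallel().collect(countEvens());
    }

    public static void main(String[] args) {
        System.out.println(sequential(100_000)); // 50000
        System.out.println(parallel(100_000));   // 50000
    }
}
```

A combiner such as (left, right) -> left would still compile and pass every sequential test, which is exactly why the bug tends to surface only later.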
Performance
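Do not use reduce() to build collections. reduce() assumes its identity value is never modified; if you pass a mutable container (new ArrayList<>()) as the identity and add to it, every parallel branch mutates the same list and the result is corrupted. collect() creates a separate container for each branch and merges them with the combiner. A reduce() that stays correct has to copy the list at every step, which costs O(n^2).
// BAD: mutates the shared identity list (broken in parallel); a correct copy-per-step version would be O(n^2)
stream.reduce(new ArrayList<>(), (list, e) -> { list.add(e); return list; }, ...)
// GOOD: O(n), each branch fills its own container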
stream.collect(toList())
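For comparison, a sketch of what a *correct* reduce()-based list builder has to look like: the identity (List.of(), immutable here on purpose) is never touched, and every accumulator and combiner call copies. It gives the same answer as collect(Collectors.toList()), but the per-step copying makes it quadratic. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ReduceVsCollect {
    // Correct but slow: each step copies the list instead of mutating it (O(n^2) total work).
    static List<Integer> withReduce(int n) {
        return IntStream.range(0, n).boxed().parallel()
                .reduce(List.<Integer>of(), // identity: immutable, never modified
                        (list, e) -> { List<Integer> copy = new ArrayList<>(list); copy.add(e); return copy; },
                        (a, b) -> { List<Integer> merged = new ArrayList<>(a); merged.addAll(b); return merged; });
    }

    // O(n): each branch fills its own container, and the combiner merges them.
    static List<Integer> withCollect(int n) {
        return IntStream.range(0, n).boxed().parallel().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(withReduce(5));                        // [0, 1, 2, 3, 4]
        System.out.println(withReduce(5).equals(withCollect(5))); // true
    }
}
```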
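Immutable collections: since Java 16, Stream.toList() is available directly on the stream. It is shorter than collect(Collectors.toList()), can be more efficient, and returns an unmodifiable list (Collectors.toList() makes no guarantee about the mutability of the list it returns).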
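A quick sketch (Java 16+; class and method names are illustrative) showing that the list returned by Stream.toList() rejects modification.

```java
import java.util.List;

class StreamToListDemo {
    static List<String> upper(List<String> in) {
        return in.stream().map(String::toUpperCase).toList(); // Stream.toList(), Java 16+
    }

    // Reports whether add() is rejected with UnsupportedOperationException.
    static boolean isUnmodifiable(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> result = upper(List.of("a", "b"));
        System.out.println(result);                 // [A, B]
        System.out.println(isUnmodifiable(result)); // true
    }
}
```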
Edge Cases
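- Null values: toMap() throws NullPointerException on null values, groupingBy() throws it on null keys, and a TreeMap target (e.g. toMap(k, v, merge, TreeMap::new)) rejects null keys under natural ordering
- Memory consumption: collect() materializes the whole result, so collecting 1 million objects keeps all of them in memory at once; this is usually the pipeline's peak memory moment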
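A sketch of the null cases above. The helper throwsNpe and the class name are hypothetical; the behaviour being checked is that toMap() rejects a null value and groupingBy() rejects a null key, while toList() stores nulls without complaint.

```java
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NullEdgeCases {
    // Hypothetical helper: runs the action and reports whether it threw NullPointerException.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // toMap(): a null VALUE is rejected
    static boolean toMapRejectsNullValue() {
        return throwsNpe(() -> Stream.of("a", "b")
                .collect(Collectors.toMap(Function.identity(), s -> (String) null)));
    }

    // groupingBy(): a null KEY is rejected
    static boolean groupingByRejectsNullKey() {
        return throwsNpe(() -> Stream.of("a", null).collect(Collectors.groupingBy(s -> s)));
    }

    // toList(): nulls are simply stored
    static boolean toListAcceptsNull() {
        return !throwsNpe(() -> Stream.of("a", null).collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(toMapRejectsNullValue());    // true
        System.out.println(groupingByRejectsNullKey()); // true
        System.out.println(toListAcceptsNull());        // true
    }
}
```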
Diagnostics
Profile the accumulator of a custom collector first: it runs once per element, so it is the most frequently called part (the hot path).
Interview Cheat Sheet
Must know:
- collect(): terminal operation, triggers the entire pipeline and packs the result into a data structure
- A Collector consists of 4 functions: supplier, accumulator, combiner, finisher
- Mutable reduction is more efficient than immutable: it modifies a container rather than creating a new object at each step
- groupingBy: a powerful tool for grouping, counting, and nested aggregations
- Always use toMap with a merge function, otherwise duplicate keys cause an IllegalStateException
- Java 16+: stream.toList() is preferred, as it is more concise and returns an unmodifiable list
Common follow-up questions:
- collect vs reduce for collections? — collect is more efficient: reduce with a mutable container breaks parallel mode
- What are Characteristics? — Optimization flags: IDENTITY_FINISH, UNORDERED, CONCURRENT
- When NOT to use collect()? — For simple counting (count()), existence check (anyMatch()), finding one element (findFirst())
- What does combiner do? — Merges two containers, critical for parallelStream
Red flags (DO NOT say):
- “collect can be called multiple times on the same stream” — no, the stream is exhausted after a terminal operation
- “reduce with new ArrayList is just as good as collect”: no, a shared mutable identity breaks parallelism, and a correct copy-per-step reduce is O(n^2)
- “toMap without a merge function is ok” — no, crash on key duplicates
- “combiner is not needed if I don’t use parallelStream” — someone will call it later, the code will break
Related topics:
- [[6. What is Collector and what built-in Collectors exist]]
- [[2. What is the difference between intermediate and terminal operations]]
- [[9. What are parallel streams]]
- [[3. What does filter() operation do]]