What is Collector and what built-in Collectors exist?
Junior Level
Collector is a recipe for collect(). It describes four steps: how to create a container (supplier), how to add an element (accumulator), how to merge two containers (combiner), and how to transform the result (finisher).
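Here is a small sketch that makes the four steps concrete. It builds a collector with the JDK factory `Collector.of(supplier, accumulator, combiner, finisher)`. The class name `FourStepCollector` and the `[a|b|c]` output format are illustrative choices, not part of any API.

```java
import java.util.List;
import java.util.stream.Collector;

class FourStepCollector {
    // Hypothetical example collector: joins strings into "[a|b|c]".
    static final Collector<String, StringBuilder, String> BRACKETED = Collector.of(
            StringBuilder::new,                       // supplier: create an empty container
            (sb, s) -> {                              // accumulator: add one element
                if (sb.length() > 0) sb.append('|');
                sb.append(s);
            },
            (left, right) -> {                        // combiner: merge two partial containers
                if (left.length() > 0 && right.length() > 0) left.append('|');
                return left.append(right);
            },
            sb -> "[" + sb + "]");                    // finisher: container -> final result

    static String bracket(List<String> words) {
        return words.stream().collect(BRACKETED);
    }
}
```

The combiner takes care not to insert a separator next to an empty partial result. Without that check, a parallel run could produce a different string than a sequential one.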
Main built-in Collectors:
// Into a list
Collectors.toList()
// Into a set (removes duplicates)
Collectors.toSet()
// Into a map
Collectors.toMap(keyMapper, valueMapper)
// Grouping
Collectors.groupingBy(classifier)
// String joining
Collectors.joining(", ")
// Counting
Collectors.counting()
// Sum
Collectors.summingInt(User::getAge)
// Average
Collectors.averagingInt(User::getAge)
Usage example:
List<String> names = users.stream()
.map(User::getName)
.collect(Collectors.toList());
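The next sketch is a runnable version of the built-in collectors listed above. It assumes a hypothetical `User` record that exposes `getName()` and `getAge()`, as the fragments expect. The sample data is made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical domain type with the getters used in the fragments above.
record User(String name, int age) {
    String getName() { return name; }
    int getAge() { return age; }
}

class BuiltInCollectorsDemo {
    static final List<User> USERS = List.of(
            new User("Ann", 30), new User("Bob", 25), new User("Cid", 30));

    static List<String> names() {          // toList
        return USERS.stream().map(User::getName).collect(Collectors.toList());
    }

    static Set<Integer> distinctAges() {   // toSet drops the duplicate 30
        return USERS.stream().map(User::getAge).collect(Collectors.toSet());
    }

    static Map<String, Integer> ageByName() {   // toMap with unique keys
        return USERS.stream().collect(Collectors.toMap(User::getName, User::getAge));
    }

    static String joinedNames() {          // joining
        return USERS.stream().map(User::getName).collect(Collectors.joining(", "));
    }

    static long count() {                  // counting
        return USERS.stream().collect(Collectors.counting());
    }

    static int totalAge() {                // summingInt
        return USERS.stream().collect(Collectors.summingInt(User::getAge));
    }

    static double averageAge() {           // averagingInt
        return USERS.stream().collect(Collectors.averagingInt(User::getAge));
    }
}
```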
Middle Level
Anatomy of Collector<T, A, R>
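T is the element type, A is the mutable accumulation type, and R is the result type.
- supplier(): creates a new accumulation container of type A (e.g. ArrayList::new)
- accumulator(): adds one element of type T to the container (e.g. List::add)
- combiner(): merges two containers (needed for parallelStream)
- finisher(): final transformation from A to R
- characteristics(): optimization flags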
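To show all five methods and a case where A differs from R, here is a sketch of a collector that implements the interface directly. It is a hypothetical averaging collector that accumulates into a `long[] {sum, count}` and finishes into a `Double`. The class name is an assumption, and returning 0.0 for no elements mirrors what `Collectors.averagingInt` does.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Hypothetical collector: T = Integer, A = long[] {sum, count}, R = Double.
class AveragingCollector implements Collector<Integer, long[], Double> {
    @Override
    public Supplier<long[]> supplier() {
        return () -> new long[2];                               // fresh {sum, count}
    }

    @Override
    public BiConsumer<long[], Integer> accumulator() {
        return (acc, x) -> { acc[0] += x; acc[1]++; };          // fold one element in
    }

    @Override
    public BinaryOperator<long[]> combiner() {
        return (a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; }; // merge partial results
    }

    @Override
    public Function<long[], Double> finisher() {
        return acc -> acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1]; // A -> R
    }

    @Override
    public Set<Collector.Characteristics> characteristics() {
        return Collections.emptySet();   // a real finisher exists, so no IDENTITY_FINISH
    }

    static double average(List<Integer> xs) {
        return xs.stream().collect(new AveragingCollector());
    }
}
```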
Collector Characteristics:
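- CONCURRENT: the accumulator may be called concurrently on one shared container from several threads. In parallelStream this avoids per-thread containers and merging, but only when the collector is also UNORDERED or the stream is unordered.
- UNORDERED: the collector does not commit to preserving encounter order (natural for results such as Set), which removes ordering constraints in parallel execution.
- IDENTITY_FINISH: the finisher is the identity function and is skipped, because the container itself is the result (less overhead).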
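The flags are visible at runtime through `characteristics()`, which is a quick way to see what a built-in collector promises. This sketch surveys a few of them. The class name `CharacteristicsSurvey` is hypothetical, and the exact flag sets are an implementation detail of the JDK, not a documented contract.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CharacteristicsSurvey {
    // Gathers the flags that a few built-in collectors advertise to the stream framework.
    static Map<String, Set<Collector.Characteristics>> survey() {
        Map<String, Set<Collector.Characteristics>> flags = new LinkedHashMap<>();
        flags.put("toList", Collectors.<String>toList().characteristics());
        flags.put("toSet", Collectors.<String>toSet().characteristics());
        flags.put("joining", Collectors.joining().characteristics());
        flags.put("groupingByConcurrent",
                Collectors.<String, Integer>groupingByConcurrent(String::length).characteristics());
        return flags;
    }
}
```

In OpenJDK, toList() reports IDENTITY_FINISH, and toSet() adds UNORDERED to that. joining() reports nothing because it needs a finisher (StringBuilder to String). groupingByConcurrent() reports CONCURRENT as well.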
Advanced Collectors
groupingBy with a downstream collector:
// Grouping + counting
Map<City, Long> countByCity = persons.stream()
.collect(groupingBy(Person::getCity, counting()));
// Grouping + aggregation
Map<Department, Double> avgSalary = employees.stream()
.collect(groupingBy(Employee::getDept, averagingDouble(Employee::getSalary)));
// Partitioning — split into true/false
Map<Boolean, List<User>> partitioned = users.stream()
.collect(partitioningBy(User::isActive));
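The grouping and partitioning fragments above can be turned into a self-contained sketch. It uses a hypothetical `Employee` record with a `String` department in place of the document's `Department`/`City` types. It also shows one property worth remembering: partitioningBy always returns both the `true` and the `false` key, even when one side is empty.

```java
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.*;

// Hypothetical employee type standing in for the document's Employee/Person.
record Employee(String name, String dept, double salary) {}

class GroupingDemo {
    static final List<Employee> STAFF = List.of(
            new Employee("Ann", "IT", 100), new Employee("Bob", "IT", 80),
            new Employee("Cid", "HR", 60));

    static Map<String, Long> countByDept() {                 // grouping + counting
        return STAFF.stream().collect(groupingBy(Employee::dept, counting()));
    }

    static Map<String, Double> avgSalaryByDept() {           // grouping + aggregation
        return STAFF.stream().collect(groupingBy(Employee::dept, averagingDouble(Employee::salary)));
    }

    // Partitioning with a downstream: keep only names on each side.
    static Map<Boolean, List<String>> highEarners(double threshold) {
        return STAFF.stream().collect(partitioningBy(e -> e.salary() > threshold,
                mapping(Employee::name, toList())));
    }
}
```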
toMap collision trap:
// BAD: throws IllegalStateException as soon as two users share an id
Collectors.toMap(User::getId, u -> u)
// GOOD: the merge function decides which value wins (here: keep the first one)
Collectors.toMap(User::getId, u -> u, (old, replacement) -> old)
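Here is a runnable sketch of the trap, using a hypothetical `Account` record whose id is deliberately duplicated. `Function.identity()` plays the role of `u -> u`.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical type with a non-unique id.
record Account(int id, String owner) {}

class ToMapDemo {
    static final List<Account> ACCOUNTS = List.of(
            new Account(1, "first"), new Account(2, "other"), new Account(1, "second"));

    // No merge function: the duplicate key 1 makes collect() throw IllegalStateException.
    static Map<Integer, Account> strict() {
        return ACCOUNTS.stream().collect(Collectors.toMap(Account::id, Function.identity()));
    }

    // With a merge function: keep the value that was seen first.
    static Map<Integer, Account> keepFirst() {
        return ACCOUNTS.stream().collect(
                Collectors.toMap(Account::id, Function.identity(), (old, replacement) -> old));
    }
}
```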
Teeing (Java 12+):
// Two collectors + merging results (minBy/maxBy produce Optional values)
var result = stream.collect(teeing(
minBy(Comparator.naturalOrder()),
maxBy(Comparator.naturalOrder()),
(min, max) -> new Range(min.orElseThrow(), max.orElseThrow())
));
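Here is a self-contained sketch of teeing. `Range` is a hypothetical record. Because `minBy`/`maxBy` return `Optional`, the merger unwraps them with `orElseThrow()` (Java 10+), so an empty input throws `NoSuchElementException`. The second method shows another common teeing use: a mean computed in one pass.

```java
import java.util.Comparator;
import java.util.List;
import static java.util.stream.Collectors.*;

// Hypothetical result type for the min/max pair.
record Range(int min, int max) {}

class TeeingDemo {
    static Range range(List<Integer> xs) {
        return xs.stream().collect(teeing(
                minBy(Comparator.<Integer>naturalOrder()),
                maxBy(Comparator.<Integer>naturalOrder()),
                (min, max) -> new Range(min.orElseThrow(), max.orElseThrow())));
    }

    // Mean = sum / count, both computed by one pass over the stream.
    static double mean(List<Integer> xs) {
        return xs.stream().collect(teeing(
                summingDouble(Integer::doubleValue),
                counting(),
                (sum, n) -> sum / n));
    }
}
```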
When NOT to use a custom Collector
- Simple accumulation into a List/Set: use toList() / toSet() (Java 16+: Stream.toList() returns an unmodifiable list)
- Single-level grouping: groupingBy() with a downstream collector covers the vast majority of cases
- Your Collector would be more complex than the alternative: sometimes two passes over the data are simpler
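To illustrate the grouping point, this sketch reaches for built-ins instead of writing a custom Collector. It uses the three-argument `groupingBy(classifier, mapFactory, downstream)` with a `mapping(...)` downstream to get sorted keys and transformed values. The class name and sample words are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import static java.util.stream.Collectors.*;

class NoCustomCollectorNeeded {
    // Words grouped by length, keys sorted (TreeMap), values upper-cased: built-ins only.
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(
                groupingBy(String::length, TreeMap::new, mapping(String::toUpperCase, toList())));
    }
}
```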
Senior Level
Characteristics flags
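- CONCURRENT: the accumulator is thread-safe. In a parallel stream, all threads write into the same container instance, which skips the expensive combiner() merge step. The framework does this only when the collector is also UNORDERED or the stream is unordered.
- UNORDERED: the collector does not preserve encounter order (helps Set-like results and parallel grouping).
- IDENTITY_FINISH: the accumulation container is cast directly to R, and finisher() is skipped.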
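Here is a sketch of a custom CONCURRENT collector built with `Collector.of(supplier, accumulator, combiner, characteristics...)`. That overload adds IDENTITY_FINISH on its own. The word-count task and the name `ConcurrentWordCount` are assumptions for illustration. The combiner is still supplied because the interface requires it, and it would be used if the framework did not collect concurrently.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collector;

class ConcurrentWordCount {
    // Hypothetical concurrent collector: all threads share one ConcurrentHashMap.
    static final Collector<String, ConcurrentHashMap<String, Long>, ConcurrentHashMap<String, Long>> COUNTS =
            Collector.of(
                    ConcurrentHashMap::new,
                    (map, word) -> map.merge(word, 1L, Long::sum),   // merge() is atomic here
                    (a, b) -> {                                      // fallback merge of two maps
                        b.forEach((k, v) -> a.merge(k, v, Long::sum));
                        return a;
                    },
                    Collector.Characteristics.CONCURRENT,
                    Collector.Characteristics.UNORDERED);

    static Map<String, Long> count(List<String> words) {
        return words.parallelStream().collect(COUNTS);
    }
}
```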
GroupingByConcurrent
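For parallel streams, use groupingByConcurrent. It is a CONCURRENT and UNORDERED collector that fills a single ConcurrentMap, so threads never build per-thread maps that must be merged afterwards. The JDK documentation notes that the map merging done by a parallel groupingBy can be expensive. On large inputs groupingByConcurrent can therefore be considerably faster, at the cost of not preserving encounter order.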
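Here is a minimal usage sketch of groupingByConcurrent on a parallel stream. The class and method names and the last-digit classifier are hypothetical. Because the collector is UNORDERED, the lists inside each group come out in no particular order, so only sizes are checked.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class GroupingByConcurrentDemo {
    // Counts numbers by last digit; all threads fill the same ConcurrentMap.
    static ConcurrentMap<Integer, Long> countByLastDigit(int n) {
        return IntStream.range(0, n).boxed().parallel()
                .collect(Collectors.groupingByConcurrent(i -> i % 10, Collectors.counting()));
    }

    // Group contents are unordered: do not rely on element order inside each list.
    static ConcurrentMap<Boolean, List<Integer>> evensAndOdds(int n) {
        return IntStream.range(0, n).boxed().parallel()
                .collect(Collectors.groupingByConcurrent(i -> i % 2 == 0));
    }
}
```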
Performance
Immutable Collectors: toUnmodifiableList() (Java 10) returns an unmodifiable list in one step, instead of collect(toList()) followed by manual wrapping. Note that it throws NullPointerException on null elements.
Custom Collector cost: the accumulator runs once per element, so any extra allocation inside it (boxing, temporary objects) scales with the input size and adds GC pressure under load.
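toList() vs collect(Collectors.toList()): starting with Java 16, stream.toList() is preferred unless you need a mutable list. It is more concise, returns an unmodifiable list (null elements are allowed), and is built directly from the stream's array rather than through a Collector. Collectors.toList() gives no guarantees about the type or mutability of the returned List.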
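This small comparison covers the list-producing options discussed in this section; the class and method names are hypothetical. `Stream.toList()` and `Collectors.toUnmodifiableList()` both reject `add`, but only the latter rejects null elements. `Collectors.toCollection(ArrayList::new)` is the explicit way to ask for a mutable list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToListFlavours {
    static List<String> streamToList() {      // Java 16+: unmodifiable, nulls allowed
        return Stream.of("a", null, "c").toList();
    }

    static List<String> unmodifiable() {      // Java 10+: unmodifiable, nulls rejected
        return Stream.of("a", "b").collect(Collectors.toUnmodifiableList());
    }

    static List<String> mutableCopy() {       // explicit mutable ArrayList
        return Stream.of("a", "b").collect(Collectors.toCollection(ArrayList::new));
    }

    // True if the list refuses modification.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```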
Diagnostics
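If collect is slow in parallel streams, check the combiner. Some combiners copy whole containers, for example list1.addAll(list2) on large lists, or merging large maps key by key. These do work proportional to the partial result at every merge step and can cancel out the gains from parallelism.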
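One way to follow that diagnostic advice is to instrument the combiner. This sketch (the name `CombinerProbe` is hypothetical) wraps a toList-like collector and counts merges with an `AtomicInteger`. A sequential stream never calls the combiner. A parallel one calls it once for each merge of two partial lists, and the exact count depends on how the source is split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;

class CombinerProbe {
    // Diagnostic sketch: a toList-like collector whose combiner counts how often it runs.
    static Collector<Integer, List<Integer>, List<Integer>> probedToList(AtomicInteger merges) {
        return Collector.of(
                ArrayList::new,
                List::add,
                (left, right) -> {
                    merges.incrementAndGet();   // one call per merge of two partial results
                    left.addAll(right);         // copies right into left: O(size of right)
                    return left;
                });
    }

    static List<Integer> collect(int n, boolean parallel, AtomicInteger merges) {
        IntStream range = IntStream.range(0, n);
        if (parallel) range = range.parallel();
        return range.boxed().collect(probedToList(merges));
    }
}
```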
Interview Cheat Sheet
Must know:
- Collector — a “recipe” for collect(): supplier, accumulator, combiner, finisher + characteristics
- Characteristics: CONCURRENT (thread-safe accumulator), UNORDERED (no order preservation), IDENTITY_FINISH (finisher not needed)
- Main built-in: toList, toSet, toMap, groupingBy, partitioningBy, joining, counting, summingXxx, averagingXxx
- groupingBy with a downstream collector: grouping + counting/aggregation in a single pass
- Teeing (Java 12+): two collectors + merging of their results
- groupingByConcurrent for parallel streams: usually faster than regular groupingBy when encounter order does not matter
Common follow-up questions:
- toMap without merge function — what happens? — IllegalStateException on key duplicate
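- When is a custom Collector NOT needed? For simple accumulation (toList/toSet) and for grouping, where groupingBy with a downstream collector covers most cases
- Why is a CONCURRENT collector better? In parallelStream, threads write into one shared container and skip the expensive combiner (only when the collector is also UNORDERED or the stream is unordered)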
- toList() vs collect(Collectors.toList())? — Java 16+: toList() is more concise, returns an immutable list
Red flags (DO NOT say):
- “Collector stores data” — no, it is only a description of assembly rules
- “combiner can be ignored” — no, parallelStream will break
- “groupingBy is always slow” — no, groupingByConcurrent solves the parallelism problem
- “IDENTITY_FINISH means finisher is called once” — no, it is not called at all
Related topics:
- [[5. What does collect() operation do]]
- [[9. What are parallel streams]]
- [[2. What is the difference between intermediate and terminal operations]]
- [[7. What does flatMap() operation do]]