What is Collector and what built-in Collectors exist?

Junior Level

Collector is a recipe for collect(). It describes four steps: how to create a container (supplier), how to add an element (accumulator), how to merge two containers (combiner), and how to transform the result (finisher).
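
A minimal sketch of the four steps, assuming a hypothetical class name `FourStepCollector`. It uses the real `Collector.of(supplier, accumulator, combiner, finisher)` factory, with a `StringBuilder` as the container and `toString()` as the finisher:

```java
import java.util.List;
import java.util.stream.Collector;

public class FourStepCollector {

    // A Collector assembled from the four steps. The container is a StringBuilder;
    // the finisher turns it into the final String result.
    static final Collector<String, StringBuilder, String> CONCAT =
        Collector.of(
            StringBuilder::new,                   // supplier: create an empty container
            (sb, s) -> sb.append(s),              // accumulator: add one element
            (left, right) -> left.append(right),  // combiner: merge two partial containers
            StringBuilder::toString);             // finisher: container -> result

    static String concat(List<String> words) {
        return words.stream().collect(CONCAT);
    }

    public static void main(String[] args) {
        System.out.println(concat(List.of("a", "b", "c"))); // prints abc
    }
}
```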

Main built-in Collectors:

// Into a list
Collectors.toList()

// Into a set (removes duplicates)
Collectors.toSet()

// Into a map
Collectors.toMap(keyMapper, valueMapper)

// Grouping
Collectors.groupingBy(classifier)

// String joining
Collectors.joining(", ")

// Counting
Collectors.counting()

// Sum
Collectors.summingInt(User::getAge)

// Average
Collectors.averagingInt(User::getAge)

Usage example:

List<String> names = users.stream()
    .map(User::getName)
    .collect(Collectors.toList());
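
A runnable sketch of the built-in collectors listed above, on a small hypothetical `User` class (only `getName()` and `getAge()` are assumed, matching the method references used in the text). `toMap` is keyed by age here because the sample names repeat:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class BuiltInCollectorsDemo {

    // Hypothetical User type, only for illustration.
    static final class User {
        private final String name;
        private final int age;
        User(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    static final List<User> USERS = List.of(
        new User("Ann", 30), new User("Bob", 25), new User("Ann", 40));

    static List<String> names()      { return USERS.stream().map(User::getName).collect(Collectors.toList()); }
    static Set<String> uniqueNames() { return USERS.stream().map(User::getName).collect(Collectors.toSet()); }
    static String joined()           { return USERS.stream().map(User::getName).collect(Collectors.joining(", ")); }
    static long count()              { return USERS.stream().collect(Collectors.counting()); }
    static int totalAge()            { return USERS.stream().collect(Collectors.summingInt(User::getAge)); }
    static double avgAge()           { return USERS.stream().collect(Collectors.averagingInt(User::getAge)); }

    // Ages are unique in this sample, so toMap without a merge function is safe.
    static Map<Integer, String> nameByAge() {
        return USERS.stream().collect(Collectors.toMap(User::getAge, User::getName));
    }

    public static void main(String[] args) {
        System.out.println(joined());   // prints Ann, Bob, Ann
        System.out.println(totalAge()); // prints 95
    }
}
```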

Middle Level

Anatomy of Collector<T, A, R>

supplier() — creates the accumulator (ArrayList::new)
accumulator() — adds an element (List::add)
combiner() — merges accumulators (for parallelStream)
finisher() — final transformation
characteristics() — optimization flags
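
To make the type parameters concrete (T = element, A = mutable container, R = result), here is a sketch that implements the `Collector` interface directly. The class name `ReadOnlyListCollector` is hypothetical; it behaves like a list collector whose finisher wraps the result read-only, so it declares no `IDENTITY_FINISH`:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

// T = element type, A = List<T> (mutable container), R = List<T> (read-only result).
public class ReadOnlyListCollector<T> implements Collector<T, List<T>, List<T>> {

    @Override public Supplier<List<T>> supplier() { return ArrayList::new; }

    @Override public BiConsumer<List<T>, T> accumulator() { return List::add; }

    // Merges two partial containers; left-then-right keeps encounter order.
    @Override public BinaryOperator<List<T>> combiner() {
        return (left, right) -> { left.addAll(right); return left; };
    }

    @Override public Function<List<T>, List<T>> finisher() { return Collections::unmodifiableList; }

    // A real finisher exists, so IDENTITY_FINISH must not be declared.
    @Override public Set<Characteristics> characteristics() { return Collections.emptySet(); }

    public static void main(String[] args) {
        List<String> result = Stream.of("x", "y").collect(new ReadOnlyListCollector<String>());
        System.out.println(result); // prints [x, y]
    }
}
```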

Collector Characteristics:

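CONCURRENT: the accumulator may be called concurrently from multiple threads on the same container, so a parallel stream can share one container instead of merging per-thread copies (this only happens if the collector is also UNORDERED or the stream is unordered)
UNORDERED: the collector does not promise to preserve the encounter order of elements, which frees the stream from order bookkeeping
IDENTITY_FINISH: the finisher is the identity function, so the container is returned as the result and finisher() is not called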
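
The flags of any collector can be inspected through `characteristics()`. A small sketch (hypothetical class name `CharacteristicsInspector`); the exact sets printed are what current OpenJDK reports, which the Javadoc does not formally promise:

```java
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class CharacteristicsInspector {

    static Set<Collector.Characteristics> flags(Collector<?, ?, ?> c) {
        return c.characteristics();
    }

    public static void main(String[] args) {
        System.out.println("toList:  " + flags(Collectors.toList()));    // no finisher needed
        System.out.println("toSet:   " + flags(Collectors.toSet()));     // order does not matter
        System.out.println("joining: " + flags(Collectors.joining()));   // needs StringBuilder::toString
        System.out.println("groupingByConcurrent: "
            + flags(Collectors.groupingByConcurrent((String s) -> s.length())));
    }
}
```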

Advanced Collectors

GroupingBy with Downstream:

// Grouping + counting
Map<City, Long> countByCity = persons.stream()
    .collect(groupingBy(Person::getCity, counting()));

// Grouping + aggregation
Map<Department, Double> avgSalary = employees.stream()
    .collect(groupingBy(Employee::getDept, averagingDouble(Employee::getSalary)));

// Partitioning — split into true/false
Map<Boolean, List<User>> partitioned = users.stream()
    .collect(partitioningBy(User::isActive));
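
A self-contained version of the three snippets above, assuming a hypothetical `Employee` record (Java 16+) where city and department are plain `String`s instead of the `City`/`Department` types in the text:

```java
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.*;

public class GroupingDemo {

    // Hypothetical domain type, only for illustration.
    record Employee(String name, String city, String dept, double salary, boolean active) {}

    static final List<Employee> STAFF = List.of(
        new Employee("Ann", "Oslo", "IT", 5000, true),
        new Employee("Bob", "Oslo", "HR", 3000, false),
        new Employee("Cid", "Rome", "IT", 4000, true));

    // groupingBy + counting(): one pass, Map<city, number of employees>
    static Map<String, Long> countByCity() {
        return STAFF.stream().collect(groupingBy(Employee::city, counting()));
    }

    // groupingBy + averagingDouble(): Map<dept, average salary>
    static Map<String, Double> avgSalaryByDept() {
        return STAFF.stream().collect(groupingBy(Employee::dept, averagingDouble(Employee::salary)));
    }

    // partitioningBy: the result always has both keys, true and false
    static Map<Boolean, List<Employee>> byActive() {
        return STAFF.stream().collect(partitioningBy(Employee::active));
    }

    public static void main(String[] args) {
        System.out.println(avgSalaryByDept().get("IT")); // prints 4500.0
    }
}
```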

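toMap key collision trap:

// BAD: throws IllegalStateException as soon as two users share an id
users.stream().collect(toMap(User::getId, u -> u));

// GOOD: the merge function decides which value wins (here the first one is kept)
users.stream().collect(toMap(User::getId, u -> u, (old, replacement) -> old));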
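
A runnable sketch of the trap, assuming a hypothetical `User` record (Java 16+) with a `long id` where two entries deliberately share id 1:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapCollisionDemo {

    // Hypothetical user type; the first two entries share id 1.
    record User(long id, String name) {}

    static final List<User> USERS =
        List.of(new User(1, "first"), new User(1, "second"), new User(2, "other"));

    // No merge function: the duplicate key makes collect() throw IllegalStateException.
    static Map<Long, User> withoutMerge() {
        return USERS.stream().collect(Collectors.toMap(User::id, u -> u));
    }

    // Merge function receives (existing, new) and keeps the value seen first.
    static Map<Long, User> keepFirst() {
        return USERS.stream().collect(Collectors.toMap(User::id, u -> u, (old, replacement) -> old));
    }

    public static void main(String[] args) {
        try {
            withoutMerge();
        } catch (IllegalStateException e) {
            System.out.println("duplicate key: " + e.getMessage());
        }
        System.out.println(keepFirst().get(1L).name()); // prints first
    }
}
```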

Teeing (Java 12+):

// Two collectors + merging results
var result = stream.collect(teeing(
    minBy(Comparator.naturalOrder()),
    maxBy(Comparator.naturalOrder()),
    (min, max) -> new Range(min, max)
));
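
Note that `minBy`/`maxBy` produce `Optional` values, so the merger has to unwrap them. A sketch with a hypothetical `Range` record (records need Java 16+; `teeing` itself is Java 12+):

```java
import java.util.Comparator;
import java.util.stream.Stream;
import static java.util.stream.Collectors.*;

public class TeeingDemo {

    // Hypothetical result type; the text does not define Range.
    record Range(int min, int max) {}

    // Both downstream collectors see every element in a single pass;
    // orElseThrow() fails on an empty stream, where no min/max exists.
    static Range rangeOf(Stream<Integer> numbers) {
        return numbers.collect(teeing(
            minBy(Comparator.<Integer>naturalOrder()),
            maxBy(Comparator.<Integer>naturalOrder()),
            (min, max) -> new Range(min.orElseThrow(), max.orElseThrow())));
    }

    public static void main(String[] args) {
        Range r = rangeOf(Stream.of(5, 1, 9, 3));
        System.out.println(r.min() + ".." + r.max()); // prints 1..9
    }
}
```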

When NOT to use a custom Collector

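Simple accumulation into List/Set: use toList() or toSet() (since Java 16, Stream.toList() returns an unmodifiable list; Collectors.toList() makes no guarantee about mutability)
Single-level grouping: groupingBy() with a downstream collector covers the vast majority of cases
Your Collector would be more complex than the alternative: sometimes two simple passes are clearer than one clever collector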

Senior Level

Characteristics flags

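CONCURRENT: the accumulator is thread-safe. In a parallel stream, if the collector is also UNORDERED (or the stream is unordered), all threads write into the same container instance, so no partial results need to be merged with combiner().
UNORDERED: the collector does not preserve encounter order (natural for Set, and a prerequisite for concurrent grouping)
IDENTITY_FINISH: the accumulation container is cast directly to R, and finisher() is skipped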
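
A sketch of a custom collector that qualifies for concurrent reduction. The class and method names are hypothetical; the technique is a shared `ConcurrentHashMap` whose `merge()` is atomic per key. The 4-argument `Collector.of` (no finisher) adds `IDENTITY_FINISH` by itself, so only `CONCURRENT` and `UNORDERED` are passed:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;

public class ConcurrentWordCount {

    static Collector<String, ConcurrentMap<String, Long>, ConcurrentMap<String, Long>> wordCount() {
        return Collector.of(
            ConcurrentHashMap::new,                            // one shared container
            (map, word) -> map.merge(word, 1L, Long::sum),     // atomic per key: thread-safe
            (left, right) -> {                                 // still required by the API; used
                right.forEach((k, v) -> left.merge(k, v, Long::sum)); // when reduction is not concurrent
                return left;
            },
            Collector.Characteristics.CONCURRENT,
            Collector.Characteristics.UNORDERED);
    }

    static ConcurrentMap<String, Long> count(List<String> words) {
        return words.parallelStream().collect(wordCount());
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("a", "b", "a", "c", "a")).get("a")); // prints 3
    }
}
```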

GroupingByConcurrent

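For parallel streams, use groupingByConcurrent. It collects into a ConcurrentMap and is both CONCURRENT and UNORDERED, so all threads update one shared map instead of building per-thread maps that must be merged; on large inputs this can be considerably faster than regular groupingBy. The trade-off is that elements within each group are not kept in encounter order.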
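
A minimal sketch (hypothetical class name) of `groupingByConcurrent` with a downstream collector; the result type is `ConcurrentMap` rather than `Map`:

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingByConcurrent;

public class GroupingByConcurrentDemo {

    // Parallel stream + concurrent, unordered collector: worker threads
    // update one ConcurrentMap instead of merging per-thread maps.
    static ConcurrentMap<Integer, Long> countByLength(List<String> words) {
        return words.parallelStream().collect(groupingByConcurrent(String::length, counting()));
    }

    public static void main(String[] args) {
        System.out.println(countByLength(List.of("a", "bb", "cc", "ddd")).get(2)); // prints 2
    }
}
```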

Performance

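Immutable Collectors: Collectors.toUnmodifiableList() (Java 10+) produces a genuinely unmodifiable list in one step, instead of collect(toList()) followed by Collections.unmodifiableList(), which only returns a read-only view of a mutable list. It is not free: in current OpenJDK the finisher copies the elements into a new array, and null elements cause a NullPointerException.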
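
A sketch contrasting the two approaches (hypothetical class name `UnmodifiableListDemo`):

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class UnmodifiableListDemo {

    // Java 10+: one step; the result is a new unmodifiable list, and nulls are rejected.
    static List<String> direct(Stream<String> s) {
        return s.collect(Collectors.toUnmodifiableList());
    }

    // Older style: collect into a mutable list, then wrap it in a read-only view.
    static List<String> wrapped(Stream<String> s) {
        return Collections.unmodifiableList(s.collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(direct(Stream.of("x", "y")));  // prints [x, y]
        System.out.println(wrapped(Stream.of("x", "y"))); // prints [x, y]
    }
}
```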

Custom Collector cost: the accumulator is called for every element, so any allocation inside it is multiplied by the element count and adds noticeable GC pressure under load.
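
One way to keep the accumulator allocation-free is a mutable holder created once per container. A sketch, assuming a hypothetical `LowAllocationSum` class; it uses a `long[1]` box mutated in place, rather than, say, an immutable holder object re-created for every element:

```java
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class LowAllocationSum {

    // The supplier creates one long[1] per (sub)stream; the accumulator mutates it
    // in place, so nothing is allocated per element.
    static Collector<Integer, long[], Long> summing() {
        return Collector.of(
            () -> new long[1],
            (acc, x) -> acc[0] += x,
            (left, right) -> { left[0] += right[0]; return left; },
            acc -> acc[0]);
    }

    static long sum(int n) {
        return IntStream.rangeClosed(1, n).boxed().parallel().collect(summing());
    }

    public static void main(String[] args) {
        System.out.println(sum(100)); // prints 5050
    }
}
```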

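toList() vs collect(Collectors.toList()): since Java 16, stream.toList() is preferred when the result does not need to be mutated. It is more concise, it guarantees an unmodifiable list (Collectors.toList() makes no guarantee about the list's type or mutability), and the stream implements it directly rather than through a Collector. Unlike toUnmodifiableList(), it accepts null elements.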
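
A sketch of the difference (hypothetical class name `ToListComparison`). When a mutable list is required, `Collectors.toCollection(ArrayList::new)` states that explicitly instead of relying on what `Collectors.toList()` happens to return:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToListComparison {

    // Java 16+: unmodifiable result; null elements are allowed.
    static List<String> streamToList() {
        return Stream.of("a", null, "b").toList();
    }

    // Explicitly mutable result, independent of Collectors.toList() internals.
    static List<String> mutableCopy() {
        return Stream.of("a", "b").collect(Collectors.toCollection(ArrayList::new));
    }

    public static void main(String[] args) {
        System.out.println(streamToList()); // prints [a, null, b]
        List<String> m = mutableCopy();
        m.add("c");
        System.out.println(m);              // prints [a, b, c]
    }
}
```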

Diagnostics

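If collect is slow on a parallel stream, check the combiner. It runs at every merge step, and a combiner that copies whole containers (even the common list1.addAll(list2) copies all of list2 each time) can eat up the gains from parallelism, especially when the per-element work is cheap.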
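
A simple diagnostic is to instrument the combiner. A sketch, assuming a hypothetical `CombinerProbe` class that counts combiner calls with an `AtomicInteger`; the printed count depends on the machine and the stream's splitting, so it is not fixed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class CombinerProbe {

    static final AtomicInteger COMBINER_CALLS = new AtomicInteger();

    // A list collector whose combiner counts how often it runs.
    // left.addAll(right) copies every element of right on each merge.
    static Collector<Integer, List<Integer>, List<Integer>> probedToList() {
        return Collector.of(
            ArrayList::new,
            List::add,
            (left, right) -> {
                COMBINER_CALLS.incrementAndGet();
                left.addAll(right);   // left-then-right preserves encounter order
                return left;
            });
    }

    static List<Integer> collectParallel(int n) {
        return IntStream.range(0, n).boxed().parallel().collect(probedToList());
    }

    public static void main(String[] args) {
        List<Integer> result = collectParallel(100_000);
        System.out.println("size=" + result.size() + ", combiner calls=" + COMBINER_CALLS.get());
    }
}
```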

Interview Cheat Sheet

Must know:

Collector — a “recipe” for collect(): supplier, accumulator, combiner, finisher + characteristics
Characteristics: CONCURRENT (thread-safe accumulator), UNORDERED (no order preservation), IDENTITY_FINISH (finisher not needed)
Main built-in: toList, toSet, toMap, groupingBy, partitioningBy, joining, counting, summingXxx, averagingXxx
groupingBy with downstream — grouping + counting/aggregation in a single pass
Teeing (Java 12+) — two collectors + merging results
groupingByConcurrent is often faster than regular groupingBy on parallel streams when order within groups does not matter

Common follow-up questions:

toMap without a merge function, what happens? IllegalStateException on a duplicate key
When is a custom Collector NOT needed? Simple accumulation (toList/toSet) and grouping (groupingBy with a downstream collector covers most cases)
Why is a CONCURRENT collector better? In a parallel stream (with UNORDERED or an unordered source) all threads write into one container, so there are no partial results to merge with the combiner
toList() vs collect(Collectors.toList())? Since Java 16, Stream.toList() is more concise and returns an unmodifiable list; Collectors.toList() gives no mutability guarantee

Red flags (DO NOT say):

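“Collector stores data”: no, it only describes the assembly rules; the data lives in the container created by supplier()
“combiner can be ignored”: no, parallel streams call it to merge partial results, and a broken combiner silently produces wrong results
“groupingBy is always slow”: no, and for parallel streams groupingByConcurrent avoids merging per-thread maps
“IDENTITY_FINISH means finisher is called once”: no, it is not called at all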

Related topics:

[[5. What does collect() operation do]]
[[9. What are parallel streams]]
[[2. What is the difference between intermediate and terminal operations]]
[[7. What does flatMap() operation do]]