Question 28 · Section 8

What to do about key collisions when collecting into Map?


🟢 Junior Level

Key collision — when two stream elements produce the same key for a Map.

By default, toMap throws an error:

// If two users have the same id → IllegalStateException
users.stream().collect(toMap(User::getId, u -> u));
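The default behaviour can be observed with a small self-contained sketch. The `User` record and the sample data are hypothetical stand-ins for the type used above, and the record syntax assumes Java 16 or newer.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class DuplicateKeyDemo {
    // Hypothetical stand-in for the User type from the text
    record User(int id, String name) {}

    // Returns true if collecting by id fails because two users share an id
    static boolean failsOnDuplicate(List<User> users) {
        try {
            users.stream().collect(Collectors.toMap(User::id, Function.identity()));
            return false;
        } catch (IllegalStateException e) {
            // The exception message mentions the duplicate; its exact wording depends on the JDK version
            return true;
        }
    }

    public static void main(String[] args) {
        List<User> users = List.of(new User(1, "Ann"), new User(1, "Bob"));
        System.out.println(failsOnDuplicate(users));
    }
}
```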

Solution: Add a third parameter — the merge function:

users.stream().collect(toMap(
    User::getId,
    u -> u,
    (existing, replacement) -> existing  // keep first
));

Merge function options:

  • (old, newVal) -> old — keep the first value
  • (old, newVal) -> newVal — replace with the new one
  • (old, newVal) -> old + newVal — combine
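A minimal sketch of the three options side by side, assuming a hypothetical `User` record (Java 16+) and made-up sample data in which ids collide. For a sequential stream over a list, "first" and "last" follow the order of the list.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

class MergeStrategiesDemo {
    // Hypothetical stand-in for the User type from the text
    record User(int id, String name) {}

    // Ann and Cid share id 1, so the merge function decides what ends up under that key
    static final List<User> USERS = List.of(
            new User(1, "Ann"), new User(2, "Bob"), new User(1, "Cid"));

    static Map<Integer, String> namesById(BinaryOperator<String> merge) {
        return USERS.stream().collect(Collectors.toMap(User::id, User::name, merge));
    }

    public static void main(String[] args) {
        System.out.println(namesById((old, newVal) -> old));                // id 1 maps to "Ann"
        System.out.println(namesById((old, newVal) -> newVal));             // id 1 maps to "Cid"
        System.out.println(namesById((old, newVal) -> old + "+" + newVal)); // id 1 maps to "Ann+Cid"
    }
}
```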


🟡 Middle Level

Conflict resolution strategies

1. Overwriting:

// Keep First — for deduplication
(oldValue, newValue) -> oldValue

// Keep Last — current state
(oldValue, newValue) -> newValue

2. Aggregation (Collating):

// Concatenation: join the names of all users that share a role
.collect(toMap(User::getRole, User::getName, (n1, n2) -> n1 + ", " + n2))

3. Complex choice (Business Logic):

(existing, replacement) ->
    existing.getSalary() > replacement.getSalary() ? existing : replacement
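When the business rule is "keep the larger element", the standard `BinaryOperator.maxBy` helper can serve as the merge function. The sketch below assumes a hypothetical `User` record (Java 16+) with `role`, `name`, and `salary` components. On a tie, `maxBy` keeps the existing element, whereas the ternary above keeps the replacement.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class TopEarnerDemo {
    // Hypothetical stand-in for the User type from the text
    record User(String role, String name, double salary) {}

    static Map<String, User> topEarnerByRole(List<User> users) {
        // Merge function that returns whichever of the two users has the higher salary
        BinaryOperator<User> higherPaid =
                BinaryOperator.maxBy(Comparator.comparingDouble(User::salary));
        return users.stream()
                .collect(Collectors.toMap(User::role, Function.identity(), higherPaid));
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User("dev", "Ann", 100.0),
                new User("dev", "Bob", 120.0),
                new User("ops", "Cid", 90.0));
        System.out.println(topEarnerByRole(users).get("dev").name()); // Bob
    }
}
```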

When is toMap() a bad choice?

If one key should correspond to multiple values — use Collectors.groupingBy():

// Correct — creates Map<Role, List<User>>
users.stream().collect(groupingBy(User::getRole));
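A runnable sketch of the same idea, assuming a hypothetical `User` record (Java 16+). It also uses a downstream `mapping` collector to keep only the names, which is one common variation rather than the only way to do it.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingDemo {
    // Hypothetical stand-in for the User type from the text
    record User(String role, String name) {}

    // Every role maps to the list of all user names with that role, so no merge function is needed
    static Map<String, List<String>> namesByRole(List<User> users) {
        return users.stream().collect(Collectors.groupingBy(
                User::role,
                Collectors.mapping(User::name, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User("dev", "Ann"), new User("ops", "Bob"), new User("dev", "Cid"));
        System.out.println(namesByRole(users).get("dev")); // [Ann, Cid]
    }
}
```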

🔴 Senior Level

Merge Function Cost

The merge function runs inline during accumulation, once for every element whose key is already present in the map. With many collisions, a heavy merge function slows down the entire collection step, so keep it cheap.
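To see how often the merge function actually runs, one option is to count its invocations. This is only an illustrative sketch for a sequential stream: the word list and the `mergeCalls` helper are made up for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class MergeCallCountDemo {
    // Collects words keyed by their first letter and reports how many times the merge function ran
    static int mergeCalls(List<String> words) {
        AtomicInteger calls = new AtomicInteger();
        Map<Character, String> byInitial = words.stream().collect(Collectors.toMap(
                w -> w.charAt(0),
                w -> w,
                (old, newVal) -> {
                    calls.incrementAndGet(); // reached only when the key is already present
                    return old;
                }));
        return calls.get();
    }

    public static void main(String[] args) {
        // Five elements, but only "avocado" and "apricot" collide with an existing key
        System.out.println(mergeCalls(List.of("apple", "avocado", "banana", "apricot", "cherry"))); // 2
    }
}
```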

Parallel Streams

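In parallel streams, each worker accumulates into its own partial map, and collisions are resolved again when the combiner merges those partial maps:

  • Few collisions: the overhead is negligible
  • Many collisions: it is better to use groupingByConcurrent (or toConcurrentMap), which accumulates into a single shared ConcurrentMap instead of merging per-worker maps

For toMap, the built-in combiner applies the same merge function as the accumulator. The merge function should therefore be associative; otherwise the result can depend on how the stream was split.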
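The sketch below illustrates the point with an associative merge function (`Long::sum`): the sequential and parallel results agree regardless of how the work is split. It also shows `Collectors.toConcurrentMap`, the concurrent counterpart of `toMap` in the standard library, which is an addition for illustration and not something the text above requires. The word list and method names are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ParallelMergeDemo {
    // Counts words per length; Long::sum is associative, so the split points do not matter
    static Map<Integer, Long> lengthCounts(List<String> words, boolean parallel) {
        Stream<String> stream = parallel ? words.parallelStream() : words.stream();
        return stream.collect(Collectors.toMap(String::length, w -> 1L, Long::sum));
    }

    // Same computation, but all workers write into one shared ConcurrentMap
    static Map<Integer, Long> lengthCountsConcurrent(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.toConcurrentMap(String::length, w -> 1L, Long::sum));
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "bb", "cc", "ddd", "e", "ff");
        System.out.println(lengthCounts(words, false).equals(lengthCounts(words, true))); // true
        System.out.println(lengthCounts(words, false).equals(lengthCountsConcurrent(words))); // true
    }
}
```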

Null Values

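toMap does not tolerate null values: if the value mapper returns null, the collector throws a NullPointerException. A merge function that handles nulls does not help, because Map.merge rejects a null value before the merge function is ever consulted.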
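A small sketch of the failure, plus one possible workaround. The `User` record (Java 16+) with a nullable `email` is hypothetical. The workaround falls back to plain `Map.put`, which accepts null values but silently lets a later duplicate overwrite an earlier one, so it is a trade-off rather than a general recommendation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NullValueDemo {
    // Hypothetical stand-in for the User type; email may be null
    record User(int id, String email) {}

    // Returns true if toMap rejects the data with a NullPointerException
    static boolean toMapThrowsNpe(List<User> users) {
        try {
            users.stream().collect(Collectors.toMap(User::id, User::email, (a, b) -> a));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Workaround sketch: put accepts null values; a later duplicate id overwrites the earlier entry
    static Map<Integer, String> emailsAllowingNull(List<User> users) {
        Map<Integer, String> result = new HashMap<>();
        users.forEach(u -> result.put(u.id(), u.email()));
        return result;
    }

    public static void main(String[] args) {
        List<User> users = List.of(new User(1, null), new User(2, "bob@example.com"));
        System.out.println(toMapThrowsNpe(users));                    // true
        System.out.println(emailsAllowingNull(users).containsKey(1)); // true
    }
}
```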

Static Analysis

Error Prone (Google) and Sonar flag toMap without a 3rd argument as a “potential bug”. Always passing an explicit merge function is a safe coding standard.

Diagnostics

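For critical code, use a block-bodied merge function that logs the conflicting objects when a collision occurs:

.collect(toMap(
    User::getId, u -> u,
    (old, newVal) -> {
        // The merge function receives only the two values, so the key is read from one of them
        log.warn("Duplicate key: {}, values: {} and {}", old.getId(), old, newVal);
        return old;
    }
))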

🎯 Interview Cheat Sheet

Must know:

  • Key collision = two elements produce the same key → without merge function: IllegalStateException
  • Strategies: (old, newVal) -> old (keep first), (old, newVal) -> newVal (keep last), (old, newVal) -> old + newVal (combine)
  • If one key needs multiple values — use groupingBy(), not toMap
  • In parallel streams, collisions are handled when merging sub-results (combiner)
  • Many collisions in parallelStream — better to use groupingByConcurrent with ConcurrentHashMap
  • toMap does not tolerate null values → NPE inside Map.merge, even with a merge function
  • Error Prone and Sonar flag toMap without a 3rd argument as a “potential bug”

Frequent follow-up questions:

  • How to keep the first value on collision? Use (existing, replacement) -> existing, which keeps the first element encountered.
  • When to use groupingBy instead of toMap? When one key corresponds to multiple values: groupingBy creates Map<K, List<V>>.
  • Why must the merge function in parallelStream be associative? Because the combiner may merge partial results in a different grouping from run to run, so a non-associative function gives a nondeterministic result.
  • How to log duplicates on collision? Use a block-bodied merge function that logs the conflicting objects before returning the chosen value.

Red flags (DO NOT say):

  • “toMap without a merge function is safe” — incorrect, on duplicates it throws IllegalStateException
  • “merge function is called for every element” — incorrect, only on key match
  • “null values are acceptable in toMap” — incorrect, Map.merge will throw NPE
  • “groupingByConcurrent is always faster than toMap” — incorrect, it is only effective with many collisions

Related topics:

  • [[27. How to collect Stream into Map]]
  • [[21. What is lazy evaluation in Stream]]
  • [[22. When does Stream operation execution begin]]
  • [[29. How to work with Optional in Stream]]