Question 4 · Section 10

What is a Collision in HashMap?

This happens because the number of possible keys is infinite, while the number of buckets is limited. Imagine a parking lot with 16 spots: if 20 cars arrive, at least 4 spots ar...

Language versions: English Russian Ukrainian

🟢 Junior Level

Collision is when two different keys land in the same bucket in HashMap.

This is NOT a bug! It’s expected behavior. Like two students in a school having the same last name — it doesn’t mean they’re the same person. HashMap is designed to handle collisions.

This happens because the number of possible keys is infinite, while the number of buckets is limited. Imagine a parking lot with 16 spots: if 20 cars arrive, at least 4 spots are claimed by 2 cars.

Collision example:

HashMap<String, String> map = new HashMap<>();
map.put("Aa", "value1");  // hashCode = 2112
map.put("BB", "value2");  // hashCode = 2112 (same!)

// Why: String.hashCode() formula: s[0]*31^(n-1) + s[1]*31^(n-2)
// "Aa": 65*31 + 97 = 2112
// "BB": 66*31 + 66 = 2112
// Different characters, but mathematically give the same result
// Both landed in the same bucket — this is a collision

Good news: HashMap handles collisions — it stores multiple elements in one bucket as a chain.

🟡 Middle Level

Why Collisions Are Inevitable

  1. Pigeonhole principle: There are 2^32 (~4.3 billion) hash codes, but infinitely many possible objects
  2. Birthday paradox: In a table of 365 buckets, collision probability > 50% with just 23 elements
  3. Range compression: Even different hashCodes may collide after & (n-1)

Types of Collisions

Collision Type Cause Solution
hashCode collision Two objects have the same hashCode() (rare with a good function) Treeification at 8 elements (Java 8+)
Index collision DIFFERENT hashCodes, but (n-1) & hash is the same (much more common) Resize (table doubling)

Index collision example: hashCode = 100 and hashCode = 116 with n=16 both give index 4, because 100 & 15 = 4 and 116 & 15 = 4.

Resolution Strategies

  • Separate Chaining — used in HashMap, LinkedHashMap, ConcurrentHashMap
  • Open Addressing — used in ThreadLocalMap, IdentityHashMap

Evolution

Version Structure Complexity
Java 7 Linked list O(n)
Java 8+ List → Tree O(log n)

Common Mistakes

  1. Returning a constant from hashCode() — all elements in one bucket
  2. Using only one field in hashCode() — many collisions with different objects

🔴 Senior Level

Internal Mechanics

Collision is determined at the bucket level:

// hash(k1) & (n-1) == hash(k2) & (n-1)
// But k1.equals(k2) == false

HashMap uses two-level filtering:

  1. Fast filter — hash comparison (p.hash == hash), filters 99.9%
  2. Precise checkequals() call for remaining candidates

Treeification Details

At 8 elements in a bucket:

  • If capacity < 64: resize() is called (spreads elements apart)
  • If capacity >= 64: list → red-black tree

Why 8? The probability of a chain of 8 elements with loadFactor 0.75 and a good hashCode: 0.00000006 (Poisson distribution). This is an anomaly marker — either a bad hashCode or an attack.

CPU and Cache Impact

Collisions affect performance more than it seems:

  • Each equals() is a method call + field comparison
  • Nodes are scattered across Heap → Cache Miss on every transition
  • On treeification, tree balancing overhead is added

Hash Flooding Attack

Targeted DoS attack:

  1. Attacker generates keys with the same hash
  2. All elements land in one bucket
  3. Java 7: O(n²) on inserting n elements → 100% CPU
  4. Java 8+: O(n log n) — attack mitigated, but not fully eliminated

Memory Implications

Each collision = an additional Node object in Heap. With mass collisions:

  • Young Gen GC pressure increases
  • On treeification: TreeNode ~ 2x the size of Node
  • Old Gen accumulates long-lived nodes

Production Monitoring

To detect problems:

  • Monitor get/put times — growth indicates collisions
  • Analyze bucket distribution via JFR (Java Flight Recorder)
  • Check hashCode() quality via unit tests on distribution

🎯 Interview Cheat Sheet

Must know:

  • Collision = two different keys landed in the same bucket — this is normal, not a bug
  • Pigeonhole principle: infinitely many keys, limited buckets — collisions are inevitable
  • Java 7: O(n) on collisions, Java 8+: O(log n) thanks to treeification
  • Birthday paradox: with 5 elements and 16 buckets, collision probability > 50%
  • Classic example: “Aa” and “BB” have the same hashCode = 2112
  • Two collision types: same hashCode AND different hashCodes but same index after (n-1) & hash
  • Threshold 8 was chosen based on Poisson distribution — probability ≈ 0.00000006

Common follow-up questions:

  • Why are collisions inevitable? — hashCode = 32 bits (~4.3 billion), but infinitely many possible objects
  • How does Separate Chaining differ from Open Addressing? — chaining = lists in buckets, open addressing = probing for next free slot
  • What is a Hash Flooding Attack? — attacker crafts keys with the same hash → DoS
  • Why is the probability of 8 collisions so small? — Poisson distribution at loadFactor 0.75

Red flags (DO NOT say):

  • “Collision is a bug” — no, expected behavior
  • “HashMap doesn’t support collisions” — it does via chains/trees
  • “hashCode is always unique” — no, 32 bits can’t cover an infinite set

Related topics:

  • [[03. How HashMap Determines Which Bucket to Put an Element In]]
  • [[05. How HashMap Handles Collisions]]
  • [[06. What Happens When 8 Elements Are Reached in One Bucket]]
  • [[09. If Two Objects Have the Same hashCode(), Are They Necessarily Equal by equals()]]