What is a Collision in HashMap?
This happens because the number of possible keys is infinite, while the number of buckets is limited. Imagine a parking lot with 16 spots: if 20 cars arrive, at least 4 spots ar...
🟢 Junior Level
Collision is when two different keys land in the same bucket in HashMap.
This is NOT a bug! It’s expected behavior. Like two students in a school having the same last name — it doesn’t mean they’re the same person. HashMap is designed to handle collisions.
This happens because the number of possible keys is infinite, while the number of buckets is limited. Imagine a parking lot with 16 spots: if 20 cars arrive, at least 4 spots are claimed by 2 cars.
Collision example:
HashMap<String, String> map = new HashMap<>();
map.put("Aa", "value1"); // hashCode = 2112
map.put("BB", "value2"); // hashCode = 2112 (same!)
// Why: String.hashCode() formula: s[0]*31^(n-1) + s[1]*31^(n-2)
// "Aa": 65*31 + 97 = 2112
// "BB": 66*31 + 66 = 2112
// Different characters, but mathematically give the same result
// Both landed in the same bucket — this is a collision
Good news: HashMap handles collisions — it stores multiple elements in one bucket as a chain.
🟡 Middle Level
Why Collisions Are Inevitable
- Pigeonhole principle: There are 2^32 (~4.3 billion) hash codes, but infinitely many possible objects
- Birthday paradox: In a table of 365 buckets, collision probability > 50% with just 23 elements
- Range compression: Even different hashCodes may collide after
& (n-1)
Types of Collisions
| Collision Type | Cause | Solution |
|---|---|---|
| hashCode collision | Two objects have the same hashCode() (rare with a good function) | Treeification at 8 elements (Java 8+) |
| Index collision | DIFFERENT hashCodes, but (n-1) & hash is the same (much more common) |
Resize (table doubling) |
Index collision example: hashCode = 100 and hashCode = 116 with n=16 both give index 4, because 100 & 15 = 4 and 116 & 15 = 4.
Resolution Strategies
- Separate Chaining — used in HashMap, LinkedHashMap, ConcurrentHashMap
- Open Addressing — used in ThreadLocalMap, IdentityHashMap
Evolution
| Version | Structure | Complexity |
|---|---|---|
| Java 7 | Linked list | O(n) |
| Java 8+ | List → Tree | O(log n) |
Common Mistakes
- Returning a constant from hashCode() — all elements in one bucket
- Using only one field in hashCode() — many collisions with different objects
🔴 Senior Level
Internal Mechanics
Collision is determined at the bucket level:
// hash(k1) & (n-1) == hash(k2) & (n-1)
// But k1.equals(k2) == false
HashMap uses two-level filtering:
- Fast filter — hash comparison (
p.hash == hash), filters 99.9% - Precise check —
equals()call for remaining candidates
Treeification Details
At 8 elements in a bucket:
- If capacity < 64:
resize()is called (spreads elements apart) - If capacity >= 64: list → red-black tree
Why 8? The probability of a chain of 8 elements with loadFactor 0.75 and a good hashCode: 0.00000006 (Poisson distribution). This is an anomaly marker — either a bad hashCode or an attack.
CPU and Cache Impact
Collisions affect performance more than it seems:
- Each
equals()is a method call + field comparison - Nodes are scattered across Heap → Cache Miss on every transition
- On treeification, tree balancing overhead is added
Hash Flooding Attack
Targeted DoS attack:
- Attacker generates keys with the same hash
- All elements land in one bucket
- Java 7: O(n²) on inserting n elements → 100% CPU
- Java 8+: O(n log n) — attack mitigated, but not fully eliminated
Memory Implications
Each collision = an additional Node object in Heap. With mass collisions:
- Young Gen GC pressure increases
- On treeification: TreeNode ~ 2x the size of Node
- Old Gen accumulates long-lived nodes
Production Monitoring
To detect problems:
- Monitor get/put times — growth indicates collisions
- Analyze bucket distribution via JFR (Java Flight Recorder)
- Check hashCode() quality via unit tests on distribution
🎯 Interview Cheat Sheet
Must know:
- Collision = two different keys landed in the same bucket — this is normal, not a bug
- Pigeonhole principle: infinitely many keys, limited buckets — collisions are inevitable
- Java 7: O(n) on collisions, Java 8+: O(log n) thanks to treeification
- Birthday paradox: with 5 elements and 16 buckets, collision probability > 50%
- Classic example: “Aa” and “BB” have the same hashCode = 2112
- Two collision types: same hashCode AND different hashCodes but same index after
(n-1) & hash - Threshold 8 was chosen based on Poisson distribution — probability ≈ 0.00000006
Common follow-up questions:
- Why are collisions inevitable? — hashCode = 32 bits (~4.3 billion), but infinitely many possible objects
- How does Separate Chaining differ from Open Addressing? — chaining = lists in buckets, open addressing = probing for next free slot
- What is a Hash Flooding Attack? — attacker crafts keys with the same hash → DoS
- Why is the probability of 8 collisions so small? — Poisson distribution at loadFactor 0.75
Red flags (DO NOT say):
- “Collision is a bug” — no, expected behavior
- “HashMap doesn’t support collisions” — it does via chains/trees
- “hashCode is always unique” — no, 32 bits can’t cover an infinite set
Related topics:
- [[03. How HashMap Determines Which Bucket to Put an Element In]]
- [[05. How HashMap Handles Collisions]]
- [[06. What Happens When 8 Elements Are Reached in One Bucket]]
- [[09. If Two Objects Have the Same hashCode(), Are They Necessarily Equal by equals()]]