How HashMap Determines Which Bucket to Put an Element In?
HashMap determines the bucket number in two steps:
🟢 Junior Level
HashMap determines the bucket number in two steps:
- Gets the key’s hash code — calls
key.hashCode(), which returns a 32-bit number — the object’s “digital fingerprint”. A good hashCode distributes objects evenly across the range; a bad one clusters them. - Mixes the bits — XORs the hash code with itself, shifted right by 16 positions
- Applies a mask — uses the formula
(array_size - 1) & hash
Simple example:
String key = "hello";
int hash = key.hashCode(); // e.g., 99162322
int index = (16 - 1) & hash; // With size 16: index = 2
Why it’s so fast: Instead of division (which is slow), bitwise AND (&) is used, which executes in 1 CPU cycle.
🟡 Middle Level
Step 1: Bit Perturbation Function
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
Why XOR and shift by 16?
- The array size is usually small (16, 32, 64…), so only low bits are used for the index
- If
hashCode()differs only in high bits — collisions occur >>> 16shift moves high bits to low positions- XOR mixes high-bit information into low bits
- Result: even with a poor hashCode, elements distribute evenly
Why specifically XOR and shift 16? The perturbation function (from Latin ‘perturbare’, ‘to mix’) ensures that even if hashCode() differs only in high bits, after mixing the difference also appears in low bits.
Step 2: Power-of-Two Magic
int index = (n - 1) & hash;
If n = 16 (2^4):
n-1 = 15 => 0000 1111 (mask)
hash => 1101 0110
& => 0000 0110 (result: 6)
This is mathematically equivalent to hash % 16. & executes in 1 cycle (bitwise operation), while % requires division — 10-50 cycles depending on the CPU.
hashCode Caching
String caches its hashCode in a private hash field. Frequent HashMap operations benefit enormously from this.
Common Mistakes
- Poor hashCode — always returning 1 puts all elements in one bucket
- Mutable key — modifying key fields after put changes the hash, making the element unreachable
Don’t Try to “Improve” Hashing
Don’t create exotic hashing formulas without profiling. The OpenJDK spread function is thoroughly tested. A bad “optimization” is often worse than the default.
🔴 Senior Level
Internal Implementation
In the OpenJDK source (HashMap.java):
// putVal method:
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
Index determination happens on every operation — get, put, remove. That’s why hashing optimization is critical.
Why Power of Two is Mandatory
If the size were 15 (not a power of two):
n-1 = 14 => 0000 1110 (mask)
The lowest bit is always zeroed — all odd buckets (1, 3, 5…) are never used. This causes 50% capacity loss and doubled collisions.
Performance
- Hash computation: O(1), 2 operations (XOR + shift)
- Index determination: O(1), 2 operations (& and subtraction)
- Overall overhead: minimal, but called billions of times in large applications
Edge Cases
- key == null:
hash(null)returns 0, null always intable[0] - hashCode() = constant: all elements in one bucket, degradation to O(n)/O(log n)
- Strings with same hashCode:
"Aa"and"BB"both give 2112 — a known Java collision
Security: Hash Flooding
An attacker can craft keys with the same hash. In Java 8+, treeification protects against DoS, but CPU overhead still increases due to tree rebalancing.
Optimization for Records
In Java 14+, record types generate hashCode() automatically based on all components. This guarantees a good contract, but for complex records with collection fields, hashCode can be expensive.
🎯 Interview Cheat Sheet
Must know:
- Step 1:
hash(key) = key.hashCode() ^ (key.hashCode() >>> 16)— spread function - Step 2:
index = (n-1) & hash— bitwise mask instead of% - XOR mixes high bits into low — protects against poor hashCodes
&executes in 1 cycle,%requires 10-50 cycles- String caches hashCode — repeated calls are O(1)
- null gives hash = 0, always in table[0]
- With non-2^n sizes, the formula breaks — odd buckets are never used
Common follow-up questions:
- Why specifically XOR and shift 16? — 32-bit int, half high bits, half low bits; XOR mixes high into low
- What if hashCode = constant? — all elements in one bucket, degradation to O(n)
- Why not use
%? — division is slow (10-50 cycles),&= 1 cycle - Do “Aa” and “BB” give the same hashCode? — yes, 2112 — a known collision
Red flags (DO NOT say):
- “HashMap uses hashCode directly” — no, there’s a spread function
- “Index = hashCode % size” — no,
(n-1) & hash - “You can improve the spread function” — the standard OpenJDK one is thoroughly tested
Related topics:
- [[01. How HashMap Works Internally]]
- [[04. What is a Collision in HashMap]]
- [[07. What is the equals() and hashCode() Contract]]
- [[15. Why String Is Often Used as a Key in HashMap]]