Question 1 · Section 10

How Does HashMap Work Internally?

🟢 Junior Level

HashMap is a Java collection that stores data in “key-value” format (Map). It allows very fast lookup, addition, and removal of elements by key.

The core idea: HashMap uses a hash table — an internal array of buckets (bins).

Why it’s fast: Unlike ArrayList, which iterates through all elements (O(n)), HashMap computes a special number (a hash) from the key and goes directly to the right cell — like going straight to a specific locker by its number instead of checking every locker. When you add an element, HashMap computes the key’s hash to determine which bucket to put it in. When searching, it recomputes the same hash and goes directly to the right bucket.

Example:

Map<String, Integer> ages = new HashMap<>();
ages.put("Alice", 30);  // "Alice".hashCode() = 63350368 → bucket index 6 (default capacity 16)
ages.put("Bob", 25);    // "Bob".hashCode() = 66965 → bucket index 4
System.out.println(ages.get("Alice")); // 30 — same hash, same bucket index 6

When to use:

  • When you need fast lookup by key
  • When element order doesn’t matter
  • When you need one of the fastest options for lookup (average O(1))

When NOT to Use HashMap

  1. Insertion order needed — use LinkedHashMap
  2. Sorted by key — TreeMap
  3. Multi-threaded access — ConcurrentHashMap
  4. Memory-critical with few elements — ArrayList with linear search may be cheaper
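The alternatives above can be compared directly by their iteration order (a small sketch; the class and key names are illustrative):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class OrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();         // order unspecified
        Map<String, Integer> linked = new LinkedHashMap<>(); // insertion order
        Map<String, Integer> tree = new TreeMap<>();         // sorted by key
        for (String k : new String[]{"banana", "apple", "cherry"}) {
            hash.put(k, k.length());
            linked.put(k, k.length());
            tree.put(k, k.length());
        }
        System.out.println("LinkedHashMap: " + linked.keySet()); // [banana, apple, cherry]
        System.out.println("TreeMap:       " + tree.keySet());   // [apple, banana, cherry]
        System.out.println("HashMap:       " + hash.keySet());   // depends on bucket indices
    }
}
```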

🟡 Middle Level

How It Works Internally

HashMap consists of several key fields:

transient Node<K,V>[] table; // Array of buckets (lazily initialized on first put)
int threshold;               // Resize threshold (capacity * loadFactor)
final float loadFactor;      // Load factor (default 0.75)
transient int size;          // Current number of elements

How put() works:

  1. On the first put(), an array of 16 buckets is created (lazy initialization, Java 8+). In Java 7, the array was created at construction.
  2. hash(key) is computed — a spread (perturbed) hash code of the key.

Spreading — high bits of the hash code are ‘mixed’ with low bits via XOR. XOR is a bitwise operation: if bits differ → 1, same → 0. This minimizes collisions even with a poor hashCode function: differences in high bits are ‘carried’ to low bits, which are used for indexing.

  3. Bucket index is determined: index = (n - 1) & hash
  4. If the bucket is empty — a new Node is created
  5. If the bucket is occupied — stored hashes are compared first, then equals(), to find an existing key
  6. If the key is found — the value is overwritten; if not — a new node is appended to the chain
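The steps above can be sketched as a simplified model. This is not the OpenJDK source: `MiniMap` is a hypothetical class that ignores resizing and treeification and uses head insertion for brevity (the real HashMap appends at the tail since Java 8):

```java
import java.util.Objects;

// Minimal sketch of HashMap's put() path: fixed capacity, chaining only
public class MiniMap<K, V> {
    static class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    final Node<K, V>[] table = new Node[16]; // fixed capacity for the sketch

    static int hash(Object key) { // same spreading as java.util.HashMap
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public V put(K key, V value) {
        int h = hash(key);
        int i = (table.length - 1) & h;           // step 3: bucket index
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                V old = n.value; n.value = value; // step 6: key found, overwrite
                return old;
            }
        }
        table[i] = new Node<>(h, key, value, table[i]); // steps 4-5: new node
        return null;
    }
}
```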

How get() works:

  1. Compute key’s hash
  2. Determine bucket index
  3. Search for element in bucket via equals()
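These three steps can be traced by hand for a concrete key (assuming the default capacity of 16):

```java
public class GetTrace {
    public static void main(String[] args) {
        String key = "Alice";
        int h = key.hashCode();        // 63350368
        int spread = h ^ (h >>> 16);   // step 1: spread hash
        int index = (16 - 1) & spread; // step 2: bucket index for capacity 16
        System.out.println(index);     // 6
        // Step 3 would then walk table[6], comparing stored hashes and equals()
    }
}
```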

Common Mistakes

  1. Using mutable objects as keys — if key fields are modified after put(), the element becomes unreachable
  2. Overriding equals without hashCode — equal objects end up in different buckets
  3. Concurrent access — HashMap is not thread-safe
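Mistake #1 is easy to reproduce. A sketch with a hypothetical mutable `Point` key (correct equals/hashCode, but mutable fields):

```java
import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {
    static class Point { // mutable key: a classic mistake
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }
        @Override public int hashCode() { return 31 * x + y; }
    }

    public static void main(String[] args) {
        Map<Point, String> map = new HashMap<>();
        Point p = new Point(1, 2);
        map.put(p, "home");
        p.x = 99; // hashCode changes → lookups now probe the wrong bucket
        System.out.println(map.get(p));               // null: entry is unreachable
        System.out.println(map.get(new Point(1, 2))); // null: stored key no longer equals (1, 2)
    }
}
```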

Comparison with Alternatives

Implementation      Lookup     Order             Thread-safe
HashMap             O(1)       No                No
TreeMap             O(log n)   Sorted by key     No
LinkedHashMap       O(1)       Insertion order   No
ConcurrentHashMap   O(1)       No                Yes

🔴 Senior Level

Internal Implementation

Hash function (spread):

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

The XOR operation between high and low 16 bits minimizes collisions for small arrays, where the index is determined only by low bits via the (n-1) mask.
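A tiny experiment shows why: take two hash codes that differ only in their high 16 bits.

```java
public class SpreadDemo {
    public static void main(String[] args) {
        int h1 = 0x10000; // 65536
        int h2 = 0x20000; // 131072
        int mask = 15;    // capacity 16 → index = hash & (n - 1)

        // Without spreading, both land in bucket 0: the mask discards high bits
        System.out.println((h1 & mask) + " " + (h2 & mask)); // 0 0

        // With spreading, high bits leak into low bits and the indices differ
        System.out.println(((h1 ^ (h1 >>> 16)) & mask) + " "
                         + ((h2 ^ (h2 >>> 16)) & mask));     // 1 2
    }
}
```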

Index determination:

index = (n - 1) & hash;

& is used instead of % because a bitwise AND is a far cheaper CPU operation than division, and unlike %, it never produces a negative index for a negative hash. The trade-off: the array size must always be a power of two.
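For a power-of-two n, the mask is equivalent to the mathematical modulus, which can be checked directly (the sample hashes are illustrative):

```java
public class MaskDemo {
    public static void main(String[] args) {
        int n = 16; // power of two, as HashMap guarantees
        for (int h : new int[]{42, 31, -7}) {
            // Masking equals Math.floorMod, and handles negative hashes
            // correctly, whereas plain % would yield -7 % 16 == -7
            System.out.println((h & (n - 1)) + " vs " + Math.floorMod(h, n));
        }
    }
}
```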

Architectural Trade-offs

Separate Chaining vs Open Addressing:

  • HashMap uses separate chaining — collisions are resolved with linked lists/trees in each bucket
  • ✅ Pros: resilient to poor hash functions, simpler deletion
  • ❌ Cons: allocates separate Node objects (memory overhead, poor cache locality)
  • Open addressing (as in ThreadLocalMap) saves memory but suffers from clustering

Treeification (Java 8+):

  • At 8 elements in a bucket and table capacity >= 64, the linked list converts to a red-black tree. If capacity is below 64, the table is resized instead of treeifying.

Red-black tree — a balanced binary search tree where each node is ‘colored’ red or black. Coloring rules guarantee balance: search is always O(log n).

Hysteresis — gap between forward (8 → tree) and reverse (6 → list) conversion thresholds. This prevents ‘churn’ — when the structure constantly flips back and forth under fluctuating load.

  • Worst-case search complexity improves from O(n) to O(log n)
  • TreeNode takes ~2x more memory than Node
  • Reverse conversion threshold: 6 elements (hysteresis to prevent churn)

Edge Cases

  1. Hash Flooding Attack — an attacker crafts many keys with colliding hashes; before Java 8, every lookup in the flooded bucket degraded to O(n), enabling DoS; Java 8+ mitigates this via treeification (most effective when keys implement Comparable)
  2. Concurrent resize — in Java 7, parallel resize caused list cycling; Java 8+ fixed via tail insertion, but data loss is still possible
  3. Null key — always goes to table[0], since hash(null) = 0
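The null-key edge case is directly observable:

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put(null, 1);  // hash(null) == 0 → index (n - 1) & 0 == 0 → table[0]
        m.put(null, 2);  // only one null key is allowed: the value is overwritten
        System.out.println(m.get(null)); // 2
        System.out.println(m.size());    // 1
    }
}
```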

Performance

  • Average case: O(1) for get/put/remove
  • Worst case: O(log n) (Java 8+), O(n) (Java 7)
  • Amortized put cost: O(1) (resize is O(n), but occurs rarely)
  • Memory footprint: each Node — object header + int hash + references to key, value, and next ≈ 32-48 bytes per entry (not counting the key and value objects themselves)

Resize Optimization

When doubling the array (Java 8+), there’s no need to recompute hashes. One bit is checked:

if ((e.hash & oldCap) == 0) {
    // Stays at the same index
} else {
    // Moves to oldIndex + oldCapacity
}
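A quick check of this rule for a handful of hashes that share a bucket at the old capacity (the hash values are illustrative):

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        // All four hashes map to bucket 5 while capacity is 16
        for (int h : new int[]{5, 21, 37, 53}) {
            int oldIdx = h & (oldCap - 1);
            // After doubling, the bit tested by (h & oldCap) decides the bucket:
            // 0 → same index, 1 → oldIdx + oldCap. No rehashing needed.
            int newIdx = (h & oldCap) == 0 ? oldIdx : oldIdx + oldCap;
            System.out.println("hash=" + h + " old=" + oldIdx + " new=" + newIdx);
        }
        // The chain at bucket 5 splits between buckets 5 and 21
    }
}
```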

Production Experience

In high-load systems, frequent resizes cause latency spikes. It’s recommended to set initialCapacity = (expectedSize / 0.75) + 1 to avoid O(n) operations at runtime.
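A minimal sketch of that sizing rule; `withExpectedSize` is a hypothetical helper (Guava ships an equivalent, Maps.newHashMapWithExpectedSize):

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    // Sized so that expectedSize entries never trigger a resize (loadFactor 0.75)
    static <K, V> HashMap<K, V> withExpectedSize(int expectedSize) {
        int initialCapacity = (int) (expectedSize / 0.75f) + 1;
        return new HashMap<>(initialCapacity); // HashMap rounds up to a power of two
    }

    public static void main(String[] args) {
        Map<Integer, String> m = withExpectedSize(10_000);
        for (int i = 0; i < 10_000; i++) m.put(i, "v" + i); // no resize along the way
        System.out.println(m.size()); // 10000
    }
}
```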

Best Practices for Highload

  • Use ConcurrentHashMap for multi-threaded access
  • Pre-calculate capacity
  • Use immutable keys with quality hashCode
  • Avoid null keys for better portability

🎯 Interview Cheat Sheet

Must know:

  • HashMap = array of buckets + linked lists/trees (Java 8+)
  • put/get work in O(1) average case
  • Lazy initialization: array is created on first put
  • Spread function: XOR of high and low 16 bits for uniform distribution
  • Indexing via (n-1) & hash — faster than division
  • At 8 elements in bucket and capacity >= 64 — treeification (list → red-black tree)
  • null key is allowed (one), always in table[0]
  • Not thread-safe — use ConcurrentHashMap for concurrency

Common follow-up questions:

  • Why is capacity always a power of two? — to use fast bitwise mask instead of division
  • What is load factor 0.75? — balance between memory and collisions, justified by Poisson distribution
  • What happens with a bad hashCode? — all elements in one bucket, degradation to O(n) or O(log n)
  • Why not use HashMap in multi-threading? — data race, data loss, corruption during resize

Red flags (DO NOT say):

  • “HashMap uses division for indexing” — no, bitwise mask
  • “HashMap is thread-safe” — no, that’s ConcurrentHashMap
  • “At 8 elements it’s always a tree” — only if capacity >= 64

Related topics:

  • [[02. What is a Bucket in HashMap]]
  • [[04. What is a Collision in HashMap]]
  • [[05. How HashMap Handles Collisions]]
  • [[16. What is Load Factor in HashMap]]
  • [[20. What is the Time Complexity of get() and put() Operations in HashMap]]