Question 1 · Section 10

How Does HashMap Work Internally?

🟢 Junior Level

HashMap is a Java collection that stores data in “key-value” format (Map). It allows very fast lookup, addition, and removal of elements by key.

The core idea: HashMap uses a hash table — an internal array of buckets (bins).

Why it’s fast: Unlike ArrayList, which iterates through all elements (O(n)), HashMap computes a special number (a hash) from the key and goes directly to the right cell — like going straight to a specific locker by its number instead of checking every locker. When you add an element, HashMap computes the key’s hash to determine which bucket to put it in. When searching, it recomputes the same hash and goes directly to the right bucket.

Example:

Map<String, Integer> ages = new HashMap<>();
ages.put("Alice", 30);  // "Alice".hashCode() = 63350368 → bucket index 6 (default capacity 16)
ages.put("Bob", 25);    // "Bob".hashCode() = 66965 → bucket index 4
System.out.println(ages.get("Alice")); // 30 — same hash, same bucket index 6

When to use:

  • When you need fast lookup by key
  • When element order doesn’t matter
  • When you need one of the fastest options for lookup (average O(1))

When NOT to Use HashMap

  1. Insertion order needed — use LinkedHashMap
  2. Sorted by key — TreeMap
  3. Multi-threaded access — ConcurrentHashMap
  4. Memory-critical with few elements — ArrayList with linear search may be cheaper
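The alternatives above can be compared directly by their iteration order (a small sketch; the class and key names are illustrative):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class OrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();         // order unspecified
        Map<String, Integer> linked = new LinkedHashMap<>(); // insertion order
        Map<String, Integer> tree = new TreeMap<>();         // sorted by key
        for (String k : new String[]{"banana", "apple", "cherry"}) {
            hash.put(k, k.length());
            linked.put(k, k.length());
            tree.put(k, k.length());
        }
        System.out.println("LinkedHashMap: " + linked.keySet()); // [banana, apple, cherry]
        System.out.println("TreeMap:       " + tree.keySet());   // [apple, banana, cherry]
        System.out.println("HashMap:       " + hash.keySet());   // depends on bucket indices
    }
}
```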

🟡 Middle Level

How It Works Internally

HashMap consists of several key fields:

transient Node<K,V>[] table; // Array of buckets (lazily initialized on first put)
int threshold;               // Resize threshold (capacity * loadFactor)
final float loadFactor;      // Load factor (default 0.75)
transient int size;          // Current number of elements

How put() works:

  1. On the first put(), an array of 16 buckets is created (lazy initialization, Java 8+). In Java 7, the array was created at construction.
  2. hash(key) is computed — a spread (perturbed) hash code of the key.

Spreading — high bits of the hash code are ‘mixed’ with low bits via XOR. XOR is a bitwise operation: if bits differ → 1, same → 0. This minimizes collisions even with a poor hashCode function: differences in high bits are ‘carried’ to low bits, which are used for indexing.

  3. Bucket index is determined: index = (n - 1) & hash
  4. If the bucket is empty — a new Node is created
  5. If the bucket is occupied — stored hashes are compared first, then equals(), to find an existing key
  6. If the key is found — the value is overwritten; if not — a new node is appended to the chain
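The steps above can be sketched as a simplified model. This is not the OpenJDK source: `MiniMap` is a hypothetical class that ignores resizing and treeification and uses head insertion for brevity (the real HashMap appends at the tail since Java 8):

```java
import java.util.Objects;

// Minimal sketch of HashMap's put() path: fixed capacity, chaining only
public class MiniMap<K, V> {
    static class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    final Node<K, V>[] table = new Node[16]; // fixed capacity for the sketch

    static int hash(Object key) { // same spreading as java.util.HashMap
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public V put(K key, V value) {
        int h = hash(key);
        int i = (table.length - 1) & h;           // step 3: bucket index
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (n.hash == h && Objects.equals(n.key, key)) {
                V old = n.value; n.value = value; // step 6: key found, overwrite
                return old;
            }
        }
        table[i] = new Node<>(h, key, value, table[i]); // steps 4-5: new node
        return null;
    }
}
```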

How get() works:

  1. Compute key’s hash
  2. Determine bucket index
  3. Search for element in bucket via equals()
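These three steps can be traced by hand for a concrete key (assuming the default capacity of 16):

```java
public class GetTrace {
    public static void main(String[] args) {
        String key = "Alice";
        int h = key.hashCode();        // 63350368
        int spread = h ^ (h >>> 16);   // step 1: spread hash
        int index = (16 - 1) & spread; // step 2: bucket index for capacity 16
        System.out.println(index);     // 6
        // Step 3 would then walk table[6], comparing stored hashes and equals()
    }
}
```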

Common Mistakes

  1. Using mutable objects as keys — if key fields are modified after put(), the element becomes unreachable
  2. Overriding equals without hashCode — equal objects end up in different buckets
  3. Concurrent access — HashMap is not thread-safe
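Mistake #1 is easy to reproduce. A sketch with a hypothetical mutable `Point` key (correct equals/hashCode, but mutable fields):

```java
import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {
    static class Point { // mutable key: a classic mistake
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }
        @Override public int hashCode() { return 31 * x + y; }
    }

    public static void main(String[] args) {
        Map<Point, String> map = new HashMap<>();
        Point p = new Point(1, 2);
        map.put(p, "home");
        p.x = 99; // hashCode changes → lookups now probe the wrong bucket
        System.out.println(map.get(p));               // null: entry is unreachable
        System.out.println(map.get(new Point(1, 2))); // null: stored key no longer equals (1, 2)
    }
}
```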

Comparison with Alternatives

Implementation      Lookup     Order             Thread-safe
HashMap             O(1)       No                No
TreeMap             O(log n)   Sorted by key     No
LinkedHashMap       O(1)       Insertion order   No
ConcurrentHashMap   O(1)       No                Yes

🔴 Senior Level

Internal Implementation

Hash function (spread):

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

The XOR operation between high and low 16 bits minimizes collisions for small arrays, where the index is determined only by low bits via the (n-1) mask.
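A tiny experiment shows why: take two hash codes that differ only in their high 16 bits.

```java
public class SpreadDemo {
    public static void main(String[] args) {
        int h1 = 0x10000; // 65536
        int h2 = 0x20000; // 131072
        int mask = 15;    // capacity 16 → index = hash & (n - 1)

        // Without spreading, both land in bucket 0: the mask discards high bits
        System.out.println((h1 & mask) + " " + (h2 & mask)); // 0 0

        // With spreading, high bits leak into low bits and the indices differ
        System.out.println(((h1 ^ (h1 >>> 16)) & mask) + " "
                         + ((h2 ^ (h2 >>> 16)) & mask));     // 1 2
    }
}
```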

Index determination:

index = (n - 1) & hash;

& is used instead of % because a bitwise AND is a far cheaper CPU operation than division, and unlike %, it never produces a negative index for a negative hash. The trade-off: the array size must always be a power of two.
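For a power-of-two n, the mask is equivalent to the mathematical modulus, which can be checked directly (the sample hashes are illustrative):

```java
public class MaskDemo {
    public static void main(String[] args) {
        int n = 16; // power of two, as HashMap guarantees
        for (int h : new int[]{42, 31, -7}) {
            // Masking equals Math.floorMod, and handles negative hashes
            // correctly, whereas plain % would yield -7 % 16 == -7
            System.out.println((h & (n - 1)) + " vs " + Math.floorMod(h, n));
        }
    }
}
```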

Architectural Trade-offs

Separate Chaining vs Open Addressing:

  • HashMap uses separate chaining — collisions are resolved with linked lists/trees in each bucket
  • ✅ Pros: resilient to poor hash functions, simpler deletion
  • ❌ Cons: allocates separate Node objects (memory overhead, poor cache locality)
  • Open addressing (as in ThreadLocalMap) saves memory but suffers from clustering

Treeification (Java 8+):

  • At 8 elements in a bucket and table capacity >= 64, the linked list converts to a red-black tree. If capacity is below 64, the table is resized instead of treeifying.

Red-black tree — a balanced binary search tree where each node is ‘colored’ red or black. Coloring rules guarantee balance: search is always O(log n).

Hysteresis — gap between forward (8 → tree) and reverse (6 → list) conversion thresholds. This prevents ‘churn’ — when the structure constantly flips back and forth under fluctuating load.

  • Worst-case search complexity improves from O(n) to O(log n)
  • TreeNode takes ~2x more memory than Node
  • Reverse conversion threshold: 6 elements (hysteresis to prevent churn)

Edge Cases

  1. Hash Flooding Attack — an attacker crafts many keys with colliding hashes; before Java 8, every lookup in the flooded bucket degraded to O(n), enabling DoS; Java 8+ mitigates this via treeification (most effective when keys implement Comparable)
  2. Concurrent resize — in Java 7, parallel resize caused list cycling; Java 8+ fixed via tail insertion, but data loss is still possible
  3. Null key — always goes to table[0], since hash(null) = 0
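The null-key edge case is directly observable:

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        m.put(null, 1);  // hash(null) == 0 → index (n - 1) & 0 == 0 → table[0]
        m.put(null, 2);  // only one null key is allowed: the value is overwritten
        System.out.println(m.get(null)); // 2
        System.out.println(m.size());    // 1
    }
}
```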

Performance

  • Average case: O(1) for get/put/remove
  • Worst case: O(log n) (Java 8+), O(n) (Java 7)
  • Amortized put cost: O(1) (resize is O(n), but occurs rarely)
  • Memory footprint: each Node — object header + int hash + references to key, value, and next ≈ 32-48 bytes per entry (not counting the key and value objects themselves)

Resize Optimization

When doubling the array (Java 8+), there’s no need to recompute hashes. One bit is checked:

if ((e.hash & oldCap) == 0) {
    // Stays at the same index
} else {
    // Moves to oldIndex + oldCapacity
}
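A quick check of this rule for a handful of hashes that share a bucket at the old capacity (the hash values are illustrative):

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        // All four hashes map to bucket 5 while capacity is 16
        for (int h : new int[]{5, 21, 37, 53}) {
            int oldIdx = h & (oldCap - 1);
            // After doubling, the bit tested by (h & oldCap) decides the bucket:
            // 0 → same index, 1 → oldIdx + oldCap. No rehashing needed.
            int newIdx = (h & oldCap) == 0 ? oldIdx : oldIdx + oldCap;
            System.out.println("hash=" + h + " old=" + oldIdx + " new=" + newIdx);
        }
        // The chain at bucket 5 splits between buckets 5 and 21
    }
}
```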

Production Experience

In high-load systems, frequent resizes cause latency spikes. It’s recommended to set initialCapacity = (expectedSize / 0.75) + 1 to avoid O(n) operations at runtime.
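A minimal sketch of that sizing rule; `withExpectedSize` is a hypothetical helper (Guava ships an equivalent, Maps.newHashMapWithExpectedSize):

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    // Sized so that expectedSize entries never trigger a resize (loadFactor 0.75)
    static <K, V> HashMap<K, V> withExpectedSize(int expectedSize) {
        int initialCapacity = (int) (expectedSize / 0.75f) + 1;
        return new HashMap<>(initialCapacity); // HashMap rounds up to a power of two
    }

    public static void main(String[] args) {
        Map<Integer, String> m = withExpectedSize(10_000);
        for (int i = 0; i < 10_000; i++) m.put(i, "v" + i); // no resize along the way
        System.out.println(m.size()); // 10000
    }
}
```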

Best Practices for Highload

  • Use ConcurrentHashMap for multi-threaded access
  • Pre-calculate capacity
  • Use immutable keys with quality hashCode
  • Avoid null keys for better portability

🎯 Interview Cheat Sheet

Must know:

  • HashMap = array of buckets + linked lists/trees (Java 8+)
  • put/get work in O(1) average case
  • Lazy initialization: array is created on first put
  • Spread function: XOR of high and low 16 bits for uniform distribution
  • Indexing via (n-1) & hash — faster than division
  • At 8 elements in bucket and capacity >= 64 — treeification (list → red-black tree)
  • null key is allowed (one), always in table[0]
  • Not thread-safe — use ConcurrentHashMap for concurrency

Common follow-up questions:

  • Why is capacity always a power of two? — to use fast bitwise mask instead of division
  • What is load factor 0.75? — balance between memory and collisions, justified by Poisson distribution
  • What happens with a bad hashCode? — all elements in one bucket, degradation to O(n) or O(log n)
  • Why not use HashMap in multi-threading? — data race, data loss, corruption during resize

Red flags (DO NOT say):

  • “HashMap uses division for indexing” — no, bitwise mask
  • “HashMap is thread-safe” — no, that’s ConcurrentHashMap
  • “At 8 elements it’s always a tree” — only if capacity >= 64

Related topics:

  • [[02. What is a Bucket in HashMap]]
  • [[04. What is a Collision in HashMap]]
  • [[05. How HashMap Handles Collisions]]
  • [[16. What is Load Factor in HashMap]]
  • [[20. What is the Time Complexity of get() and put() Operations in HashMap]]