Question 16 · Section 10

What is Load Factor in HashMap?


🟢 Junior Level

Load Factor is a number that determines when HashMap should increase its size.

By default, load factor = 0.75 (75%).

How it works:

HashMap<String, Integer> map = new HashMap<>();
// capacity = 16, load factor = 0.75
// threshold = 16 * 0.75 = 12
// On adding the 13th element, HashMap grows to 32 buckets

Simple analogy: Imagine a glass filled with water to 75%. If you pour more — it overflows, and you need a bigger glass. Load factor is the “75%” mark on the glass.

Why 0.75? It’s a balance: not too many empty slots (memory) and not too many collisions (speed).

🟡 Middle Level

Resize Threshold Formula

threshold = capacity * loadFactor

When size > threshold, resize (array doubling) occurs.

Impact on Performance and Memory

  • 0.5 (low): more memory (~50% empty buckets); fastest lookups (buckets hold 0-1 elements, search is almost always O(1) without list/tree traversal); minimum collisions
  • 0.75 (default): balanced memory and speed; few collisions
  • 0.9 (high): less memory; slower lookups; more collisions
  • 1.0+: minimum memory; slowest lookups; many collisions

When to Change loadFactor?

Low (0.5): When access speed is critical (low latency systems)

new HashMap<>(1000, 0.5f); // Fast access, but 2x memory

High (>0.75): When memory saving is critical (Embedded, IoT)

new HashMap<>(1000, 0.9f); // Saves memory, but more collisions

Interaction with Initial Capacity

// Correct calculation for 1000 elements:
int capacity = (int)(1000 / 0.75f) + 1; // = 1334
Map<K, V> map = new HashMap<>(capacity, 0.75f);
// HashMap rounds up to 2048, threshold = 1536
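The power-of-two rounding above can be sketched in plain Java. The `tableSizeFor` helper below mirrors the private method of the same name in the JDK's HashMap source (reproduced here as a sketch of the internals, not a public API):

```java
public class ThresholdDemo {
    // Mirrors java.util.HashMap's private tableSizeFor helper:
    // returns the smallest power of two >= cap.
    static int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : n + 1;
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f;
        int requested = (int) (1000 / loadFactor) + 1; // 1334
        int capacity = tableSizeFor(requested);        // rounded up to 2048
        int threshold = (int) (capacity * loadFactor); // 1536
        System.out.println(capacity + " buckets, resize after " + threshold + " elements");
    }
}
```

Requesting capacity 1334 lands you at 2048 buckets with a resize threshold of 1536 — comfortably above the 1000 elements you planned for.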

Common Mistakes

  1. Load factor > 1.0 — allowed, but the average chain length grows with the load factor, so lookups slow down (expected cost is O(1 + loadFactor))
  2. Too low — wasted memory and frequent resizes at low fill
  3. Not accounting for rounding — HashMap always rounds capacity up to a power of two

🔴 Senior Level

Poisson Distribution

The choice of 0.75 is statistically justified. The comment in the JDK's HashMap source notes that, under random hash codes, bucket occupancy at the default resizing threshold follows a Poisson distribution with parameter λ ≈ 0.5:

P(x) = (0.5^x * e^-0.5) / x!

x=0: 60.7%
x=1: 30.3%
x=2: 7.6%
x=8: ≈ 0.00000006 — practically impossible

A chain of 8 elements is therefore a statistical anomaly. Treeification exists as protection against degenerate or adversarial hash codes, not as a normal operating mode.
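The probabilities above can be reproduced in a few lines. λ = 0.5 is the average bucket fill the JDK comment cites for the default resizing threshold:

```java
public class PoissonDemo {
    // Poisson probability P(k) = lambda^k * e^-lambda / k!
    static double poisson(double lambda, int k) {
        double fact = 1.0;
        for (int i = 2; i <= k; i++) fact *= i;
        return Math.pow(lambda, k) * Math.exp(-lambda) / fact;
    }

    public static void main(String[] args) {
        double lambda = 0.5; // average bucket fill at the default threshold, per the JDK comment
        for (int k = 0; k <= 8; k++) {
            System.out.printf("k=%d: %.8f%n", k, poisson(lambda, k));
        }
    }
}
```

Running it prints the k=8 probability as 0.00000006 — the same figure quoted in the HashMap source to justify the treeification threshold of 8.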

Cache Locality Impact

High load factor affects real-world performance more than asymptotics suggest:

  • Long chains → nodes scattered across the heap → cache misses
  • The CPU waits for data from RAM (~100+ ns) instead of L1/L2 cache (~1-5 ns)
  • O(log n) with poor cache locality (tree nodes scattered across the heap) can in practice be slower than O(1) with a well-chosen load factor

Memory Footprint

At capacity = 1024 (assuming at most one element per bucket and 8-byte references):

  • Load factor 0.5: 512 elements, ~512 empty buckets → ~4 KB “waste”
  • Load factor 0.75: 768 elements, ~256 empty → ~2 KB “waste”
  • Load factor 0.9: ~922 elements, ~102 empty → ~1 KB “waste”
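The arithmetic behind these numbers is a back-of-the-envelope sketch: one 8-byte reference per empty table slot (a 64-bit JVM without compressed oops — an assumption chosen for round numbers) and at most one element per bucket:

```java
public class FootprintDemo {
    // Rough "waste" from empty table slots: one 8-byte reference per slot
    // (64-bit JVM without compressed oops -- an assumption for round numbers).
    static long wastedBytes(int capacity, float loadFactor) {
        int elements = (int) (capacity * loadFactor);
        int empty = capacity - elements; // simplification: no collisions
        return (long) empty * 8;
    }

    public static void main(String[] args) {
        for (float lf : new float[] {0.5f, 0.75f, 0.9f}) {
            System.out.println("LF " + lf + ": ~" + wastedBytes(1024, lf) + " bytes wasted");
        }
    }
}
```

In reality some buckets hold chains while others stay empty, so the true number of empty slots is higher than this simplification suggests — but the relative comparison between load factors holds.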

Edge Cases

  1. Load factor = 1.0: Minimum empty buckets, but collisions (and therefore chains) are practically guaranteed
  2. Load factor > 1.0: HashMap works as an array of lists/trees; the average chain length exceeds 1 and keeps growing between resizes
  3. Load factor < 0.1: Extremely fast, but 10x+ memory overhead

Production Tuning

In high-load systems:

  • Read-heavy: load factor 0.5-0.6 — minimum collisions, maximum speed
  • Write-heavy: load factor 0.75 — balance, fewer resizes
  • Memory-constrained: load factor 0.9+ — memory saving, acceptable slowdown

Monitoring

To analyze effectiveness:

  • Measure average element count per bucket
  • Compare get/put latency at different load factors
  • Use JFR to analyze distribution
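HashMap exposes no bucket statistics, so one way to estimate the distribution without reflection is to replay hashes through the same index math the map uses (spread the hash, then `hash & (capacity - 1)`). This is a simulation of bucket assignment, not a measurement of a live map:

```java
import java.util.Random;

public class BucketStats {
    // Simulates HashMap's bucket assignment: spread the hash the way
    // HashMap.hash() does, then index with hash & (capacity - 1).
    static int[] simulate(int capacity, int size, long seed) {
        int[] buckets = new int[capacity];
        Random rnd = new Random(seed);
        for (int i = 0; i < size; i++) {
            int h = rnd.nextInt();
            h ^= (h >>> 16);                // same spreading step as java.util.HashMap
            buckets[h & (capacity - 1)]++;
        }
        int empty = 0, max = 0;
        for (int c : buckets) {
            if (c == 0) empty++;
            if (c > max) max = c;
        }
        return new int[] {empty, max};      // {empty buckets, longest chain}
    }

    public static void main(String[] args) {
        int[] r = simulate(1024, 768, 42L); // capacity 1024 filled to load factor 0.75
        System.out.println("empty buckets: " + r[0] + ", longest chain: " + r[1]);
    }
}
```

With random hashes at load factor 0.75 the simulation lands near the Poisson prediction: roughly 1024 · e^-0.75 ≈ 483 empty buckets and a longest chain of only a handful of elements.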

🎯 Interview Cheat Sheet

Must know:

  • Load factor = 0.75 by default — balance between memory and collisions
  • threshold = capacity * loadFactor; on exceeding — resize
  • Poisson distribution: average bucket fill λ ≈ 0.5 at the default threshold; probability of 8 elements in a bucket ≈ 0.00000006
  • Low LF (0.5): faster, but 2x memory; high (0.9): saves memory, but more collisions
  • High load factor = long chains = Cache Miss = real performance worse than asymptotics
  • LF > 1.0 is allowed, but average chain length grows with LF and lookups slow down

Common follow-up questions:

  • When to use LF = 0.5? — read-heavy, low latency systems
  • When LF = 0.9? — memory-constrained (Embedded, IoT)
  • Why 0.75, not 0.5 or 1.0? — statistically justified compromise (Poisson)
  • Does LF affect iteration? — indirectly: larger capacity at low LF = slower iteration

Red flags (DO NOT say):

  • “Load factor = 1.0 is optimal” — no, chains and collisions are guaranteed
  • “You can change load factor after creation” — no, it’s set in the constructor
  • “0.75 means 75% of buckets are filled” — no, 75% of capacity before resize

Related topics:

  • [[17. What is Capacity in HashMap]]
  • [[18. When Does Rehashing Occur in HashMap]]
  • [[28. How to Choose the Initial Capacity for HashMap]]