What is Load Factor in HashMap?
🟢 Junior Level
Load Factor is a number that determines when HashMap should increase its size.
By default, load factor = 0.75 (75%).
How it works:
HashMap<String, Integer> map = new HashMap<>();
// capacity = 16, load factor = 0.75
// threshold = 16 * 0.75 = 12
// On adding the 13th element, HashMap grows to 32 buckets
Simple analogy: Imagine a glass filled with water to 75%. If you pour more — it overflows, and you need a bigger glass. Load factor is the “75%” mark on the glass.
Why 0.75? It’s a balance: not too many empty slots (memory) and not too many collisions (speed).
🟡 Middle Level
Resize Threshold Formula
threshold = capacity * loadFactor
When size > threshold, resize (array doubling) occurs.
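The formula can be sketched in a few lines. `thresholdFor` here is a hypothetical helper mirroring the internal calculation, not HashMap API:

```java
// Sketch: how the resize threshold evolves as a HashMap's table doubles.
// thresholdFor mirrors threshold = capacity * loadFactor (truncated to int).
public class ThresholdDemo {
    static final float LOAD_FACTOR = 0.75f;

    static int thresholdFor(int capacity) {
        return (int) (capacity * LOAD_FACTOR);
    }

    public static void main(String[] args) {
        // Default capacity 16: resize triggers when size exceeds 12.
        for (int capacity = 16; capacity <= 2048; capacity *= 2) {
            System.out.println("capacity=" + capacity
                    + " threshold=" + thresholdFor(capacity));
        }
    }
}
```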
Impact on Performance and Memory
| Load Factor | Memory | Speed | Collisions |
|---|---|---|---|
| 0.5 (low) | More (50% empty buckets) | Faster (buckets contain 0-1 elements, search is almost always O(1) without list/tree traversal) | Minimum collisions |
| 0.75 (default) | Balance | Balance | Few |
| 0.9 (high) | Less | Slower | More |
| 1.0+ | Minimum | Even slower | Many |
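The collision column can be illustrated with a small simulation, not real HashMap internals: a seeded `Random` stands in for well-distributed hash codes, and we count how many of 1024 buckets end up holding 2+ entries at different fill ratios (`collidedBuckets` is a made-up helper):

```java
import java.util.Random;

// Sketch: simulating bucket occupancy at different fill ratios.
// A seeded Random stands in for good hash codes; we count buckets
// that hold 2 or more entries (i.e., buckets with collisions).
public class CollisionSim {
    static int collidedBuckets(int n, int capacity, long seed) {
        int[] counts = new int[capacity];
        Random random = new Random(seed);
        for (int i = 0; i < n; i++) {
            counts[random.nextInt(capacity)]++; // "hash" a key into a bucket
        }
        int collided = 0;
        for (int c : counts) {
            if (c >= 2) collided++;
        }
        return collided;
    }

    public static void main(String[] args) {
        int capacity = 1024;
        for (float fill : new float[] {0.5f, 0.75f, 1.0f}) {
            int n = (int) (capacity * fill);
            System.out.printf("fill=%.2f -> %d buckets with collisions%n",
                    fill, collidedBuckets(n, capacity, 42L));
        }
    }
}
```

The higher the fill ratio, the more buckets contain chains — which is exactly the trade-off the table describes.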
When to Change loadFactor?
Low (0.5): When access speed is critical (low latency systems)
new HashMap<>(1000, 0.5f); // Fast access, but 2x memory
High (>0.75): When memory saving is critical (Embedded, IoT)
new HashMap<>(1000, 0.9f); // Saves memory, but more collisions
Interaction with Initial Capacity
// Correct calculation for 1000 elements:
int capacity = (int)(1000 / 0.75f) + 1; // = 1334
Map<K, V> map = new HashMap<>(capacity, 0.75f);
// HashMap rounds up to 2048, threshold = 1536
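A minimal runnable sketch of this pre-sizing idea; `capacityFor` is a hypothetical helper (the same idea as Guava's `Maps.newHashMapWithExpectedSize`):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: pre-sizing a HashMap so that `expected` insertions never trigger a resize.
// capacityFor is a made-up helper, not part of the JDK API.
public class PreSizeDemo {
    static int capacityFor(int expected, float loadFactor) {
        return (int) (expected / loadFactor) + 1;
    }

    public static void main(String[] args) {
        int capacity = capacityFor(1000, 0.75f); // 1334; HashMap rounds up to 2048
        Map<String, Integer> map = new HashMap<>(capacity, 0.75f);
        for (int i = 0; i < 1000; i++) {
            map.put("key" + i, i); // no resize: 1000 <= threshold (2048 * 0.75 = 1536)
        }
        System.out.println(map.size());
    }
}
```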
Common Mistakes
- Load factor > 1.0 — allowed, but chains become the norm and lookups slow down; with a poor hashCode a bucket can degrade to O(n) (O(log n) once treeified)
- Too low — wasted memory, frequent resizes at low fill
- Not accounting for rounding — HashMap always rounds capacity to a power of two
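The rounding in the last point can be reproduced. `nextPowerOfTwo` below is a simplified reimplementation of HashMap's private `tableSizeFor` (the real one additionally caps the result at MAXIMUM_CAPACITY):

```java
// Sketch: how HashMap rounds a requested capacity up to the next power of two.
// Simplified version of the private tableSizeFor method (no MAXIMUM_CAPACITY cap).
public class RoundingDemo {
    static int nextPowerOfTwo(int cap) {
        // Fill all bits below the highest set bit of (cap - 1), then add 1.
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOfTwo(1334)); // 2048
        System.out.println(nextPowerOfTwo(1000)); // 1024
        System.out.println(nextPowerOfTwo(16));   // 16 (already a power of two)
    }
}
```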
🔴 Senior Level
Poisson Distribution
The choice of 0.75 is statistically justified: under random hash codes, the number of elements per bucket approximately follows a Poisson distribution:
P(x) = (0.75^x * e^-0.75) / x!
x=0: 47.2%
x=1: 35.4%
x=2: 13.3%
x=8: ≈ 0.00000006 (0.000006%) — practically impossible; this is the figure quoted in the JDK source, which computes it with Poisson parameter ≈ 0.5 (the average bucket fill under the 0.75 threshold)
Treeification at 8 nodes per bucket is therefore protection against anomalies such as poor hashCode implementations, not a normal operating mode.
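The probabilities above can be reproduced directly from the formula (a sketch, not HashMap code; `poisson` is a made-up helper):

```java
// Sketch: computing the Poisson bucket-occupancy probabilities quoted above,
// P(k) = lambda^k * e^(-lambda) / k!, with lambda = 0.75.
public class PoissonDemo {
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.pow(lambda, k) * Math.exp(-lambda) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("P(%d) = %.8f%n", k, poisson(0.75, k));
        }
    }
}
```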
Cache Locality Impact
High load factor affects real-world performance more than asymptotics suggest:
- Long chains → nodes scattered across the heap → cache misses: the CPU waits for data from RAM (100+ ns) instead of L1/L2 cache (1-5 ns)
- As a result, O(log n) with poor cache locality (tree nodes scattered across the heap) can be slower than O(1) with a well-chosen load factor
Memory Footprint
At capacity = 1024, assuming 8 bytes per table slot:
- Load factor 0.5: 512 elements, ~512 empty buckets → ~4KB “waste”
- Load factor 0.75: 768 elements, ~256 empty → ~2KB “waste”
- Load factor 0.9: 922 elements, ~102 empty → ~1KB “waste”
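A sketch of the arithmetic behind these estimates, assuming 8 bytes per table slot (64-bit references without compressed oops; with compressed oops it would be 4 bytes). `emptySlotBytes` is a made-up helper and a rough estimate — collisions actually leave slightly more slots empty:

```java
// Sketch of the empty-bucket arithmetic above. Assumes 8 bytes per table slot
// and, as an approximation, one bucket per element (collisions ignored).
public class FootprintDemo {
    static final int REF_BYTES = 8; // assumption: uncompressed 64-bit references

    static long emptySlotBytes(int capacity, float loadFactor) {
        int filled = (int) (capacity * loadFactor); // elements held before resize
        int empty = capacity - filled;              // rough estimate of empty slots
        return (long) empty * REF_BYTES;
    }

    public static void main(String[] args) {
        int capacity = 1024;
        for (float lf : new float[] {0.5f, 0.75f, 0.9f}) {
            System.out.printf("loadFactor=%.2f -> ~%d bytes of empty slots%n",
                    lf, emptySlotBytes(capacity, lf));
        }
    }
}
```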
Edge Cases
- Load factor = 1.0: Minimum empty buckets, but collisions and chains are practically guaranteed
- Load factor > 1.0: HashMap works as an array of chains; lookups slow down, degrading toward O(n) per bucket with a poor hashCode (O(log n) after treeification)
- Load factor < 0.1: Extremely fast lookups, but ~10x memory overhead
Production Tuning
In high-load systems:
- Read-heavy: load factor 0.5-0.6 — minimum collisions, maximum speed
- Write-heavy: load factor 0.75 — balance, fewer resizes
- Memory-constrained: load factor 0.9+ — memory saving, acceptable slowdown
Monitoring
To analyze effectiveness:
- Measure average element count per bucket
- Compare get/put latency at different load factors
- Use JFR to analyze distribution
🎯 Interview Cheat Sheet
Must know:
- Load factor = 0.75 by default — balance between memory and collisions
- threshold = capacity * loadFactor; on exceeding — resize
- Poisson distribution: the JDK source estimates the probability of 8 elements in one bucket at ≈ 0.00000006
- Low LF (0.5): faster, but 2x memory; high (0.9): saves memory, but more collisions
- High load factor = long chains = Cache Miss = real performance worse than asymptotics
- LF > 1.0 is allowed, but chains become the norm and lookups slow down (toward O(n), or O(log n) after treeification, in a bad bucket)
Common follow-up questions:
- When to use LF = 0.5? — read-heavy, low latency systems
- When LF = 0.9? — memory-constrained (Embedded, IoT)
- Why 0.75, not 0.5 or 1.0? — statistically justified compromise (Poisson)
- Does LF affect iteration? — indirectly: larger capacity at low LF = slower iteration
Red flags (DO NOT say):
- “Load factor = 1.0 is optimal” — no, chains and collisions are guaranteed
- “You can change load factor after creation” — no, it’s set in the constructor
- “0.75 means 75% of buckets are filled” — no, 75% of capacity before resize
Related topics:
- [[17. What is Capacity in HashMap]]
- [[18. When Does Rehashing Occur in HashMap]]
- [[28. How to Choose the Initial Capacity for HashMap]]