Question 28 · Section 10

How to Choose the Initial Capacity for HashMap?

🟢 Junior Level

Initial capacity is the size of HashMap’s internal array, set at creation.

If not specified — it defaults to 16. But if you know how many elements will be in the map, it’s better to set the capacity upfront.

Why? So HashMap doesn’t waste time on resizing, which is slow.

Simple formula:

int capacity = (int) (expectedCount / 0.75) + 1;
Map<String, Integer> map = new HashMap<>(capacity);

Example:

// Expecting 100 elements:
int capacity = (int) (100 / 0.75) + 1; // = 134
// HashMap rounds this up to the next power of two: 256
Map<String, Integer> map = new HashMap<>(capacity);
// threshold = 256 * 0.75 = 192 — 100 elements fit without a resize
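
The same calculation as a self-contained sketch (the `capacityFor` helper is our name for illustration, not a JDK method):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {

    // Capacity needed so that `expected` entries fit under the
    // default load factor of 0.75 without triggering a resize.
    static int capacityFor(int expected) {
        return (int) (expected / 0.75) + 1;
    }

    public static void main(String[] args) {
        int expected = 100;
        // 134 is passed; HashMap rounds it up to 256 internally.
        Map<String, Integer> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i);
        }
        System.out.println(capacityFor(expected) + " -> " + map.size()); // 134 -> 100
    }
}
```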

🟡 Middle Level

The Resize Problem

Each resize is O(n): creating a new array + copying all elements.

// Without specifying capacity:
new HashMap<>();       // capacity = 16, threshold = 12
// At 13 elements: resize to 32
// At 25 elements: resize to 64
// At 49 elements: resize to 128
// ... and so on

Each resize adds latency and GC load.
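
A rough way to see the cumulative cost is to simulate the doubling policy. This sketch only mirrors the capacity/threshold arithmetic; it is not JDK code:

```java
public class ResizeSimulation {

    // Simulates HashMap's doubling policy for n insertions with the
    // default capacity (16) and load factor (0.75); returns
    // {number of resizes, total entries rehashed}.
    static int[] simulate(int n) {
        int capacity = 16;
        int threshold = 12;          // 16 * 0.75
        int resizes = 0, copied = 0;
        for (int size = 1; size <= n; size++) {
            if (size > threshold) {  // mirrors HashMap's ++size > threshold check
                copied += size - 1;  // every existing entry gets rehashed
                capacity <<= 1;
                threshold = (int) (capacity * 0.75f);
                resizes++;
            }
        }
        return new int[] {resizes, copied};
    }

    public static void main(String[] args) {
        int[] r = simulate(1000);
        System.out.println(r[0] + " resizes, " + r[1] + " entries rehashed");
        // 7 resizes, 1524 entries rehashed
    }
}
```

For 1000 insertions the simulation reports 7 resizes and 1524 rehashed entries, i.e. more entries moved during resizing than the map ends up holding.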

The Ideal Capacity Formula

initialCapacity = (expectedSize / loadFactor) + 1

Why +1? The threshold check on insert is ++size > threshold. The +1 is a safety margin so that integer truncation in the division never leaves the threshold below the expected size.

Rounding to Power of Two

HashMap always rounds capacity up to a power of two:

Passed    Became
10        16
100       128
1000      1024
1335      2048

Practical Example for 1000 Elements

// Wrong:
new HashMap<>(1000);
// → capacity = 1024, threshold = 768
// → At 769 elements, resize to 2048!

// Right:
new HashMap<>((int)(1000 / 0.75f) + 1); // = 1334
// → capacity = 2048, threshold = 1536
// → 1000 elements fit without resize

When You Don’t Need to Specify Capacity

  • Small collections (2-5 elements) — calculation overhead exceeds benefit
  • Unknown element count — default constructor

Guava Helpers

// Google Guava — already contains the correct formula:
Map<K, V> map = Maps.newHashMapWithExpectedSize(1000);

Common Mistakes

  1. Passing the expected count directly: new HashMap(1000) gives threshold 768
  2. Forgetting +1 — the last element triggers resize
  3. Excessive capacity — slow iteration over empty buckets

🔴 Senior Level

tableSizeFor Internal

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

This bit trick finds the smallest power of two ≥ cap in O(1).
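
Newer JDKs compute the same value via Integer.numberOfLeadingZeros; a standalone sketch equivalent to the shift cascade:

```java
public class TableSize {

    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= cap; same result as the shift cascade.
    // -1 >>> nlz(cap - 1) fills all bits below the highest set bit.
    static int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10));   // 16
        System.out.println(tableSizeFor(1000)); // 1024
        System.out.println(tableSizeFor(1335)); // 2048
    }
}
```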

Resize Cost Analysis

On resize capacity=N:

  • Allocation: N references × 4-8 bytes (with or without compressed oops)
  • Copying: size elements
  • GC: old array → garbage
  • Latency spike: 1-100ms depending on size

In latency-sensitive systems (HFT, real-time), resize is unacceptable.

Iteration Performance

// capacity = 1,000,000, size = 10
for (var entry : map.entrySet()) { ... }
// Iterator traverses 1,000,000 buckets, of which 10 are filled
// 999,990 empty checks — huge overhead

Don’t overestimate capacity without need.
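
The effect can be poked at with a quick single-run timing sketch. Single-shot timings are noisy (a real measurement needs JMH), so no concrete numbers are claimed here:

```java
import java.util.HashMap;
import java.util.Map;

public class SparseIteration {

    // Iteration visits every bucket of the table, filled or not.
    static long sumValues(Map<Integer, Integer> map) {
        long sum = 0;
        for (Map.Entry<Integer, Integer> e : map.entrySet()) {
            sum += e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Same 10 entries, wildly different table sizes.
        Map<Integer, Integer> sparse = new HashMap<>(1 << 20); // ~1M buckets
        Map<Integer, Integer> tight = new HashMap<>();         // 16 buckets
        for (int i = 0; i < 10; i++) {
            sparse.put(i, i);
            tight.put(i, i);
        }

        long t0 = System.nanoTime();
        sumValues(sparse);
        long sparseNs = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        sumValues(tight);
        long tightNs = System.nanoTime() - t1;

        // Both sums are identical; only the traversal cost differs.
        System.out.println("sparse: " + sparseNs + " ns, tight: " + tightNs + " ns");
    }
}
```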

Memory Trade-off

Capacity                 Memory (array)   Resizes    Iteration
Exact size               Minimum          Frequent   Fast
With margin (formula)    Slightly more    None       Fast
Greatly overestimated    Much more        None       Slow

Production: Dynamic Sizing

In some frameworks (Netty, Kafka), custom maps are used with:

  • Predictable latency (no resize)
  • Ring buffer for eviction
  • Open addressing for memory savings

Benchmark: Impact of Wrong Capacity

Scenario                 Time to insert 1M elements
No capacity (resize)     120ms
Correct capacity         45ms
10x overestimated        50ms (insert) / 500ms (iteration)

Best Practice Formula

/**
 * Creates a HashMap without resize for expectedSize elements.
 */
public static <K, V> HashMap<K, V> newHashMap(int expectedSize) {
    return new HashMap<>((int) Math.ceil(expectedSize / 0.75) + 1);
}

🎯 Interview Cheat Sheet

Must know:

  • Formula: initialCapacity = (expectedSize / loadFactor) + 1 — to avoid resize
  • HashMap rounds to nearest power of two via tableSizeFor()
  • Without formula: new HashMap(1000) → capacity=1024, threshold=768 → resize at 769 elements
  • With formula: (1000/0.75)+1=1334 → capacity=2048, threshold=1536 → no resize
  • Each resize = O(n): allocation + copying + GC pressure + latency spike
  • Overestimated capacity = slow iteration over empty buckets (O(capacity + size))
  • Don’t specify capacity for small collections (2-5 elements)

Common follow-up questions:

  • Why +1 in the formula? — threshold is checked as ++size > threshold; without +1, the last element triggers resize
  • What is tableSizeFor? — bit algorithm: finds nearest power of two in O(1)
  • When to use Guava Maps.newHashMapWithExpectedSize? — already contains the correct formula
  • What does resize cost? — an O(n) copy plus allocation and GC pressure; spikes of 1-100ms depending on size (average put stays around ~150ns, but P99 can jump to ~5000ns when a resize hits)

Red flags (DO NOT say):

  • “new HashMap(1000) creates 1000 buckets” — no, 1024, and threshold = 768
  • “The bigger the capacity the better” — no, iteration is slower, memory is wasted
  • “Capacity can be changed after creation” — no, set in constructor

Related topics:

  • [[17. What is Capacity in HashMap]]
  • [[18. When Does Rehashing Occur in HashMap]]
  • [[19. What Happens During Rehashing]]
  • [[16. What is Load Factor in HashMap]]