Question 28 · Section 10

How to Choose the Initial Capacity for HashMap?

🟢 Junior Level

Initial capacity is the size of HashMap’s internal array, set at creation.

If not specified — it defaults to 16. But if you know how many elements will be in the map, it’s better to set the capacity upfront.

Why? So HashMap doesn’t waste time on resizing, which is slow.

Simple formula:

int capacity = (int) (expectedCount / 0.75) + 1;
Map<String, Integer> map = new HashMap<>(capacity);

Example:

// Expecting 100 elements:
int capacity = (int) (100 / 0.75) + 1; // = 134
// HashMap rounds this up to the next power of two: 256
Map<String, Integer> map = new HashMap<>(capacity);
// threshold = 256 * 0.75 = 192 — 100 elements fit without a resize
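
The same calculation as a self-contained sketch (the `capacityFor` helper is our name for illustration, not a JDK method):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {

    // Capacity needed so that `expected` entries fit under the
    // default load factor of 0.75 without triggering a resize.
    static int capacityFor(int expected) {
        return (int) (expected / 0.75) + 1;
    }

    public static void main(String[] args) {
        int expected = 100;
        // 134 is passed; HashMap rounds it up to 256 internally.
        Map<String, Integer> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i);
        }
        System.out.println(capacityFor(expected) + " -> " + map.size()); // 134 -> 100
    }
}
```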

🟡 Middle Level

The Resize Problem

Each resize is O(n): creating a new array + copying all elements.

// Without specifying capacity:
new HashMap<>();       // capacity = 16, threshold = 12
// At 13 elements: resize to 32
// At 25 elements: resize to 64
// At 49 elements: resize to 128
// ... and so on

Each resize adds latency and GC load.
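
A rough way to see the cumulative cost is to simulate the doubling policy. This sketch only mirrors the capacity/threshold arithmetic; it is not JDK code:

```java
public class ResizeSimulation {

    // Simulates HashMap's doubling policy for n insertions with the
    // default capacity (16) and load factor (0.75); returns
    // {number of resizes, total entries rehashed}.
    static int[] simulate(int n) {
        int capacity = 16;
        int threshold = 12;          // 16 * 0.75
        int resizes = 0, copied = 0;
        for (int size = 1; size <= n; size++) {
            if (size > threshold) {  // mirrors HashMap's ++size > threshold check
                copied += size - 1;  // every existing entry gets rehashed
                capacity <<= 1;
                threshold = (int) (capacity * 0.75f);
                resizes++;
            }
        }
        return new int[] {resizes, copied};
    }

    public static void main(String[] args) {
        int[] r = simulate(1000);
        System.out.println(r[0] + " resizes, " + r[1] + " entries rehashed");
        // 7 resizes, 1524 entries rehashed
    }
}
```

For 1000 insertions the simulation reports 7 resizes and 1524 rehashed entries, i.e. more entries moved during resizing than the map ends up holding.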

The Ideal Capacity Formula

initialCapacity = (expectedSize / loadFactor) + 1

Why +1? The threshold check on insert is ++size > threshold. The +1 is a safety margin so that integer truncation in the division never leaves the threshold below the expected size.

Rounding to Power of Two

HashMap always rounds capacity up to a power of two:

Passed    Became
10        16
100       128
1000      1024
1335      2048

Practical Example for 1000 Elements

// Wrong:
new HashMap<>(1000);
// → capacity = 1024, threshold = 768
// → At 769 elements, resize to 2048!

// Right:
new HashMap<>((int)(1000 / 0.75f) + 1); // = 1334
// → capacity = 2048, threshold = 1536
// → 1000 elements fit without resize

When You Don’t Need to Specify Capacity

  • Small collections (2-5 elements) — calculation overhead exceeds benefit
  • Unknown element count — default constructor

Guava Helpers

// Google Guava — already contains the correct formula:
Map<K, V> map = Maps.newHashMapWithExpectedSize(1000);

Common Mistakes

  1. Passing the expected count directly: new HashMap(1000) gives threshold 768
  2. Forgetting +1 — the last element triggers resize
  3. Excessive capacity — slow iteration over empty buckets

🔴 Senior Level

tableSizeFor Internal

static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

This bit trick finds the smallest power of two ≥ cap in O(1).
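
Newer JDKs compute the same value via Integer.numberOfLeadingZeros; a standalone sketch equivalent to the shift cascade:

```java
public class TableSize {

    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Smallest power of two >= cap; same result as the shift cascade.
    // -1 >>> nlz(cap - 1) fills all bits below the highest set bit.
    static int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10));   // 16
        System.out.println(tableSizeFor(1000)); // 1024
        System.out.println(tableSizeFor(1335)); // 2048
    }
}
```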

Resize Cost Analysis

On resize capacity=N:

  • Allocation: N references × 4-8 bytes (with or without compressed oops)
  • Copying: size elements
  • GC: old array → garbage
  • Latency spike: 1-100ms depending on size

In latency-sensitive systems (HFT, real-time), resize is unacceptable.

Iteration Performance

// capacity = 1,000,000, size = 10
for (var entry : map.entrySet()) { ... }
// Iterator traverses 1,000,000 buckets, of which 10 are filled
// 999,990 empty checks — huge overhead

Don’t overestimate capacity without need.
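
The effect can be poked at with a quick single-run timing sketch. Single-shot timings are noisy (a real measurement needs JMH), so no concrete numbers are claimed here:

```java
import java.util.HashMap;
import java.util.Map;

public class SparseIteration {

    // Iteration visits every bucket of the table, filled or not.
    static long sumValues(Map<Integer, Integer> map) {
        long sum = 0;
        for (Map.Entry<Integer, Integer> e : map.entrySet()) {
            sum += e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Same 10 entries, wildly different table sizes.
        Map<Integer, Integer> sparse = new HashMap<>(1 << 20); // ~1M buckets
        Map<Integer, Integer> tight = new HashMap<>();         // 16 buckets
        for (int i = 0; i < 10; i++) {
            sparse.put(i, i);
            tight.put(i, i);
        }

        long t0 = System.nanoTime();
        sumValues(sparse);
        long sparseNs = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        sumValues(tight);
        long tightNs = System.nanoTime() - t1;

        // Both sums are identical; only the traversal cost differs.
        System.out.println("sparse: " + sparseNs + " ns, tight: " + tightNs + " ns");
    }
}
```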

Memory Trade-off

Capacity                 Memory (array)   Resizes    Iteration
Exact size               Minimum          Frequent   Fast
With margin (formula)    Slightly more    None       Fast
Greatly overestimated    Much more        None       Slow

Production: Dynamic Sizing

In some frameworks (Netty, Kafka), custom maps are used with:

  • Predictable latency (no resize)
  • Ring buffer for eviction
  • Open addressing for memory savings

Benchmark: Impact of Wrong Capacity

Scenario                 Time to insert 1M elements
No capacity (resize)     120ms
Correct capacity         45ms
10x overestimated        50ms (insert) / 500ms (iteration)

Best Practice Formula

/**
 * Creates a HashMap without resize for expectedSize elements.
 */
public static <K, V> HashMap<K, V> newHashMap(int expectedSize) {
    return new HashMap<>((int) Math.ceil(expectedSize / 0.75) + 1);
}

🎯 Interview Cheat Sheet

Must know:

  • Formula: initialCapacity = (expectedSize / loadFactor) + 1 — to avoid resize
  • HashMap rounds to nearest power of two via tableSizeFor()
  • Without formula: new HashMap(1000) → capacity=1024, threshold=768 → resize at 769 elements
  • With formula: (1000/0.75)+1=1334 → capacity=2048, threshold=1536 → no resize
  • Each resize = O(n): allocation + copying + GC pressure + latency spike
  • Overestimated capacity = slow iteration over empty buckets (O(capacity + size))
  • Don’t specify capacity for small collections (2-5 elements)

Common follow-up questions:

  • Why +1 in the formula? — threshold is checked as ++size > threshold; without +1, the last element triggers resize
  • What is tableSizeFor? — bit algorithm: finds nearest power of two in O(1)
  • When to use Guava Maps.newHashMapWithExpectedSize? — already contains the correct formula
  • What does resize cost? — an O(n) copy plus allocation and GC pressure; spikes of 1-100ms depending on size (average put stays around ~150ns, but P99 can jump to ~5000ns when a resize hits)

Red flags (DO NOT say):

  • “new HashMap(1000) creates 1000 buckets” — no, 1024, and threshold = 768
  • “The bigger the capacity the better” — no, iteration is slower, memory is wasted
  • “Capacity can be changed after creation” — no, set in constructor

Related topics:

  • [[17. What is Capacity in HashMap]]
  • [[18. When Does Rehashing Occur in HashMap]]
  • [[19. What Happens During Rehashing]]
  • [[16. What is Load Factor in HashMap]]