Question 15 · Section 10

Why String Is Often Used as a Key in HashMap?

String is the most popular key type in HashMap for several reasons:

Language versions: English Russian Ukrainian

🟢 Junior Level

String is the most popular key type in HashMap for several reasons:

  1. Immutable — once created, a String cannot be changed
  2. Meaningful — human-readable: “userId”, “email”, etc.
  3. Caches hashCode — computed once and stored

Why immutability matters:

String key = "user_42";
map.put(key, data);
// key CANNOT be modified — hash won't change
// Element will always be found

Example:

Map<String, Integer> ages = new HashMap<>();
ages.put("Alice", 30);
ages.put("Bob", 25);
// "Alice" — always the same object, same hashCode
System.out.println(ages.get("Alice")); // 30 — always works

🟡 Middle Level

Why String is the Ideal Key

Property Description
Immutable Cannot be modified — hash doesn’t change
Caches hashCode Computed on first call, stored in private hash field
Good distribution hashCode formula based on characters: s[0]*31^(n-1) + …
Implements Comparable Speeds up treeification (compareTo instead of tieBreakOrder)
Widely used All developers expect it as a key

hashCode Caching

// Inside String:
private int hash; // Default 0 — means "not computed"

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = hashCode(value); // Computed once
    }
    return h;
}

The first call computes the hash, subsequent calls return the cached value in O(1). This is critical for HashMap performance — on every get/put, hashCode is called multiple times.

String Pool

String literals are stored in the String Pool (interned):

String s1 = "key";
String s2 = "key";
// s1 == s2 — the same object! Reference comparison works

This saves memory: identical literals reference the same object.

Why 31 as Multiplier?

// String.hashCode() formula:
hash = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

31 was chosen because:

  1. Odd prime — minimizes collisions
  2. 31 * i = (i « 5) - i — JVM optimizes multiplication into a shift and subtraction
  3. 31 gives good distribution for typical string data

Common Mistakes

  1. Using String concatenation as keys in a loop — creates many temporary objects
  2. new String("key") — creates a new object, doesn’t use the String Pool
  3. Very long strings as keys — hashCode computation is expensive (O(n))

🔴 Senior Level

Internal Implementation of String.hashCode()

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

The hash is computed lazily — on the first hashCode() call. This is safe because String is immutable: the value can’t change between calls.

Compact Strings (Java 9+)

Starting with Java 9, String uses byte[] instead of char[] internally:

private final byte[] value;
private final byte coder; // LATIN1 or UTF16

For ASCII strings, this halves memory consumption. The hashCode formula remains the same.

Performance: hashCode Computation

String Length hashCode() Cost
1-10 chars ~5-15 ns
100 chars ~50-100 ns
10000 chars ~5-10 µs

For long strings, hashCode caching is critical: without it, every HashMap operation would pay the full cost.

Edge Cases

  1. Empty String: "".hashCode() = 0 — lands in bucket 0, same as null key
  2. Very long strings: hashCode = O(n), but cached — only the first call is expensive
  3. String.substring (pre-Java 7u6): shared the original char[] — potential memory leak. Fixed in Java 7u6 — copies the array.

String vs Enum as Key

Criterion String Enum
hashCode cost O(n) first time, then O(1) O(1) (ordinal)
Memory Object + char[]/byte[] Singleton
Type safety No Yes
Flexibility High Fixed set

For a fixed set of keys — Enum (or EnumMap) is better. For dynamic keys — String is the universal choice.

Security

  • Hash Flooding via String collisions: known collisions exist (“Aa” and “BB”). In Java 8+, treeification mitigates this
  • DoS via long strings: hashCode computation on 1MB strings can be used for CPU exhaustion
  • String deduplication (G1 GC): -XX:+UseStringDeduplication reduces memory overhead for repeated keys

Best Practices

  1. Use short String keys — IDs, names, short identifiers
  2. Avoid concatenation in hot paths — pre-build keys
  3. Use String.intern() carefully — saves memory, but pollutes the global pool
  4. For fixed sets — prefer Enum or EnumMap

🎯 Interview Cheat Sheet

Must know:

  • String is immutable — hash doesn’t change, no “broken key” risk
  • String caches hashCode — computed once, returned in O(1) on subsequent calls
  • String implements Comparable — speeds up treeification (compareTo instead of tieBreakOrder)
  • Good hashCode distribution: formula using multiplier 31 (=(i«5)-i)
  • String Pool: identical literals reference the same object — memory savings
  • Short strings are optimal; long strings have expensive first hashCode computation

Common follow-up questions:

  • Why 31 as a multiplier? — odd prime, JVM optimizes 31*i = (i«5)-i, good distribution
  • Does String.hashCode() compute every time? — no, cached in a private field after the first call
  • Why not use very long strings as keys? — hashCode computation is O(n), first call is expensive
  • String vs Enum as key? — Enum is faster (O(1) ordinal), but String is more flexible

Red flags (DO NOT say):

  • “String is mutable” — no, it’s immutable by design
  • “hashCode is computed every time” — no, cached after the first call
  • “String is the best key always” — for fixed sets, Enum/EnumMap is better

Related topics:

  • [[03. How HashMap Determines Which Bucket to Put an Element In]]
  • [[07. What is the equals() and hashCode() Contract]]
  • [[12. Can You Use a Mutable Object as a Key in HashMap]]
  • [[14. What Are the Requirements for a HashMap Key]]