Why String Is Often Used as a Key in HashMap?

🟢 Junior Level

String is the most popular key type in HashMap for several reasons:

Immutable — once created, a String cannot be changed
Meaningful — human-readable: “userId”, “email”, etc.
Caches hashCode — computed once and stored

Why immutability matters:

String key = "user_42";
map.put(key, data);
// key CANNOT be modified — hash won't change
// Element will always be found

Example:

Map<String, Integer> ages = new HashMap<>();
ages.put("Alice", 30);
ages.put("Bob", 25);
// "Alice" — always the same object, same hashCode
System.out.println(ages.get("Alice")); // 30 — always works

🟡 Middle Level

Why String is the Ideal Key

Property	Description
Immutable	Cannot be modified — hash doesn’t change
Caches hashCode	Computed on first call, stored in private `hash` field
Good distribution	hashCode formula based on characters: s[0]*31^(n-1) + …
Implements Comparable	Speeds up treeification (compareTo instead of tieBreakOrder)
Widely used	All developers expect it as a key

hashCode Caching

// Inside String:
private int hash; // Default 0 — means "not computed"

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = hashCode(value); // Computed once
    }
    return h;
}

The first call computes the hash, subsequent calls return the cached value in O(1). This is critical for HashMap performance — on every get/put, hashCode is called multiple times.

String Pool

String literals are stored in the String Pool (interned):

String s1 = "key";
String s2 = "key";
// s1 == s2 — the same object! Reference comparison works

This saves memory: identical literals reference the same object.

Why 31 as Multiplier?

// String.hashCode() formula:
hash = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

31 was chosen because:

Odd prime — minimizes collisions
31 * i = (i « 5) - i — JVM optimizes multiplication into a shift and subtraction
31 gives good distribution for typical string data

Common Mistakes

Using String concatenation as keys in a loop — creates many temporary objects
new String("key") — creates a new object, doesn’t use the String Pool
Very long strings as keys — hashCode computation is expensive (O(n))

🔴 Senior Level

Internal Implementation of String.hashCode()

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

The hash is computed lazily — on the first hashCode() call. This is safe because String is immutable: the value can’t change between calls.

Compact Strings (Java 9+)

Starting with Java 9, String uses byte[] instead of char[] internally:

private final byte[] value;
private final byte coder; // LATIN1 or UTF16

For ASCII strings, this halves memory consumption. The hashCode formula remains the same.

Performance: hashCode Computation

String Length	hashCode() Cost
1-10 chars	~5-15 ns
100 chars	~50-100 ns
10000 chars	~5-10 µs

For long strings, hashCode caching is critical: without it, every HashMap operation would pay the full cost.

Edge Cases

Empty String: "".hashCode() = 0 — lands in bucket 0, same as null key
Very long strings: hashCode = O(n), but cached — only the first call is expensive
String.substring (pre-Java 7u6): shared the original char[] — potential memory leak. Fixed in Java 7u6 — copies the array.

String vs Enum as Key

Criterion	String	Enum
hashCode cost	O(n) first time, then O(1)	O(1) (ordinal)
Memory	Object + char[]/byte[]	Singleton
Type safety	No	Yes
Flexibility	High	Fixed set

For a fixed set of keys — Enum (or EnumMap) is better. For dynamic keys — String is the universal choice.

Security

Hash Flooding via String collisions: known collisions exist (“Aa” and “BB”). In Java 8+, treeification mitigates this
DoS via long strings: hashCode computation on 1MB strings can be used for CPU exhaustion
String deduplication (G1 GC): -XX:+UseStringDeduplication reduces memory overhead for repeated keys

Best Practices

Use short String keys — IDs, names, short identifiers
Avoid concatenation in hot paths — pre-build keys
Use String.intern() carefully — saves memory, but pollutes the global pool
For fixed sets — prefer Enum or EnumMap

🎯 Interview Cheat Sheet

Must know:

String is immutable — hash doesn’t change, no “broken key” risk
String caches hashCode — computed once, returned in O(1) on subsequent calls
String implements Comparable — speeds up treeification (compareTo instead of tieBreakOrder)
Good hashCode distribution: formula using multiplier 31 (=(i«5)-i)
String Pool: identical literals reference the same object — memory savings
Short strings are optimal; long strings have expensive first hashCode computation

Common follow-up questions:

Why 31 as a multiplier? — odd prime, JVM optimizes 31*i = (i«5)-i, good distribution
Does String.hashCode() compute every time? — no, cached in a private field after the first call
Why not use very long strings as keys? — hashCode computation is O(n), first call is expensive
String vs Enum as key? — Enum is faster (O(1) ordinal), but String is more flexible

Red flags (DO NOT say):

“String is mutable” — no, it’s immutable by design
“hashCode is computed every time” — no, cached after the first call
“String is the best key always” — for fixed sets, Enum/EnumMap is better

Related topics:

[[03. How HashMap Determines Which Bucket to Put an Element In]]
[[07. What is the equals() and hashCode() Contract]]
[[12. Can You Use a Mutable Object as a Key in HashMap]]
[[14. What Are the Requirements for a HashMap Key]]