Why String Is Often Used as a Key in HashMap?
String is the most popular key type in HashMap for several reasons:
🟢 Junior Level
String is the most popular key type in HashMap for several reasons:
- Immutable — once created, a String cannot be changed
- Meaningful — human-readable: “userId”, “email”, etc.
- Caches hashCode — computed once and stored
Why immutability matters:
String key = "user_42";
map.put(key, data);
// key CANNOT be modified — hash won't change
// Element will always be found
Example:
Map<String, Integer> ages = new HashMap<>();
ages.put("Alice", 30);
ages.put("Bob", 25);
// "Alice" — always the same object, same hashCode
System.out.println(ages.get("Alice")); // 30 — always works
🟡 Middle Level
Why String is the Ideal Key
| Property | Description |
|---|---|
| Immutable | Cannot be modified — hash doesn’t change |
| Caches hashCode | Computed on first call, stored in private hash field |
| Good distribution | hashCode formula based on characters: s[0]*31^(n-1) + … |
| Implements Comparable | Speeds up treeification (compareTo instead of tieBreakOrder) |
| Widely used | All developers expect it as a key |
hashCode Caching
// Inside String:
private int hash; // Default 0 — means "not computed"
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
hash = h = hashCode(value); // Computed once
}
return h;
}
The first call computes the hash, subsequent calls return the cached value in O(1). This is critical for HashMap performance — on every get/put, hashCode is called multiple times.
String Pool
String literals are stored in the String Pool (interned):
String s1 = "key";
String s2 = "key";
// s1 == s2 — the same object! Reference comparison works
This saves memory: identical literals reference the same object.
Why 31 as Multiplier?
// String.hashCode() formula:
hash = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
31 was chosen because:
- Odd prime — minimizes collisions
- 31 * i = (i « 5) - i — JVM optimizes multiplication into a shift and subtraction
- 31 gives good distribution for typical string data
Common Mistakes
- Using String concatenation as keys in a loop — creates many temporary objects
new String("key")— creates a new object, doesn’t use the String Pool- Very long strings as keys — hashCode computation is expensive (O(n))
🔴 Senior Level
Internal Implementation of String.hashCode()
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}
The hash is computed lazily — on the first hashCode() call. This is safe because String is immutable: the value can’t change between calls.
Compact Strings (Java 9+)
Starting with Java 9, String uses byte[] instead of char[] internally:
private final byte[] value;
private final byte coder; // LATIN1 or UTF16
For ASCII strings, this halves memory consumption. The hashCode formula remains the same.
Performance: hashCode Computation
| String Length | hashCode() Cost |
|---|---|
| 1-10 chars | ~5-15 ns |
| 100 chars | ~50-100 ns |
| 10000 chars | ~5-10 µs |
For long strings, hashCode caching is critical: without it, every HashMap operation would pay the full cost.
Edge Cases
- Empty String:
"".hashCode() = 0— lands in bucket 0, same as null key - Very long strings: hashCode = O(n), but cached — only the first call is expensive
- String.substring (pre-Java 7u6): shared the original char[] — potential memory leak. Fixed in Java 7u6 — copies the array.
String vs Enum as Key
| Criterion | String | Enum |
|---|---|---|
| hashCode cost | O(n) first time, then O(1) | O(1) (ordinal) |
| Memory | Object + char[]/byte[] | Singleton |
| Type safety | No | Yes |
| Flexibility | High | Fixed set |
For a fixed set of keys — Enum (or EnumMap) is better. For dynamic keys — String is the universal choice.
Security
- Hash Flooding via String collisions: known collisions exist (“Aa” and “BB”). In Java 8+, treeification mitigates this
- DoS via long strings: hashCode computation on 1MB strings can be used for CPU exhaustion
- String deduplication (G1 GC):
-XX:+UseStringDeduplicationreduces memory overhead for repeated keys
Best Practices
- Use short String keys — IDs, names, short identifiers
- Avoid concatenation in hot paths — pre-build keys
- Use
String.intern()carefully — saves memory, but pollutes the global pool - For fixed sets — prefer Enum or EnumMap
🎯 Interview Cheat Sheet
Must know:
- String is immutable — hash doesn’t change, no “broken key” risk
- String caches hashCode — computed once, returned in O(1) on subsequent calls
- String implements Comparable — speeds up treeification (compareTo instead of tieBreakOrder)
- Good hashCode distribution: formula using multiplier 31 (=(i«5)-i)
- String Pool: identical literals reference the same object — memory savings
- Short strings are optimal; long strings have expensive first hashCode computation
Common follow-up questions:
- Why 31 as a multiplier? — odd prime, JVM optimizes 31*i = (i«5)-i, good distribution
- Does String.hashCode() compute every time? — no, cached in a private field after the first call
- Why not use very long strings as keys? — hashCode computation is O(n), first call is expensive
- String vs Enum as key? — Enum is faster (O(1) ordinal), but String is more flexible
Red flags (DO NOT say):
- “String is mutable” — no, it’s immutable by design
- “hashCode is computed every time” — no, cached after the first call
- “String is the best key always” — for fixed sets, Enum/EnumMap is better
Related topics:
- [[03. How HashMap Determines Which Bucket to Put an Element In]]
- [[07. What is the equals() and hashCode() Contract]]
- [[12. Can You Use a Mutable Object as a Key in HashMap]]
- [[14. What Are the Requirements for a HashMap Key]]