Can String Pool Cause OutOfMemoryError?

🟢 Junior Level

Yes, it can. But the error type depends on the Java version.

Each unique string in the pool is an object (~48 bytes) + a hash table entry (~32 bytes). 100 million unique strings = ~8GB for the pool alone.
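The estimate above is plain multiplication, and a minimal sketch makes it easy to try other volumes. The class and method names here are hypothetical, and the per-entry sizes are the approximate figures quoted above, not measured values; real sizes depend on the JVM, string length, and settings such as compressed oops.

```java
final class PoolFootprintEstimate {
    // Approximate per-entry costs taken from the text above (assumptions, not measurements).
    static final long STRING_OBJECT_BYTES = 48;
    static final long TABLE_ENTRY_BYTES = 32;

    // Rough pool footprint in bytes for a given number of unique interned strings.
    static long estimateBytes(long uniqueStrings) {
        return uniqueStrings * (STRING_OBJECT_BYTES + TABLE_ENTRY_BYTES);
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(100_000_000L);
        System.out.println(bytes / 1_000_000_000.0 + " GB"); // prints "8.0 GB"
    }
}
```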

Java 6 and earlier:

java.lang.OutOfMemoryError: PermGen space

The string pool lived in the PermGen area, which had a fixed size. Lots of intern() calls → memory ran out → crash.

Java 7+:

java.lang.OutOfMemoryError: Java heap space

The pool moved to the main heap. If you intern millions of unique strings, they fill up the entire heap.

Example:

import java.util.ArrayList;
import java.util.List;

// Dangerous code: can cause OOM
List<String> list = new ArrayList<>();
for (int i = 0; i < 100_000_000; i++) {
    list.add(String.valueOf(i).intern()); // Each string is unique!
}

How to avoid: Don't use intern() for unique strings (UUIDs, hashes, IDs). Use it only for strings that repeat many times.
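For contrast with the dangerous loop, here is a small sketch of the case where intern() does help: two separately allocated strings with the same contents collapse to one pooled instance. The class name and the sample value are made up for illustration.

```java
final class InternDuplicatesDemo {
    // Returns true when two distinct String objects with equal contents
    // resolve to the same pooled instance after intern().
    static boolean sameInstanceAfterIntern(String value) {
        // new String(...) forces two separate heap objects.
        String a = new String(value);
        String b = new String(value);
        return a != b && a.intern() == b.intern();
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceAfterIntern("ELECTRONICS")); // prints "true"
    }
}
```

With unique values there is nothing to collapse, so each call only adds one more pool entry.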


🟡 Middle Level

When OOM occurs

Scenario 1: Mass intern() of unique strings

// Each string is unique, so the pool grows uncontrollably
for (User user : users) {
    String email = user.getEmail().intern(); // UUIDs/emails: all different
}

Scenario 2: Leak through strong references

// Strings in pool + references in collection = never collected by GC
Set<String> cache = new HashSet<>();
while (true) {
    String data = readFromNetwork().intern();
    cache.add(data); // Grows infinitely
}

How to prevent

  1. Monitoring: jcmd <pid> VM.stringtable -verbose
  2. Tuning: -XX:StringTableSize=1000003
  3. Alternative: -XX:+UseStringDeduplication (G1 GC)
  4. Custom cache: ConcurrentHashMap<String, String> with eviction
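Item 4 can be sketched in its simplest form as a capped application-level interner. This is an assumption-laden illustration: `BoundedInterner` is a hypothetical class, and instead of real eviction it simply stops caching once the cap is reached, so unique strings pass through untouched and remain ordinary garbage-collectable objects.

```java
import java.util.concurrent.ConcurrentHashMap;

final class BoundedInterner {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final int maxEntries;

    BoundedInterner(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Returns the canonical instance for value, or value itself when the cache is full.
    String intern(String value) {
        String existing = cache.get(value);
        if (existing != null) {
            return existing;
        }
        // The size check is approximate under concurrency: a few extra entries may slip in.
        if (cache.size() >= maxEntries) {
            return value;
        }
        String previous = cache.putIfAbsent(value, value);
        return previous != null ? previous : value;
    }

    int size() {
        return cache.size();
    }
}
```

Unlike String.intern(), the whole cache can be dropped or replaced at any time, which makes its memory use easy to bound.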

Typical mistakes

  1. Mistake: Thinking GC will automatically clean the pool.
     Solution: StringTable is a native JVM hash table. GC can drop an entry only when nothing else references the string, so strings that are also held in collections, caches, or static fields stay in the pool for as long as those references live.

  2. Mistake: Calling intern() for every string from the DB.
     Solution: Intern only fields with a high duplication ratio.


🔴 Senior Level

Internal Implementation

StringTable is a native hash table:

oop StringTable::intern(Symbol* string, TRAPS) {
  unsigned int hashValue = hash_string(string);
  int index = the_table()->hash_to_index(hashValue);
  oop found_string = the_table()->lookup(index, string, hashValue);

  // Found
  if (found_string != NULL) return found_string;

  // Not found: create new entry in StringTable
  Handle string_object = java_lang_String::create_from_symbol(string, CHECK_NULL);
  the_table()->basic_add(index, string_object, string, hashValue, CHECK_NULL);
  return string_object();
}

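The GC treats StringTable entries as weak roots: an interned String stays in the pool only while application code still references it. Any strong reference (a collection, a cache, a static field) keeps both the entry and the object alive.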

Two types of OOM

Type 1: StringTable overflow (hash collisions)

  • When StringTableSize < number of strings → long collision chains
  • intern() degrades to O(n)
  • Application "hangs": CPU at 100% on table lookup
  • May manifest as GC Overhead Limit Exceeded

Type 2: Heap exhaustion

  • Millions of unique interned strings fill up the heap
  • OOM: Java heap space
  • Happens when pool competes for memory with business objects

Architectural Trade-offs

String Pool and GC:

  • Young Gen: new intern() → Eden → dies quickly (if no references)
  • Old Gen: long-lived intern() → Old Gen → Full GC scans them all
  • G1 GC: StringTable scan adds to evacuation pause

Contention:

  • StringTable is a global structure with locking
  • With parallel intern() from hundreds of threads → contention
  • Can become a bottleneck in high-load systems

Edge Cases

  1. Default StringTableSize (60013): If you intern 1M+ unique strings, the average chain length is 1M / 60013 ≈ 16.7. The worst case is much longer.

  2. GC and String Pool cleaning: Starting from Java 7u40, JVM removes unreachable entries from StringTable during Full GC. But this only works if there are no strong references to the strings.

  3. ZGC/Shenandoah: These GCs use concurrent marking. StringTable scan happens concurrently, but overhead still exists.
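The chain-length arithmetic from edge case 1 can be reproduced with a few lines. This is only the average under an even spread of hashes; the class and method names are hypothetical, and the second call uses the larger table size suggested elsewhere in these notes.

```java
import java.util.Locale;

final class ChainLengthEstimate {
    // Average entries per bucket, assuming hashes are spread evenly.
    static double averageChainLength(long entries, long buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive");
        }
        return (double) entries / buckets;
    }

    public static void main(String[] args) {
        // Default table size from the text: prints 16.7
        System.out.println(String.format(Locale.ROOT, "%.1f", averageChainLength(1_000_000, 60_013)));
        // With -XX:StringTableSize=1000003: prints 1.0
        System.out.println(String.format(Locale.ROOT, "%.1f", averageChainLength(1_000_000, 1_000_003)));
    }
}
```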

Performance

| Metric | Value |
| --- | --- |
| StringTableSize default | 60013 |
| Max safe entries (default size) | ~100K |
| intern() without collisions | ~50-100 ns |
| intern() with collisions (1M) | 10-50 μs |
| Memory per entry | ~48 bytes (String) + ~32 bytes (hash table entry) |

Production Experience

Scenario 1: ETL pipeline loading 50M records from CSV:

  • category field: 500 unique values → intern() saved 99.9% memory
  • id field: 50M unique values → intern() caused OOM after 20 minutes
  • Fix: intern() only for category; keep id as a regular String
  • Result: stable operation, heap usage dropped from 8GB to 3GB

Rule: if the number of unique values for a field is < 1% of total records, intern() makes sense. If it is > 50%, intern() is harmful.
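The rule can be checked against a sample of field values before deciding. The sketch below is hypothetical throughout (`InternDecision`, `Advice`, `adviseFor`); the 1% and 50% thresholds come from the rule above, while the "measure first" answer for everything in between is an assumption added here, since the rule says nothing about that range.

```java
import java.util.HashSet;
import java.util.List;

final class InternDecision {
    enum Advice { INTERN, MEASURE_FIRST, DO_NOT_INTERN }

    // Applies the unique-ratio rule to a sample of values from one field.
    static Advice adviseFor(List<String> sample) {
        if (sample.isEmpty()) {
            return Advice.MEASURE_FIRST;
        }
        double uniqueRatio = (double) new HashSet<>(sample).size() / sample.size();
        if (uniqueRatio < 0.01) {
            return Advice.INTERN;        // heavy duplication, e.g. a category field
        }
        if (uniqueRatio > 0.50) {
            return Advice.DO_NOT_INTERN; // mostly unique, e.g. an id field
        }
        return Advice.MEASURE_FIRST;     // assumption: the rule is silent on this range
    }
}
```

For the ETL scenario, a sample of the category column would land in INTERN and a sample of the id column in DO_NOT_INTERN.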

Scenario 2: API gateway at 100K RPS:

  • Each request: intern() for header names (Content-Type, Authorization)
  • StringTable grew to 500K entries
  • Without increasing StringTableSize: p99 latency grew from 5ms to 50ms
  • Fix: -XX:StringTableSize=1000003 → p99 returned to 5ms

Monitoring

# StringTable statistics
jcmd <pid> VM.stringtable -verbose
# Output:
# Number of buckets       : 60013
# Number of entries       : 500234
# Maximum bucket size     : 87   ← if > 10, problem!

# GC logs
java -Xlog:gc*:file=gc.log:time,level,tags ...

# Heap histogram
jmap -histo:live <pid> | head -20
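If the statistics are collected by a script, the bucket-size check can be automated. The sketch below assumes the `label : number` layout shown in the sample output above; the real jcmd output contains more fields and may differ between JDK versions, so treat the parsing as illustrative. `StringTableStats` and its methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StringTableStats {
    // Extracts the first integer after "label :" from the captured statistics text.
    static long field(String output, String label) {
        Pattern pattern = Pattern.compile(
                "^\\s*" + Pattern.quote(label) + "\\s*:\\s*(\\d+)", Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(output);
        if (!matcher.find()) {
            throw new IllegalArgumentException("missing field: " + label);
        }
        return Long.parseLong(matcher.group(1));
    }

    // Applies the "maximum bucket size > 10" threshold from these notes.
    static boolean needsLargerTable(String output) {
        return field(output, "Maximum bucket size") > 10;
    }
}
```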

Best Practices for Highload

  • Never intern unique strings (UUIDs, IDs, hashes, timestamps)
  • Increase -XX:StringTableSize when expecting > 100K unique strings
  • Monitor Maximum bucket size via jcmd
  • Consider ConcurrentHashMap<String, String> with size limit + LRU eviction
  • For automatic savings: -XX:+UseStringDeduplication (G1 GC)
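The size-limited cache with LRU eviction from the list above could look like the following sketch. It deliberately swaps ConcurrentHashMap for a synchronized, access-ordered LinkedHashMap, because that class gives LRU ordering out of the box; a production cache under heavy contention would likely need a different structure. `LruInterner` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruInterner {
    private final Map<String, String> cache;

    LruInterner(final int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
    }

    // Returns the canonical instance for value, registering it if absent.
    synchronized String intern(String value) {
        String existing = cache.get(value); // get() also marks the entry as recently used
        if (existing != null) {
            return existing;
        }
        cache.put(value, value);
        return value;
    }

    synchronized int size() {
        return cache.size();
    }
}
```

Evicted strings become ordinary objects again, so a burst of unique values pushes out old entries instead of growing the cache without limit.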

🎯 Interview Cheat Sheet

Must know:

  • String Pool can cause OOM: PermGen space in Java 6, Java heap space in Java 7+
  • Mass intern() of unique strings (UUIDs, emails, IDs) is the main cause of OOM
  • GC removes a pooled string only once nothing else references it; strings held in collections or caches stay in the pool
  • Two types of problems: hash table overflow (collisions) and heap exhaustion
  • Maximum bucket size > 10 is a sign of a problem: increase StringTableSize
  • Rule: intern() makes sense if unique values are < 1% of total records

Frequent follow-up questions:

  • Which strings should NOT be interned? Unique ones: UUIDs, IDs, emails, hashes, timestamps.
  • How to prevent OOM from String Pool? Increase StringTableSize, don't intern unique strings, monitor via jcmd VM.stringtable.
  • What is StringTable contention? StringTable is a global structure with locking. Parallel intern() from hundreds of threads turns it into a bottleneck.
  • Can GC clean String Pool? In Java 7+ yes, if there are no strong references. If you hold references, the strings won't be collected.

Red flags (DON'T say):

  • ❌ "intern() for every string from DB is a good idea": only for fields with duplicates
  • ❌ "String Pool can't cause OOM": it can, and it's a common problem
  • ❌ "GC automatically cleans the pool": only if no strong references
  • ❌ "StringTableSize doesn't need tuning": required for > 100K unique strings

Related topics:

  • [[1. How String Pool Works]]
  • [[3. When to Use intern()]]
  • [[11. Where is String Pool Stored (Which Memory Area)]]
  • [[22. What is String Deduplication in G1 GC]]