Can String Pool Cause OutOfMemoryError?
Each unique string in the pool is an object (~48 bytes) + a hash table entry (~32 bytes). 100 million unique strings = ~8GB for the pool alone.
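As a rough check on that figure, the arithmetic can be written out. This is only a sketch: the per-entry sizes are the approximate numbers quoted above, not measured values, and the class and method names are made up for illustration.

```java
class PoolFootprintEstimate {
    // Approximate per-entry costs quoted in the text; real sizes depend on
    // the JVM, compressed oops, and the length of each string.
    static final long STRING_BYTES = 48;
    static final long TABLE_ENTRY_BYTES = 32;

    // Estimated bytes held by the pool for a given number of unique strings.
    static long estimateBytes(long uniqueStrings) {
        return uniqueStrings * (STRING_BYTES + TABLE_ENTRY_BYTES);
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(100_000_000L);
        // prints "8000000000 bytes, about 8 GB"
        System.out.println(bytes + " bytes, about " + (bytes / 1_000_000_000L) + " GB");
    }
}
```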
🟢 Junior Level
Yes, it can. But the error type depends on the Java version.
Java 6 and earlier:
java.lang.OutOfMemoryError: PermGen space
The string pool lived in the PermGen area, which had a fixed size. Lots of intern() calls → memory ran out → crash.
Java 7+:
java.lang.OutOfMemoryError: Java heap space
The pool moved to the main heap. If you intern millions of unique strings and keep references to them, they fill up the entire heap.
Example:
// Dangerous code: can cause OOM
List<String> list = new ArrayList<>();
for (int i = 0; i < 100_000_000; i++) {
    list.add(String.valueOf(i).intern()); // Each string is unique!
}
How to avoid: Don't use intern() for unique strings (UUIDs, hashes, IDs). Use it only for strings with many duplicates.
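To see why duplicates are the case where intern() pays off, here is a minimal sketch. The class and method names are illustrative only. Two equal strings created separately are distinct objects, but intern() maps both to one canonical instance.

```java
class InternIdentityDemo {
    // True when both arguments resolve to the same pooled instance.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // new String(...) forces two separate heap objects with equal content.
        String a = new String("ACTIVE");
        String b = new String("ACTIVE");

        System.out.println(a == b);                // false: different objects
        System.out.println(a.equals(b));           // true: same content
        System.out.println(sameAfterIntern(a, b)); // true: one pooled instance
    }
}
```

With a million records carrying the same status value, only one copy has to stay alive. With a million distinct IDs, nothing is shared and the pool only adds overhead.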
🟡 Middle Level
When OOM occurs
Scenario 1: Mass intern() of unique strings
// Each string is unique: the pool gains one entry per user
for (User user : users) {
    String email = user.getEmail().intern(); // UUIDs/emails: all different
}
Scenario 2: Leak through strong references
// Strings in pool + references in collection = never collected by GC
Set<String> cache = new HashSet<>();
while (true) {
    String data = readFromNetwork().intern();
    cache.add(data); // Grows without bound
}
How to prevent
- Monitoring: jcmd <pid> VM.stringtable (prints table statistics; adding -verbose dumps every string instead)
- Tuning: -XX:StringTableSize=1000003
- Alternative: -XX:+UseStringDeduplication (G1 GC)
- Custom cache: ConcurrentHashMap<String, String> with eviction
Typical mistakes
- Mistake: Thinking GC will automatically clean the pool.
  Solution: StringTable is a native JVM hash table. HotSpot drops an entry only after its String is unreachable from everywhere else, so any interned string your code still references (a collection, a field, a literal in a loaded class) stays in the pool and on the heap.
- Mistake: intern() for every string from DB.
  Solution: Only for fields with a high duplication ratio.
🔴 Senior Level
Internal Implementation
StringTable is a native hash table (simplified sketch of the lookup path):
oop StringTable::intern(Symbol* string, TRAPS) {
    unsigned int hashValue = hash_string(string);
    int index = the_table()->hash_to_index(hashValue);
    oop found_string = the_table()->lookup(index, string, hashValue);
    // Found
    if (found_string != NULL) return found_string;
    // Not found: create new entry in StringTable
    Handle string_object = java_lang_String::create_from_symbol(string, CHECK_NULL);
    the_table()->basic_add(index, string_object, string, hashValue, CHECK_NULL);
    return string_object();
}
GC removes a StringTable entry only once the String is no longer referenced from anywhere else. Until then, both the String and its table entry stay in memory.
Two types of OOM
Type 1: StringTable overflow (hash collisions)
- When StringTableSize < number of strings → long collision chains
- intern() degrades to O(n)
- Application "hangs": CPU at 100% on table lookup
- May manifest as java.lang.OutOfMemoryError: GC overhead limit exceeded
Type 2: Heap exhaustion
- Millions of unique interned strings fill up the heap
- OOM: Java heap space
- Happens when the pool competes for memory with business objects
Architectural Trade-offs
String Pool and GC:
- Young Gen: newly interned strings → Eden → die quickly (if no references)
- Old Gen: long-lived interned strings → promoted to Old Gen → Full GC scans them all
- G1 GC: StringTable scan adds to evacuation pause
Contention:
- StringTable: global structure with locking
- With parallel intern() from hundreds of threads → contention
- Can become a bottleneck in highload systems
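One way to keep hot paths off the JVM-global table is an application-level canonicalizing map. The sketch below is an assumption about how such a helper could look, not something taken from the JDK. `MapInterner` is a made-up name, and the map is unbounded, so it has the same heap-exhaustion risk for unique strings as intern() itself.

```java
import java.util.concurrent.ConcurrentHashMap;

class MapInterner {
    // Maps each distinct string to its canonical instance.
    private final ConcurrentHashMap<String, String> canon = new ConcurrentHashMap<>();

    // Returns the first instance ever registered for this content.
    String intern(String s) {
        String previous = canon.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }

    int size() {
        return canon.size();
    }

    public static void main(String[] args) throws InterruptedException {
        MapInterner interner = new MapInterner();
        String[] results = new String[8];
        Thread[] threads = new Thread[results.length];

        for (int i = 0; i < threads.length; i++) {
            final int slot = i;
            // Each thread offers its own copy; only one copy becomes canonical.
            threads[i] = new Thread(() -> results[slot] = interner.intern(new String("Content-Type")));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // join() makes each thread's array write visible here
        }

        boolean same = true;
        for (String r : results) {
            same &= (r == results[0]);
        }
        System.out.println("same instance for all threads: " + same); // true
        System.out.println("entries: " + interner.size());             // 1
    }
}
```

Whether this beats intern() depends on the JDK version and workload, so it is worth benchmarking before adopting.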
Edge Cases
- Default StringTableSize (60013, the Java 8 default; JDK 11+ defaults to 65536): If you intern 1M+ unique strings, the average chain length is 1M / 60013 ≈ 16.7. The worst case is much longer.
- GC and String Pool cleaning: In Java 7+, the JVM removes unreachable entries from StringTable during Full GC. But this only works if there are no strong references to the strings.
- ZGC/Shenandoah: These GCs use concurrent marking. StringTable scan happens concurrently, but overhead still exists.
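The average chain length from the first edge case is just entries divided by buckets. A small sketch (illustrative names, assuming a fixed-size table with evenly spread hashes) shows how the two table sizes mentioned in this article compare:

```java
class ChainLengthEstimate {
    // Average entries per bucket for a fixed-size hash table.
    static double averageChainLength(long entries, long buckets) {
        return (double) entries / buckets;
    }

    // Rounds to one decimal place for display.
    static double roundOneDecimal(double value) {
        return Math.round(value * 10) / 10.0;
    }

    public static void main(String[] args) {
        long entries = 1_000_000L;
        System.out.println(roundOneDecimal(averageChainLength(entries, 60_013L)));    // 16.7
        System.out.println(roundOneDecimal(averageChainLength(entries, 1_000_003L))); // 1.0
    }
}
```

This is only the average. Real bucket sizes vary around it, which is why the maximum bucket size is the number to watch.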
Performance
| Metric | Value |
| --- | --- |
| StringTableSize default | 60013 |
| Max safe entries (default size) | ~100K |
| intern() without collisions | ~50-100 ns |
| intern() with collisions (1M) | 10-50 μs |
| Memory per entry | ~48 bytes (String) + ~32 bytes (hash table entry) |
Production Experience
Scenario 1: ETL pipeline, loading 50M records from CSV:
- category field: 500 unique values → intern() saved 99.9% memory
- id field: 50M unique values → intern() caused OOM after 20 minutes
- Fix: intern() only for category; for id, a regular String
- Result: stable operation, heap usage dropped from 8GB to 3GB
Rule: if the number of unique values for a field is < 1% of total records → intern() makes sense. If > 50% → it's harmful.
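The rule of thumb can be expressed as a small helper. This is a sketch under stated assumptions: the 1% and 50% thresholds come from the rule above, while the class name and the "measure first" band in between are additions for illustration, since the rule says nothing about that range.

```java
class InternDecision {
    enum Advice { INTERN, MEASURE_FIRST, DO_NOT_INTERN }

    // Classifies a field by its ratio of distinct values to total records.
    static Advice advise(long distinctValues, long totalRecords) {
        if (totalRecords <= 0) {
            throw new IllegalArgumentException("totalRecords must be positive");
        }
        double ratio = (double) distinctValues / totalRecords;
        if (ratio < 0.01) {
            return Advice.INTERN;        // heavy duplication: pooling pays off
        }
        if (ratio > 0.50) {
            return Advice.DO_NOT_INTERN; // mostly unique: pooling only adds overhead
        }
        return Advice.MEASURE_FIRST;     // assumption: the rule gives no verdict here
    }

    public static void main(String[] args) {
        System.out.println(advise(500L, 50_000_000L));        // INTERN
        System.out.println(advise(50_000_000L, 50_000_000L)); // DO_NOT_INTERN
    }
}
```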
Scenario 2: API gateway, 100K RPS:
- Each request: intern() for header names (Content-Type, Authorization)
- StringTable grew to 500K entries
- Without increasing StringTableSize: p99 latency grew from 5ms to 50ms
- Fix: -XX:StringTableSize=1000003 → p99 returned to 5ms
Monitoring
# StringTable statistics (no -verbose: that flag dumps every string instead)
jcmd <pid> VM.stringtable
# Output:
# Number of buckets : 60013
# Number of entries : 500234
# Maximum bucket size : 87 ← if > 10, problem!
# GC logs
java -Xlog:gc*:file=gc.log:time,level,tags ...
# Heap histogram
jmap -histo:live <pid> | head -20
Best Practices for Highload
- Never intern unique strings (UUIDs, IDs, hashes, timestamps)
- Increase -XX:StringTableSize when expecting > 100K unique strings
- Monitor Maximum bucket size via jcmd
- Consider ConcurrentHashMap<String, String> with size limit + LRU eviction
- For automatic savings: -XX:+UseStringDeduplication (G1 GC)
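A size-limited cache with LRU eviction can be sketched as follows. Note the substitution: ConcurrentHashMap has no built-in LRU ordering, so this illustration uses a synchronized LinkedHashMap in access order instead. `BoundedInterner` and its capacity parameter are hypothetical names, and a high-throughput system would likely want a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedInterner {
    private final Map<String, String> lru;

    BoundedInterner(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first.
        this.lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries; // evict once the cap is exceeded
            }
        };
    }

    // Returns the cached instance if present, otherwise caches and returns s.
    synchronized String intern(String s) {
        String existing = lru.get(s); // get() also refreshes recency
        if (existing != null) {
            return existing;
        }
        lru.put(s, s);
        return s;
    }

    synchronized int size() {
        return lru.size();
    }

    public static void main(String[] args) {
        BoundedInterner interner = new BoundedInterner(2);
        String first = interner.intern(new String("GET"));
        interner.intern("POST");
        interner.intern(new String("GET")); // touch GET so POST becomes eldest
        interner.intern("PUT");             // exceeds the cap, evicts POST
        System.out.println(interner.size());                              // 2
        System.out.println(interner.intern(new String("GET")) == first);  // true
    }
}
```

Unlike the JVM pool, the cap guarantees that unique strings cannot grow the cache without bound. The cost is that evicted values lose their shared instance.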
🎯 Interview Cheat Sheet
Must know:
- String Pool can cause OOM: in Java 6 → PermGen space, in Java 7+ → Java heap space
- Mass intern() of unique strings (UUIDs, emails, IDs): main cause of OOM
- Interned strings that are still referenced (collections, fields, class literals) are never collected; GC drops a StringTable entry only after its string becomes unreachable
- Two types of problems: hash table overflow (collisions) and heap exhaustion
- Maximum bucket size > 10: sign of a problem, need to increase StringTableSize
- Rule: intern() makes sense if unique values are < 1% of total records
Frequent follow-up questions:
- Which strings should NOT be interned? → Unique ones: UUIDs, IDs, emails, hashes, timestamps.
- How to prevent OOM from String Pool? → Increase StringTableSize, don't intern unique strings, monitor via jcmd VM.stringtable.
- What is StringTable contention? → StringTable is a global structure with locking. With parallel intern() from hundreds of threads → bottleneck.
- Can GC clean String Pool? → In Java 7+ yes, if no strong references. But if you hold references → they won't be collected.
Red flags (DONβT say):
- ❌ "intern() for every string from DB is a good idea" → only for fields with duplicates
- ❌ "String Pool can't cause OOM" → it can, and it's a common problem
- ❌ "GC automatically cleans the pool" → only if no strong references
- ❌ "StringTableSize doesn't need tuning" → required for > 100K unique strings
Related topics:
- [[1. How String Pool Works]]
- [[3. When to Use intern()]]
- [[11. Where is String Pool Stored (Which Memory Area)]]
- [[22. What is String Deduplication in G1 GC]]