When to Use intern()?
The intern() method adds a string to the String Pool and returns a reference to it from the pool.
🟢 Junior Level
Imagine you are loading 10,000 records from a DB, and every record contains the same word ‘Ukraine’.
Without intern() you get 10,000 separate String objects. With intern() you get one object and 10,000 references to it.
Simple example:
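String s1 = new String("Hello"); // New object in the regular heap (the literal "Hello" is already in the pool)
String s2 = s1.intern(); // Returns the pooled "Hello"
String s3 = "Hello"; // Literal, taken from the pool
System.out.println(s1 == s2); // false: s1 is a separate heap object
System.out.println(s2 == s3); // true: same object from the pool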
When to use: When you have many identical, long-lived strings and want to save memory. For example, if you load 10,000 records from a database and each record has a country = "Ukraine" field, memory will hold one object in the pool instead of 10,000 objects.
When NOT to use: For short-lived strings that quickly become garbage. The regular Garbage Collector will handle them on its own.
🟡 Middle Level
How it works
The intern() method checks the String Pool:
- If such a string already exists — returns a reference from the pool
- If not — adds the current string to the pool and returns the reference
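Both branches can be observed directly. The following is a minimal sketch, assuming HotSpot on Java 7 or later; the class and method names are illustrative and not part of any API.

```java
import java.util.UUID;

public class InternBranches {
    // Builds a string at runtime, so it is (practically) guaranteed not to be in the pool yet.
    static String freshString() {
        return "intern-demo-" + UUID.randomUUID();
    }

    // Branch "not in pool": intern() adds this very object and returns it.
    static boolean firstInternReturnsSameObject(String s) {
        return s.intern() == s;
    }

    // Branch "already in pool": intern() on an equal copy returns the pooled object, not the copy.
    // Expects `pooled` to be the instance that is already registered in the pool.
    static boolean secondInternReturnsPooledObject(String pooled) {
        String copy = new String(pooled);
        return copy.intern() == pooled && copy.intern() != copy;
    }

    public static void main(String[] args) {
        String s = freshString();
        System.out.println(firstInternReturnsSameObject(s));
        System.out.println(secondInternReturnsPooledObject(s));
    }
}
```

The order of the two calls matters: the second check only holds because the first call has already placed `s` in the pool.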
Practical application
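// Loading data from DB: many duplicate values
// Assumes both columns are NOT NULL; getString() returns null for SQL NULL,
// and calling intern() on null would throw a NullPointerException
while (rs.next()) {
    String city = rs.getString("city").intern();
    String country = rs.getString("country").intern();
    users.add(new User(city, country));
}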
If the database has 1,000,000 records, but only 100 unique cities:
- Without intern(): 1,000,000 String objects
- With intern(): 100 String objects in pool + 1,000,000 references to them
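The difference in object count can be checked with an identity-based set. This is a sketch only: the rows are simulated with `new String(...)` rather than read over JDBC, the row count is scaled down to 100,000 to keep it fast, and the class name is made up for the example.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class InternObjectCount {
    // Simulates rows read from a DB: every row yields a brand-new String object,
    // even though only `uniqueCities` distinct values exist.
    static int distinctObjects(int rows, int uniqueCities, boolean useIntern) {
        // Identity-based set: two strings count as one only if they are the same object.
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (int i = 0; i < rows; i++) {
            String city = new String("city-" + (i % uniqueCities));
            identities.add(useIntern ? city.intern() : city);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println("without intern(): " + distinctObjects(100_000, 100, false));
        System.out.println("with intern():    " + distinctObjects(100_000, 100, true));
    }
}
```

Without interning, the set ends up holding one object per row; with interning, it holds one object per unique city.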
Typical mistakes
- Mistake: Calling intern() for every string without analysis. Solution: Use it only for long-lived data with duplicates.
- Mistake: Assuming intern() costs nothing. Solution: Remember that intern() is a native call with overhead; it’s not free.
Comparison: intern() vs String Deduplication
| Characteristic | intern() | -XX:+UseStringDeduplication |
| --- | --- | --- |
| When it works | On call (synchronous) | During GC (asynchronous) |
| What it combines | String objects | Internal byte[] arrays (char[] in Java 8) |
| Requires code change? | Yes | No (only JVM flag) |
| G1 GC only? | No | Originally yes; Shenandoah and, since JDK 18, Serial, Parallel, and ZGC support it too |
🔴 Senior Level
Internal Implementation
String.intern() is a native method:
JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
  if (str == NULL) return NULL;
  oop string = JNIHandles::resolve_non_null(str);
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(env, result);
JVM_END
StringTable::intern() performs:
- String hash computation
- Lookup in StringTable hash table
- If found — return reference
- If not found — insert into table (with possible resize)
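The steps above can be sketched as a toy table in Java. This is a simplified model for illustration, not the HotSpot implementation: it is not thread-safe, it holds strong references, and the resize threshold of two entries per bucket is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a string table; NOT the HotSpot implementation.
public class ToyStringTable {
    private List<String>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    public ToyStringTable(int bucketCount) {
        buckets = new List[bucketCount];
    }

    public String intern(String s) {
        int hash = s.hashCode();                        // 1. hash computation
        List<String> chain = chainFor(buckets, hash);   // 2. lookup in the hash table
        for (String existing : chain) {
            if (existing.equals(s)) return existing;    // 3. found: return the canonical reference
        }
        chain.add(s);                                   // 4. not found: insert
        if (++size > buckets.length * 2) resize();      //    with a possible resize
        return s;
    }

    // Returns the bucket chain for a hash, creating it on first use.
    private static List<String> chainFor(List<String>[] table, int hash) {
        int index = Math.floorMod(hash, table.length);
        if (table[index] == null) table[index] = new ArrayList<>();
        return table[index];
    }

    // Doubles the bucket count and redistributes all entries.
    @SuppressWarnings("unchecked")
    private void resize() {
        List<String>[] bigger = new List[buckets.length * 2];
        for (List<String> chain : buckets) {
            if (chain == null) continue;
            for (String s : chain) chainFor(bigger, s.hashCode()).add(s);
        }
        buckets = bigger;
    }

    public int size() { return size; }
    public int bucketCount() { return buckets.length; }
}
```

The linear scan inside a bucket is the part that degrades when many entries share too few buckets.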
Architectural Trade-offs
Pros of intern():
- RAM savings: with 1000:1 duplicate ratio, savings >99%
- Fewer objects → less frequent Full GC
- Fast comparison via == (valid only when both strings are interned)
Cons of intern():
- CPU overhead: each call — hashing + lookup in global table
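- Contention: StringTable is a single global structure shared by all threads; older JDKs guard it with locks, and although JDK 11+ uses a concurrent hash table, heavy parallel interning still competes for it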
- OOM Risk: with millions of unique strings, pool can fill Heap
- StringTableSize: if table is small — collisions → O(n) degradation
Edge Cases
- Multithreaded contention: When intern() is called in parallel from hundreds of threads, they contend for the shared StringTable.
- String Table Size: Default is 60013 in Java 8 on 64-bit JVMs (65536 since Java 11). If planning 1M+ unique strings: -XX:StringTableSize=1000003
- Young Gen strings: intern() for short-lived strings is counterproductive, because they’ll die at the next Minor GC anyway.
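The effect of the table size is simple arithmetic: the average chain length is the number of unique strings divided by the number of buckets. The sketch below assumes a table that does not grow on its own and a uniform hash distribution; the class name is illustrative.

```java
public class StringTableLoad {
    // Average entries per bucket for a fixed-size table with uniformly distributed hashes.
    static double averageChainLength(long uniqueStrings, long buckets) {
        return (double) uniqueStrings / buckets;
    }

    public static void main(String[] args) {
        long unique = 1_000_000;
        System.out.println("60013 buckets:   " + averageChainLength(unique, 60_013));
        System.out.println("1000003 buckets: " + averageChainLength(unique, 1_000_003));
    }
}
```

With 1M unique strings, 60013 buckets give roughly 16.7 entries per bucket on average, while 1000003 buckets bring that down to about one.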
Performance
- intern() without collisions: ~50-100ns
- intern() with 1M entries and proper StringTableSize: ~200-500ns
- intern() with 1M entries and small StringTableSize: 10-50μs (collisions!)
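Figures like these depend on hardware, JDK version, and table state, so it is worth measuring on your own setup. The following is a deliberately naive timing sketch with illustrative names; a benchmark harness such as JMH would give more trustworthy numbers.

```java
public class InternTiming {
    // Written once per run so the JIT cannot discard the measured loop as dead code.
    static volatile long sink;

    // Naive wall-clock measurement of the average cost of one intern() call.
    static double averageNanosPerIntern(int uniqueStrings, int rounds) {
        String[] inputs = new String[uniqueStrings];
        for (int i = 0; i < uniqueStrings; i++) {
            inputs[i] = "timing-key-" + i; // built at runtime, so not a pooled literal
        }
        long calls = 0;
        long lengths = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String s : inputs) {
                lengths += s.intern().length();
                calls++;
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = lengths;
        return (double) elapsed / calls;
    }

    public static void main(String[] args) {
        averageNanosPerIntern(10_000, 20); // warm-up run, result ignored
        System.out.println("average ns per intern() call: " + averageNanosPerIntern(100_000, 10));
    }
}
```

Only the first round inserts new entries; later rounds measure lookups of strings that are already pooled, so varying `rounds` shifts the mix between the two cases.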
Production Experience
Scenario: Parsing 10GB of logs, where 500 unique keys appear (for example the log levels INFO, WARN, ERROR, DEBUG, TRACE):
- Without intern(): ~50M String objects for keys → 2.4GB
- With intern(): 500 objects in pool → ~50KB
- Result: Full GC every 30 seconds → every 15 minutes; p99 latency dropped from 200ms to 15ms. The temporary strings still pass through Eden, but far fewer objects stay alive and get promoted, so the old generation fills more slowly and Full GC runs less often.
Reverse scenario: User UUIDs — every string is unique. intern() here only wastes CPU and fills the pool with garbage.
Monitoring
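# StringTable statistics (-verbose additionally dumps every string in the table; omit it for statistics only)
jcmd <pid> VM.stringtable -verbose
# Example output (abridged; exact fields vary by JDK version):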
# StringTable statistics:
# Number of buckets : 60013
# Number of entries : 1234567
# Number of loaded classes: N/A
# Maximum bucket size : 42 ← if > 10, increase StringTableSize
Best Practices for Highload
- Use intern() only for long-lived strings with a high duplication ratio
- Don’t intern UUIDs, hashes, tokens: they are unique
- Profile: sometimes the CPU overhead from intern() costs more than extra MBs in Heap
- Alternative: your own ConcurrentHashMap<String, String> cache, which gives control over eviction and size
- For automatic deduplication without code changes: -XX:+UseStringDeduplication (G1 GC, since Java 8u20)
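A custom cache along these lines might look like the sketch below. The class name and the size-cap policy are assumptions made for illustration: it stops caching new strings once full rather than evicting old ones, and the cap is soft because concurrent callers may overshoot it slightly.

```java
import java.util.concurrent.ConcurrentHashMap;

public class BoundedInterner {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final int maxEntries;

    public BoundedInterner(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Returns the canonical instance for `s`, or `s` itself once the cache is full.
    public String intern(String s) {
        String existing = cache.get(s);
        if (existing != null) return existing;
        // Soft cap: stop caching instead of growing without bound.
        if (cache.size() >= maxEntries) return s;
        // Another thread may have inserted an equal string in the meantime.
        String raced = cache.putIfAbsent(s, s);
        return raced != null ? raced : s;
    }

    public int size() {
        return cache.size();
    }
}
```

Unlike the JVM pool, this cache lives on the regular heap as an ordinary object, so dropping the `BoundedInterner` instance releases all of its entries at once.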
🎯 Interview Cheat Sheet
Must know:
- intern() adds string to String Pool and returns reference from pool
- Saves memory with many duplicate strings (1M records, 100 cities → 100 objects instead of 1M)
- intern() is a native call with CPU overhead (~50-100ns without collisions)
- Contention: StringTable is a global structure with locking, bottleneck at hundreds of threads
- -XX:StringTableSize=1000003: increase for 1M+ unique strings
- Don’t use intern() for UUIDs, hashes, tokens: they’re all unique
Frequent follow-up questions:
- When is intern() useful? When loading data with high duplication: dictionaries, categories, cities, statuses.
- When is intern() harmful? For unique strings: UUIDs, IDs, emails, hashes. Fills pool, wastes CPU, saves no memory.
- What’s faster: intern() or custom ConcurrentHashMap cache? ConcurrentHashMap gives control over eviction and size, but intern() is JVM-native, no manual management.
- What’s the overhead of intern()? ~50-100ns without collisions. With 1M entries and small StringTableSize: 10-50μs (collisions!).
Red flags (DON’T say):
- ❌ “intern() for every string — good practice” — only for strings with duplicates
- ❌ “intern() is free” — native call, CPU overhead, contention on StringTable
- ❌ “intern() speeds up everything” — saves memory, but slows CPU
- ❌ “intern() is the only string optimization”: there’s also -XX:+UseStringDeduplication (automatic, no code)
Related topics:
- [[1. How String Pool Works]]
- [[12. Can String Pool Cause OutOfMemoryError]]
- [[22. What is String Deduplication in G1 GC]]
- [[11. Where is String Pool Stored (Which Memory Area)]]