Question 3 · Section 12

When to Use intern()?

The intern() method adds a string to the String Pool and returns a reference to it from the pool.

🟢 Junior Level

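Imagine you're loading 10,000 records from a DB, and every record contains the same word "Ukraine". Without intern(), you keep 10,000 separate String objects with identical content. With intern(), you keep one pooled object and 10,000 references to it. The per-record copies are still created while reading, but they become garbage right away.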

Simple example:

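```java
String s1 = new String("Hello"); // New object in the regular heap
String s2 = s1.intern();          // Returns the pooled "Hello" (the literal is already in the pool)

String s3 = "Hello";              // Literal, taken from the pool
System.out.println(s1 == s3);     // false: s1 is a separate heap object
System.out.println(s2 == s3);     // true: same string from the pool
```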

When to use: when you have many identical strings that stay in memory for a long time and you want to save memory. In the example above, a country = "Ukraine" field repeated across 10,000 loaded records is stored once in the pool instead of 10,000 times.

When NOT to use: for short-lived strings that are discarded quickly. The garbage collector reclaims them cheaply on its own, so interning them only adds CPU cost.


🟡 Middle Level

How it works

The intern() method checks the String Pool:

  1. If such a string already exists — returns a reference from the pool
  2. If not — adds the current string to the pool and returns the reference
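
Both rules can be checked with real String.intern() calls. This is a minimal sketch, and the class name InternSemanticsDemo is only for illustration. The two strings are built at runtime, so neither is a compile-time constant (constants would already be interned automatically).

```java
public class InternSemanticsDemo {
    public static void main(String[] args) {
        // Two equal strings built at runtime: distinct heap objects, same content
        String a = new StringBuilder("Ukr").append("aine").toString();
        String b = new StringBuilder("Ukra").append("ine").toString();

        System.out.println(a == b);               // false: two different objects
        System.out.println(a.equals(b));          // true: same characters

        // Both calls resolve to one canonical pooled instance for "Ukraine"
        String pooledA = a.intern();
        String pooledB = b.intern();
        System.out.println(pooledA == pooledB);   // true

        // String literals are interned, so the literal is that same pooled instance
        System.out.println(pooledA == "Ukraine"); // true
    }
}
```

The guarantee you can rely on is the one from the String.intern() Javadoc: s.intern() == t.intern() holds exactly when s.equals(t).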

Practical application

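```java
// Loading data from DB: many duplicate values
while (rs.next()) {
    String city = rs.getString("city");          // null if the column is SQL NULL
    String country = rs.getString("country");
    users.add(new User(
            city == null ? null : city.intern(),       // intern() on null would throw NPE
            country == null ? null : country.intern()));
}
```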

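If the database has 1,000,000 records but only 100 unique cities:

  • Without intern(): 1,000,000 String objects kept alive for the city field
  • With intern(): 100 String objects in the pool plus 1,000,000 references to them (the temporary strings read for each row become garbage immediately)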
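
A scaled-down simulation makes the object count visible: 10,000 rows and 100 cities instead of 1,000,000 and 100. It assumes that every row yields a fresh String object (typical driver behavior, but an assumption here). The hypothetical loadCities method stands in for the ResultSet loop, and objects are counted by identity rather than by equals().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class CityInternDemo {
    // Simulates reading one city per row; string concatenation at runtime
    // always allocates a new String object, like a fresh value from the driver
    static List<String> loadCities(int rows, int uniqueCities, boolean intern) {
        List<String> result = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            String fresh = "City-" + (i % uniqueCities);
            result.add(intern ? fresh.intern() : fresh);
        }
        return result;
    }

    // Counts distinct objects by identity (==), not by content
    static int distinctObjects(List<String> values) {
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(values);
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(loadCities(10_000, 100, false))); // 10000
        System.out.println(distinctObjects(loadCities(10_000, 100, true)));  // 100
    }
}
```

Both lists are equal element by element; only the number of distinct objects behind them differs.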

Typical mistakes

  1. Mistake: calling intern() on every string without analysis.
     Solution: use it only for long-lived data with many duplicates.

  2. Mistake: assuming intern() costs nothing.
     Solution: intern() is a native call that hashes the string and looks it up in a global table; measure the overhead before using it on hot paths.

Comparison: intern() vs String Deduplication

| Characteristic | intern() | -XX:+UseStringDeduplication |
| --- | --- | --- |
| When it works | On call (synchronous) | During GC, in the background (asynchronous) |
| What it combines | String objects | Internal value arrays (char[] in Java 8, byte[] since Java 9) |
| Requires code change? | Yes | No (JVM flag only) |
| Which GC? | Any | G1 only from Java 8u20 through 17; JDK 18 added Serial, Parallel and ZGC |


🔴 Senior Level

Internal Implementation

String.intern() is a native method. In HotSpot it is implemented by JVM_InternString (simplified excerpt):

```cpp
JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
  if (str == NULL) return NULL;
  oop string = JNIHandles::resolve_non_null(str);
  oop result = StringTable::intern(string, CHECK_NULL);
  return (jstring) JNIHandles::make_local(env, result);
JVM_END
```

StringTable::intern() performs:

  1. String hash computation
  2. Lookup in StringTable hash table
  3. If found — return reference
  4. If not found — insert into table (with possible resize)
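
To make the four steps concrete, here is a toy model of a fixed-size chained hash table. It is a hypothetical sketch, not HotSpot's StringTable: the real table is written in C++, holds its entries weakly, and can resize since JDK 11. The model does make the collision problem easy to see, because every lookup scans one bucket chain linearly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a fixed-size chained hash table (not HotSpot's StringTable)
public class ToyStringTable {
    private final List<List<String>> buckets;

    public ToyStringTable(int bucketCount) {
        buckets = new ArrayList<>(bucketCount);
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    public String intern(String s) {
        // 1. Hash computation, reduced to a bucket index
        List<String> bucket = buckets.get(Math.floorMod(s.hashCode(), buckets.size()));
        // 2-3. Linear scan of the bucket chain; return the canonical instance if found
        for (String candidate : bucket) {
            if (candidate.equals(s)) {
                return candidate;
            }
        }
        // 4. Not found: store this instance as the canonical one (no resize in this model)
        bucket.add(s);
        return s;
    }

    // Longest chain: the worst-case number of comparisons for one lookup
    public int maxBucketSize() {
        int max = 0;
        for (List<String> bucket : buckets) {
            max = Math.max(max, bucket.size());
        }
        return max;
    }

    public static void main(String[] args) {
        ToyStringTable small = new ToyStringTable(1009);
        for (int i = 0; i < 100_000; i++) {
            small.intern("key-" + i);
        }
        System.out.println(small.maxBucketSize()); // at least 100 by the pigeonhole principle
    }
}
```

With 1,009 buckets and 100,000 distinct strings, the average chain already holds about 99 entries, so each lookup compares against dozens of strings. A larger table keeps chains short, which is what -XX:StringTableSize controls.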

Architectural Trade-offs

Pros of intern():

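  • RAM savings: with a 1000:1 duplicate ratio, only 1 of every 1,000 copies stays in memory, so the savings on those strings exceed 99%
  • Fewer long-lived objects → less old-generation pressure and less frequent Full GC
  • Fast comparison via ==, but only when both strings have been interned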

Cons of intern():

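  • CPU overhead: each call hashes the string and looks it up in a global table
  • Contention: the StringTable is shared by the whole JVM; in older JDKs (such as 8) inserts are serialized by a global lock
  • OOM risk: with millions of unique, still-referenced strings, the pool can fill the heap
  • StringTableSize: if the table is too small, bucket chains grow long and lookups degrade toward a linear O(n) scan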

Edge Cases

  1. Multithreaded contention: when intern() is called in parallel from hundreds of threads, the shared StringTable becomes a point of contention.

  2. String Table Size: the default is 60013 buckets in Java 7u40 through 10; since Java 11 it is 65536 and the table can grow on its own. If you expect 1M+ unique strings, size it up front:

     -XX:StringTableSize=1000003

  3. Young Gen strings: intern() for short-lived strings is counterproductive. You pay for hashing and a table insert, while the strings would die at the next Minor GC anyway.

Performance

Rough orders of magnitude; real numbers depend on the JDK version, hardware and string length:

  • intern() without collisions: ~50-100ns
  • intern() with 1M entries and proper StringTableSize: ~200-500ns
  • intern() with 1M entries and small StringTableSize: 10-50μs (collisions!)

Production Experience

Scenario: parsing 10GB of logs in which only 500 distinct key strings appear (log levels such as INFO, WARN, ERROR, DEBUG, TRACE among them):

  • Without intern(): ~50M String objects for keys → 2.4GB
  • With intern(): 500 objects in the pool → ~50KB
  • Result: Full GC went from every 30 seconds to every 15 minutes, and p99 latency dropped from 200ms to 15ms. The gain comes from retained data: the temporary strings still die young, but only 500 key objects stay live instead of ~50M, so the old generation fills far more slowly.

Reverse scenario: user UUIDs, where every string is unique. Here intern() only wastes CPU and fills the StringTable with entries that are never reused.
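
A minimal sketch of the rule "intern the repeated parts, not the unique ones". The line format <LEVEL> <requestId> <message> and the LogLineParser class are hypothetical, chosen only for illustration. Only the log level is interned; the request id (a UUID in this example) is kept as-is.

```java
public class LogLineParser {
    // Hypothetical result type: one parsed log line
    static final class Entry {
        final String level;
        final String requestId;

        Entry(String level, String requestId) {
            this.level = level;
            this.requestId = requestId;
        }
    }

    // Parses lines of the hypothetical form "<LEVEL> <requestId> <message...>"
    static Entry parse(String line) {
        String[] parts = line.split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected log line: " + line);
        }
        String level = parts[0].intern(); // a handful of distinct values: interning pays off
        String requestId = parts[1];      // unique per line (a UUID): interning would not help
        return new Entry(level, requestId);
    }

    public static void main(String[] args) {
        Entry first = parse("INFO 3f2c9a1e-0b7d-4e55-9a63-1d2f4c8b7e10 user logged in");
        Entry second = parse("INFO 9b1d2c3e-4f5a-4b6c-8d7e-0f1a2b3c4d5e cache miss");
        System.out.println(first.level == second.level); // true: one shared "INFO" instance
    }
}
```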

Monitoring

```shell
# StringTable statistics (JDK 9+); adding -verbose also dumps every string in the table
jcmd <pid> VM.stringtable

# Example output (abbreviated):
# StringTable statistics:
# Number of buckets       : 60013
# Number of entries       : 1234567
# Maximum bucket size     : 42         ← if > 10, increase StringTableSize
```

Best Practices for Highload

  • Use intern() only for long-lived strings with high duplication ratio
  • Don’t intern UUIDs, hashes, tokens — they are unique
  • Profile: sometimes CPU overhead from intern() costs more than extra MBs in Heap
  • Alternative: your own ConcurrentHashMap<String, String> cache, which gives you control over size and eviction
  • For automatic deduplication without code changes: -XX:+UseStringDeduplication (G1 GC since Java 8u20; Serial, Parallel and ZGC since JDK 18)
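
The ConcurrentHashMap alternative can be sketched as follows. BoundedInterner is a hypothetical name, and its size policy is the simplest one possible: once maxSize entries are cached, new values are returned uncached instead of evicting anything. Because the size check and the insert are separate steps, concurrent callers can overshoot maxSize slightly.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical application-level interner with a soft size cap
public final class BoundedInterner {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final int maxSize;

    public BoundedInterner(int maxSize) {
        this.maxSize = maxSize;
    }

    public String intern(String s) {
        String existing = cache.get(s);
        if (existing != null) {
            return existing;              // fast path: canonical instance already cached
        }
        if (cache.size() >= maxSize) {
            return s;                     // cache full: do not grow, return the value as-is
        }
        // putIfAbsent resolves races: if another thread inserted first, use its instance
        String previous = cache.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        BoundedInterner interner = new BoundedInterner(10_000);
        String a = new StringBuilder("Kyiv").toString();
        String b = new StringBuilder("Kyiv").toString();
        System.out.println(interner.intern(a) == interner.intern(b)); // true
    }
}
```

For real eviction (LRU, time-based) a caching library fits better; if Guava is already on the classpath, Interners.newWeakInterner() provides an interner whose entries can be garbage-collected.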

🎯 Interview Cheat Sheet

Must know:

  • intern() adds string to String Pool and returns reference from pool
  • Saves memory with many duplicate strings (1M records, 100 cities → 100 objects instead of 1M)
  • intern() is a native call with CPU overhead (~50-100ns without collisions)
  • Contention: the StringTable is shared by the whole JVM and can become a bottleneck with hundreds of threads
  • -XX:StringTableSize=1000003 — increase for 1M+ unique strings
  • Don’t use intern() for UUIDs, hashes, tokens — they’re all unique

Frequent follow-up questions:

  • When is intern() useful? — When loading data with high duplication: dictionaries, categories, cities, statuses.
  • When is intern() harmful? — For unique strings: UUIDs, IDs, emails, hashes. Fills pool, wastes CPU, saves no memory.
  • What's faster: intern() or a custom ConcurrentHashMap cache? It depends on the workload, so measure. ConcurrentHashMap gives control over eviction and size; intern() is JVM-native and needs no manual management.
  • What’s the overhead of intern()? — ~50-100ns without collisions. With 1M entries and small StringTableSize: 10-50μs (collisions!).

Red flags (DON’T say):

  • ❌ “intern() for every string — good practice” — only for strings with duplicates
  • ❌ “intern() is free” — native call, CPU overhead, contention on StringTable
  • ❌ “intern() speeds up everything” — saves memory, but slows CPU
  • ❌ “intern() — the only string optimization” — there’s -XX:+UseStringDeduplication (automatic, no code)

Related topics:

  • [[1. How String Pool Works]]
  • [[12. Can String Pool Cause OutOfMemoryError]]
  • [[22. What is String Deduplication in G1 GC]]
  • [[11. Where is String Pool Stored (Which Memory Area)]]