Question 14 · Section 12

Why Was the substring() Implementation Changed in Java 7?

The JDK developers changed substring() in Java 7 update 6 (7u6) because the old implementation could cause hidden memory leaks.

🟢 Junior Level

Problem: In the old implementation, substring() did not copy any characters; the new string referenced the same array as the original. If you took a small substring of a huge text, the entire huge text stayed in memory for as long as the substring was reachable.

Example:

// Java 6: a HUGE string (e.g., file contents)
String huge = loadBigFile(); // 100 MB
// Take a small part: only 10 characters
String small = huge.substring(0, 10);
huge = null; // "Deleted" the big string
// BUT: all 100 MB stay in memory because small still references the same array!

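Solution: Since Java 7u6, substring() copies the characters it needs into a new array (except when the whole string is requested, in which case the same String is returned). A small substring now takes only as much memory as its own characters require.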


🟡 Middle Level

How it worked before Java 7u6

The String object had three fields relevant here (plus a cached int hash):

  • char[] value: the character array
  • int offset: index where this string starts in value
  • int count: length of this string

substring() created a new String object that shared the same value array and differed only in offset and count.
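A toy model makes the sharing concrete. LegacyString below is a hypothetical class written only for illustration (it is not the JDK 6 source); it mirrors the value/offset/count layout described above, so substring() just creates a new object that points into the same array.

```java
// Toy model of the pre-Java-7u6 String layout (hypothetical class, not JDK code).
final class LegacyString {
    final char[] value; // backing array, possibly shared with other instances
    final int offset;   // index of this string's first character in value
    final int count;    // number of characters in this string

    LegacyString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private LegacyString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // O(1): no characters are copied; the new object points into the same array.
    LegacyString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new LegacyString(value, offset + begin, end - begin);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        LegacyString huge = new LegacyString("Hello, World");
        LegacyString small = huge.substring(7, 12);
        System.out.println(small);                     // World
        System.out.println(small.value == huge.value); // true: the whole array is shared
        System.out.println(small.value.length);        // 12, not 5
    }
}
```

Note that small.value.length is 12 even though small holds only 5 characters: the entire backing array is retained.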

What changed

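Starting with Java 7u6:

  • the offset and count fields were removed from String
  • substring() creates a new array and copies the data (unless the whole string is requested, in which case it returns the same String)
  • every String object became 8 bytes smaller, saving memory for ALL strings, not only substrings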
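For comparison, here is a sketch of the copying approach under the same toy-model assumptions. CopyingString is again a hypothetical class, not JDK code; it uses Arrays.copyOfRange to give each substring a right-sized array and, like OpenJDK's String.substring, returns the same object when the whole range is requested.

```java
import java.util.Arrays;

// Toy model of the Java 7u6+ approach (hypothetical class, not JDK code):
// only the value array remains, and substring() copies the requested range.
final class CopyingString {
    final char[] value; // always exactly as long as this string

    CopyingString(String s) {
        this(s.toCharArray());
    }

    private CopyingString(char[] value) {
        this.value = value;
    }

    // O(n) in the substring length: the requested characters are copied.
    CopyingString substring(int begin, int end) {
        if (begin < 0 || end > value.length || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        if (begin == 0 && end == value.length) {
            return this; // whole range requested: nothing to copy
        }
        return new CopyingString(Arrays.copyOfRange(value, begin, end));
    }

    @Override
    public String toString() {
        return new String(value);
    }

    public static void main(String[] args) {
        CopyingString huge = new CopyingString("Hello, World");
        CopyingString small = huge.substring(7, 12);
        System.out.println(small);                     // World
        System.out.println(small.value == huge.value); // false: independent copy
        System.out.println(small.value.length);        // 5
    }
}
```

Once huge becomes unreachable, its 12-character array can be collected even while small is still in use.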

Practical application

// Java 7u6+: safe
String huge = loadBigFile();          // 100 MB
String small = huge.substring(0, 10); // a few dozen bytes: a new String plus a 10-character copy
huge = null;                          // the 100 MB array is now eligible for GC

Typical mistakes

  1. Mistake: expecting O(1) performance. Solution: substring() is O(n) in the substring length because it copies the data.

  2. Mistake: using new String(substring) as an “optimization”. Solution: this was needed in Java 6; in modern Java it only adds an extra String allocation.
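The second mistake can be observed with standard String operations alone. In this small sketch (the class and helper names are illustrative), both strings hold the same characters, but the legacy wrapper allocates a second String object on top of the one substring() already returned.

```java
public class RedundantCopy {
    // Java 6-era workaround (hypothetical helper name): wrap a substring in new String(...).
    static String legacyCopy(String substring) {
        return new String(substring); // in Java 7u6+ this only allocates another String object
    }

    public static void main(String[] args) {
        String text = "Hello, World";
        String sub = text.substring(0, 5); // already independent of text in Java 7u6+
        String wrapped = legacyCopy(sub);

        System.out.println(wrapped);             // Hello
        System.out.println(sub.equals(wrapped)); // true: same characters
        System.out.println(sub == wrapped);      // false: a second String object was created
    }
}
```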


🔴 Senior Level

Internal Implementation: motivation for the change

JDK-7068364: the official Oracle bug report.

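Problems with the shared-array approach:

  1. Memory leak: a substring keeps the entire parent array reachable
  2. Complexity: every String carried three fields (value, offset, count) instead of one
  3. GC overhead: one large array referenced by many small substrings cannot be reclaimed until all of them are unreachable, which inflates the live heap and makes collections less effective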

Architectural Trade-offs

Before Java 7u6:

String Object (Java 6):
├── char[] value (reference) ──┐
├── int offset                 │  Shared char[]
├── int count                  │  [H][e][l][l][o][,][ ][W][o][r][l][d]...
└── int hash                   │  (can be 10MB+)

After Java 7u6:

String Object (Java 7u6+):
├── char[] value ──→ [W][o][r][l][d]  (substring only)
└── int hash

Why the performance sacrifice is justified

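| Metric | Java 6 (shared) | Java 7u6+ (copying) |
| --- | --- | --- |
| substring() time | O(1) | O(n), n = substring length |
| substring() memory | new String object only, no array copy | new String object plus an O(n) array copy |
| String object size (64-bit, compressed oops) | 32 bytes | 24 bytes |
| Memory leak risk | High | None |
| GC friendliness | Poor | Good |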
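The object sizes in the table can be reconstructed with simple arithmetic. This assumes a 64-bit HotSpot JVM with compressed oops and compressed class pointers (a 12-byte object header and 4-byte references) and 8-byte object alignment; the count covers the String object itself, not its array, and other JVM configurations give different absolute numbers.

```latex
\begin{aligned}
\text{Java 6:}\quad & \underbrace{12}_{\text{header}} + \underbrace{4}_{\texttt{value}} + \underbrace{4 + 4 + 4}_{\texttt{offset},\ \texttt{count},\ \texttt{hash}} = 28 \;\Rightarrow\; 32 \text{ bytes after padding} \\
\text{Java 8:}\quad & \underbrace{12}_{\text{header}} + \underbrace{4}_{\texttt{value}} + \underbrace{4}_{\texttt{hash}} = 20 \;\Rightarrow\; 24 \text{ bytes after padding}
\end{aligned}
```

The two removed int fields account for the 8-byte difference.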

Key insight: In typical applications, substring() extracts reasonably short strings (< 1 KB), and copying that many characters takes nanoseconds. A leak through a shared array, by contrast, can end in an OutOfMemoryError in production.
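A back-of-the-envelope sketch shows why the leak dominates. The workload is hypothetical (1,000 ten-character substrings, each taken from a different 1,000,000-character parent), and the estimate counts only character data at 2 bytes per char, ignoring object headers.

```java
public class RetentionEstimate {
    static final int BYTES_PER_CHAR = 2; // char[] storage, before Java 9 compact strings

    // Java 6 model: each substring pins its whole parent array.
    static long retainedShared(int substrings, long parentChars) {
        return substrings * parentChars * BYTES_PER_CHAR;
    }

    // Java 7u6+ model: each substring keeps only its own copied characters.
    static long retainedCopied(int substrings, long substringChars) {
        return substrings * substringChars * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 1,000 substrings of 10 chars from distinct 1,000,000-char parents.
        System.out.println(retainedShared(1_000, 1_000_000)); // 2000000000 (~2 GB)
        System.out.println(retainedCopied(1_000, 10));        // 20000 (~20 KB)
    }
}
```

Copying 10 characters per call is negligible, while the shared model retains five orders of magnitude more memory for the same result.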

Edge Cases

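  1. Legacy workaround no longer needed:
    // Java 6 workaround: new String(...) trimmed the shared array to the substring's length
    String legacyCopy = new String(original.substring(0, 10));
    
    // Java 7u6+: substring() already copies; wrapping it in new String(...) only allocates an extra String object
    String copy = original.substring(0, 10);
    
  2. Java 9+ Compact Strings: copying became even cheaper, because value is a byte[] instead of a char[], so strings whose characters all fit in Latin-1 (ISO-8859-1) need half as many bytes of character data.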

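  3. Zero-copy alternatives, when avoiding the copy is critical:
    • a custom CharSequence wrapper that stores the source plus start/end indices instead of copying
    • java.nio.CharBuffer: CharBuffer.wrap(source, start, end) returns a read-only view without copying
    • third-party slice types (e.g., Slice)
    Any such view keeps the entire source reachable, which is exactly the Java 6 retention problem, so views suit short-lived processing.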
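A minimal sketch of the first two options follows. CharSlice is a hypothetical class, not a library type; CharBuffer.wrap(CharSequence, int, int) is the standard JDK method that returns a read-only buffer over the given range. Both hold a reference to the source, so they trade the copy for the same retention risk as Java 6 substrings.

```java
import java.nio.CharBuffer;

// Minimal zero-copy view over a CharSequence (hypothetical class for illustration).
// It stores a reference plus bounds, so, like a Java 6 substring, it keeps the source alive.
final class CharSlice implements CharSequence {
    private final CharSequence source;
    private final int start;
    private final int end;

    CharSlice(CharSequence source, int start, int end) {
        if (start < 0 || end > source.length() || start > end) {
            throw new IndexOutOfBoundsException();
        }
        this.source = source;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException();
        }
        return source.charAt(start + index);
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException();
        }
        // Indices are relative to this slice; the result is another view, still no copying.
        return new CharSlice(source, start + from, start + to);
    }

    @Override
    public String toString() {
        // Materializing a String is where the characters get copied.
        return source.subSequence(start, end).toString();
    }

    public static void main(String[] args) {
        String text = "Hello, World";
        CharSequence slice = new CharSlice(text, 7, 12);
        System.out.println(slice.length()); // 5
        System.out.println(slice.charAt(0)); // W

        // JDK alternative: a read-only CharBuffer view over the same range.
        CharBuffer view = CharBuffer.wrap(text, 7, 12);
        System.out.println(view); // World
    }
}
```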

Performance Benchmarks

| Operation | Java 6 | Java 8 | Java 17 |
| --- | --- | --- | --- |
| substring(0, 10) from 1KB | ~1ns | ~5ns | ~3ns (Latin-1) |
| substring(0, 100) from 1KB | ~1ns | ~20ns | ~12ns |
| GC impact | High (shared) | Low | Low |

Benchmarks are approximate; actual values depend on the JVM, CPU, and warmup. The copying cost grows with the length of the extracted substring, not with the size of the source string.

Best Practices for Highload

  • In modern Java: substring() is safe β€” use without worries
  • For zero-copy: CharSequence wrappers or CharBuffer
  • For parsing huge files: stream processing, don’t load everything into String
  • Don’t use new String(substring()) β€” redundant in Java 7+
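The streaming advice can be sketched as follows. StreamingPrefixes and prefixes() are hypothetical names, and a StringReader stands in for a large file (in real code, Files.newBufferedReader(path) would supply the Reader). Only the reader's buffer, the current line, and the collected prefixes are held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class StreamingPrefixes {
    // Reads the input line by line and keeps only the first maxChars characters of each line.
    static List<String> prefixes(Reader input, int maxChars) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Since Java 7u6 the prefix owns a right-sized array,
                // so storing it does not keep the full line alive.
                result.add(line.substring(0, Math.min(maxChars, line.length())));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // StringReader stands in for a large file (hypothetical sample data).
        Reader input = new StringReader("2024-01-01 INFO started\n2024-01-02 WARN disk\nshort");
        System.out.println(prefixes(input, 10)); // [2024-01-01, 2024-01-02, short]
    }
}
```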

🎯 Interview Cheat Sheet

Must know:

  • Reason for change: a memory leak, because a substring held on to the entire parent array
  • JDK-7068364: official Oracle bug report
  • Before Java 7u6, String had value, offset, and count (plus a cached hash); afterwards only value (plus hash)
  • Trade-off: substring() time went from O(1) to O(n), and the memory-leak risk went from high to none
  • Removing offset and count made each String object 8 bytes smaller (e.g., 32 → 24 bytes on 64-bit HotSpot with compressed oops)
  • In Java 9+, compact strings (byte[]) made copying even cheaper for Latin-1 content

Frequent follow-up questions:

  • Why sacrifice performance? Copying a typical substring (< 1 KB) takes nanoseconds, while a shared-array leak can mean an OutOfMemoryError in production.
  • Is the legacy workaround new String(substring) needed now? No: in modern Java substring() already copies, so new String() is just an extra allocation.
  • What zero-copy alternatives exist? A custom CharSequence wrapper, CharBuffer.wrap(), or third-party slice types; all of them keep the source alive.
  • How did the String object size change? It became smaller: removing offset and count saves about 8 bytes per string.

Red flags (DON’T say):

  • ❌ “Changed for no reason”: there was a critical memory leak
  • ❌ “substring() is still O(1)”: it has been O(n) since Java 7u6
  • ❌ “Need to use new String(substring()) for safety”: redundant in Java 7u6+
  • ❌ “Changes broke backward compatibility”: substring() returns the same result as before; only its time and memory characteristics changed

Related topics:

  • [[13. What substring() Does and How It Worked Before Java 7]]
  • [[19. What are Compact Strings in Java 9+]]
  • [[20. How to Find Out How Much Memory a String Occupies]]
  • [[4. Why String is Immutable]]