Why Was the substring() Implementation Changed in Java 7?
The JDK developers changed String.substring() in Java 7 update 6 because the old implementation could cause hidden memory leaks.
🟢 Junior Level
Problem: In the old version, substring() didn't copy data; the new string referenced the same array as the original. If you took a small substring from a huge text, the entire huge text remained in memory.
Example:
// Java 6: a huge string (e.g., file contents)
String huge = loadBigFile(); // 100 MB
// Take a small part: only 10 characters
String small = huge.substring(0, 10);
huge = null; // drop our reference to the big string
// BUT: the 100 MB array is still in memory because small references it!
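Solution: Since Java 7u6, substring() copies only the characters it needs into a new array (when the requested range is the whole string, the same String is returned). A small substring takes exactly as much memory as it should.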
🟡 Middle Level
What was before Java 7u6
The String object had three fields describing its text:
- char[] value: the character array
- int offset: where the string starts in the array
- int count: the length of the string
substring() created a new String with the same value, but different offset and count.
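The shared-array layout described above can be modeled with a small class. This is only an illustrative sketch, not the JDK source: the class name `SharedString` and the helper `retainedChars()` are invented for this example, and `String.repeat` assumes Java 11 or later.

```java
/** Illustrative model of the pre-7u6 layout; NOT the real java.lang.String. */
final class SharedString {
    private final char[] value; // backing array, possibly shared with a parent
    private final int offset;   // index of the first character inside value
    private final int count;    // number of characters in this string

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static SharedString of(String text) {
        return new SharedString(text.toCharArray(), 0, text.length());
    }

    /** O(1): reuses the same array and only adjusts offset and count. */
    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
    }

    int length() {
        return count;
    }

    /** Hypothetical helper: how many chars this object keeps reachable. */
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}

class SharedStringDemo {
    public static void main(String[] args) {
        SharedString huge = SharedString.of("x".repeat(1_000_000));
        SharedString small = huge.substring(0, 10);
        System.out.println(small.length());        // 10
        System.out.println(small.retainedChars()); // 1000000
    }
}
```

The ten-character result still pins the full million-character array, which is the leak in miniature.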
What changed
Starting from Java 7u6:
- The offset and count fields were removed from String
- substring() creates a new array and copies the requested characters
- Every String object became smaller (memory savings for ALL strings)
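For contrast with the shared-array design, here is a minimal sketch of the copying design. It is an illustration under assumptions, not the JDK source: `CopyingString` and `retainedChars()` are made-up names, and the real class has more fields and checks.

```java
import java.util.Arrays;

/** Illustrative model of the post-7u6 layout; NOT the real java.lang.String. */
final class CopyingString {
    private final char[] value; // holds exactly the characters of this string

    private CopyingString(char[] value) {
        this.value = value;
    }

    static CopyingString of(String text) {
        return new CopyingString(text.toCharArray());
    }

    /** O(endIndex - beginIndex): copies the requested range into a fresh array. */
    CopyingString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        if (beginIndex == 0 && endIndex == value.length) {
            return this; // whole range requested: nothing to copy
        }
        return new CopyingString(Arrays.copyOfRange(value, beginIndex, endIndex));
    }

    int length() {
        return value.length;
    }

    /** Hypothetical helper: how many chars this object keeps reachable. */
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value);
    }
}

class CopyingStringDemo {
    public static void main(String[] args) {
        CopyingString parent = CopyingString.of("Hello, World");
        CopyingString child = parent.substring(7, 12);
        System.out.println(child);                 // World
        System.out.println(child.retainedChars()); // 5
    }
}
```

With no offset or count to track, the object needs one field for its text, and a substring never keeps more characters alive than it contains.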
Practical application
// Java 7+: safe
String huge = loadBigFile(); // 100 MB
String small = huge.substring(0, 10); // ~50 bytes (copy!)
huge = null; // 100 MB freed for GC
Typical mistakes
- Mistake: Expecting O(1) performance. Solution: substring() is O(n) because it copies data.
- Mistake: Using new String(substring) as an "optimization". Solution: This was needed in Java 6. In modern Java it is just an extra allocation.
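The second mistake can be seen directly on a modern JDK. The sketch below uses only standard `String` methods; the class name `RedundantCopyDemo` and its two helper methods are invented for illustration.

```java
/** Shows that wrapping substring() in new String() only adds an object. */
class RedundantCopyDemo {
    /** Modern style: substring() alone already yields an independent string. */
    static String direct(String text, int end) {
        return text.substring(0, end);
    }

    /** Java 6 era workaround: an extra String object on top of the substring. */
    static String legacy(String text, int end) {
        return new String(text.substring(0, end));
    }

    public static void main(String[] args) {
        String original = "substring() already copies in modern Java";
        String a = direct(original, 10);
        String b = legacy(original, 10);
        System.out.println(a.equals(b)); // true: same characters
        System.out.println(a == b);      // false: two separate objects
    }
}
```

Both calls return the same text; the legacy form just creates one more short-lived object for the garbage collector.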
🔴 Senior Level
Internal Implementation: motivation for the changes
JDK-7068364: Official bug report in Oracle.
Problems with shared-array approach:
- Memory leak: a substring holds the entire parent array
- Complexity: three fields (value, offset, count) instead of one
- GC overhead: one large array referenced by many substrings is harder for GC algorithms
Architectural Trade-offs
Before Java 7u6:
String Object (Java 6):
├── char[] value (reference) ──┐
├── int offset                 │ Shared char[]
├── int count                  ▼ [H][e][l][l][o][,][ ][W][o][r][l][d]...
└── int hash                     (can be 10MB+)
After Java 7u6:
String Object (Java 7+):
├── char[] value ──► [W][o][r][l][d] (substring only)
└── int hash
Why the performance sacrifice is justified
| Metric | Java 6 (shared) | Java 7+ (copying) |
|---|---|---|
| substring() time | O(1) | O(n), n = substring length |
| substring() extra array memory | 0 bytes | O(n) bytes |
| String object size (64-bit JVM, compressed oops) | 32 bytes | 24 bytes |
| Memory leak risk | High | None |
| GC friendliness | Poor | Good |
Key insight: In typical applications, substring() is called on reasonably small strings (< 1KB), where an O(n) copy costs nanoseconds. A leak through a shared array, by contrast, can end in an OutOfMemoryError in production.
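The 32-byte and 24-byte object sizes in the table above can be reproduced with simple arithmetic. The sketch assumes a 64-bit HotSpot JVM with compressed references: a 12-byte object header, 4-byte references and ints, and 8-byte object alignment. Other JVMs or flags give different numbers, and `StringLayoutMath` is a name invented for this example.

```java
/** Back-of-the-envelope object size math under assumed HotSpot defaults. */
final class StringLayoutMath {
    static final int HEADER_BYTES = 12; // assumption: 64-bit JVM, compressed oops
    static final int ALIGNMENT = 8;     // assumption: default object alignment

    /** Header plus fields, rounded up to the next multiple of ALIGNMENT. */
    static int objectSize(int... fieldBytes) {
        int size = HEADER_BYTES;
        for (int bytes : fieldBytes) {
            size += bytes;
        }
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        // Java 6: value reference, offset, count, hash -> 12 + 16 = 28 -> 32
        System.out.println(objectSize(4, 4, 4, 4)); // 32
        // Java 7u6+: value reference, hash -> 12 + 8 = 20 -> 24
        System.out.println(objectSize(4, 4));       // 24
    }
}
```

The 8-byte difference applies to every String on the heap, not only to substrings.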
Edge Cases
- Legacy workaround no longer needed:
  // Java 6 workaround: force a copy so the parent array can be collected
  String legacyCopy = new String(original.substring(0, 10));
  // Java 7+: substring() already copies, so new String() is an extra allocation
  String copy = original.substring(0, 10);
- Java 9+ Compact Strings: copying became even cheaper, because Latin-1 strings are stored in a byte[] instead of a char[], which halves their size.
- Zero-copy alternatives, if avoiding the copy is critical:
  - CharSequence wrapper: doesn't copy data
  - java.nio.CharBuffer: a view over an array or CharSequence
  - Third-party libraries: view or slice types such as Slice (verify availability in your library; Guava's public API does not appear to include a StringView class)
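The zero-copy alternatives listed above can be sketched as follows. `CharSlice` is a hypothetical class written for this example, while `CharBuffer.wrap(CharSequence, int, int)` is the standard `java.nio` call. Treat it as a minimal sketch rather than a production-ready view type.

```java
import java.nio.CharBuffer;

/** Hypothetical read-only view over part of a CharSequence; copies nothing. */
final class CharSlice implements CharSequence {
    private final CharSequence source;
    private final int start; // inclusive index into source
    private final int end;   // exclusive index into source

    CharSlice(CharSequence source, int start, int end) {
        if (start < 0 || end > source.length() || start > end) {
            throw new IndexOutOfBoundsException();
        }
        this.source = source;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException();
        }
        return source.charAt(start + index);
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException();
        }
        return new CharSlice(source, start + from, start + to); // still no copy
    }

    /** Materializes the view; this is the only place characters are copied. */
    @Override
    public String toString() {
        return new StringBuilder(length()).append(source, start, end).toString();
    }

    /** Standard-library alternative: a read-only CharBuffer view. */
    static CharSequence bufferView(String text, int start, int end) {
        return CharBuffer.wrap(text, start, end);
    }

    public static void main(String[] args) {
        CharSequence slice = new CharSlice("Hello, World", 7, 12);
        System.out.println(slice);                               // World
        System.out.println(bufferView("Hello, World", 7, 12));   // World
    }
}
```

A view like this brings back the original trade-off: it is O(1) to create, but it keeps the whole source reachable for as long as the view lives, so it suits short-lived parsing code rather than long-lived caches.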
Performance Benchmarks
| Operation | Java 6 | Java 8 | Java 17 |
|---|---|---|---|
| substring(0, 10) from 1KB | ~1ns | ~5ns | ~3ns (Latin1) |
| substring(0, 100) from 1KB | ~1ns | ~20ns | ~12ns |
| substring(0, 10) from 1MB | ~1ns | ~500ns | ~250ns |
| GC impact | High (shared) | Low | Low |
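Note: these figures are approximate; actual values depend on the JVM, CPU, and warm-up. Since Java 7u6 the copy cost scales with the length of the substring rather than the length of the source, so the 1MB row in particular should be treated as a rough figure.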
Best Practices for Highload
- In modern Java: substring() is safe, use it without worries
- For zero-copy: CharSequence wrappers or CharBuffer
- For parsing huge files: use stream processing, don't load everything into a String
- Don't use new String(substring()): redundant in Java 7+
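The stream-processing advice from the list above might look like the sketch below. `PrefixScanner` and `linePrefixes` are invented names, and keeping a fixed-length prefix of each line is only a stand-in for whatever a real parser extracts. A `StringReader` plays the role of the file so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Reads input line by line instead of loading it into one huge String. */
final class PrefixScanner {
    /** Keeps at most prefixLength characters of each line. */
    static List<String> linePrefixes(Reader input, int prefixLength) {
        List<String> prefixes = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Only one line is held in memory at a time; the small
                // substring does not keep the rest of the line alive.
                int end = Math.min(prefixLength, line.length());
                prefixes.add(line.substring(0, end));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        Reader input = new StringReader("alpha-1\nbeta\n");
        System.out.println(linePrefixes(input, 5)); // [alpha, beta]
    }
}
```

For a real file, the `StringReader` would be replaced by something like `Files.newBufferedReader(path)`.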
🎯 Interview Cheat Sheet
Must know:
- Reason for change: memory leak, because a substring held the entire parent array
- JDK-7068364: official bug report in Oracle
- Before Java 7u6: String had 3 fields (value, offset, count); after: only value
- Trade-off: time goes from O(1) to O(n), but the memory leak risk goes away
- Removing offset and count reduced String object size by 8 bytes
- In Java 9+ compact strings (byte[]) made copying even more efficient
Frequent follow-up questions:
- Why sacrifice performance? In typical apps substring() is called on strings < 1KB, which costs nanoseconds. A memory leak, however, means OOM in production.
- Is the legacy workaround new String(substring) needed now? No. In modern Java substring() already copies, so new String() is an extra allocation.
- What zero-copy alternatives exist? A CharSequence wrapper, CharBuffer, or a third-party view or slice type.
- How did String object size change? It became smaller: removing offset and count saves ~8 bytes per string.
Red flags (DON'T say):
- ❌ "Changed for no reason": there was a critical memory leak
- ❌ "substring() is still O(1)": it is O(n) since Java 7u6
- ❌ "Need to use new String(substring()) for safety": redundant in Java 7+
- ❌ "Changes broke backward compatibility": the behavior of substring() visible to the user didn't change
Related topics:
- [[13. What substring() Does and How It Worked Before Java 7]]
- [[19. What are Compact Strings in Java 9+]]
- [[20. How to Find Out How Much Memory a String Occupies]]
- [[4. Why String is Immutable]]