What substring() Does and How It Worked Before Java 7
🟢 Junior Level
The substring() method returns part of a string — a substring from a specified index to the end or to another index.
Example:
String text = "Hello, World!";
String sub = text.substring(0, 5); // "Hello"
String sub2 = text.substring(7); // "World!"
Important detail: in old Java versions (before 7u6), substring() did not copy any data. The new string referenced the same character array as the original, so a short substring could keep a huge original array in memory.
In modern Java, substring() copies the characters it needs into a new array, which is safe and predictable. The one shortcut: if the requested range covers the whole string, the original String object is returned as-is.
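A small sketch of what "a copy" means in practice. `SubstringIdentity` and `returnsSameInstance` are hypothetical names; comparing with `==` relies on current OpenJDK behavior (a whole-range call returns the original object), which is an implementation detail rather than something the `String` specification promises.

```java
// Which substring() calls create a new object? (Observed OpenJDK 7u6+ behavior.)
class SubstringIdentity {
    // true if substring(begin, end) returns the very same object as s
    static boolean returnsSameInstance(String s, int begin, int end) {
        return s.substring(begin, end) == s;
    }

    public static void main(String[] args) {
        String text = "Hello, World!";
        // Whole range: the JDK returns the original object, nothing is copied
        System.out.println(returnsSameInstance(text, 0, text.length())); // true
        // Partial range: a new String backed by its own copied array
        System.out.println(returnsSameInstance(text, 0, 5)); // false
        // Either way, the original is untouched
        System.out.println(text); // Hello, World!
    }
}
```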
🟡 Middle Level
How substring() works now (Java 7u6+)
String text = "Hello, World!";
String sub = text.substring(7, 12); // "World"
Modern implementation:
- Checks the bounds and calculates the substring length
- Creates a new `byte[]` array (Java 9+) or `char[]` array (Java 7u6-8)
- Copies only the needed data
- Returns a new `String` object
Complexity: O(n) — proportional to substring length (data copying).
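The steps above can be sketched as a standalone method working on a `char[]`, assuming the Java 7u6-8 layout. `CopyingSubstring.copyingSubstring` is a hypothetical helper, not the JDK source; it delegates the copy to the `String(char[], int, int)` constructor, which copies exactly `subLen` characters into a new array.

```java
// Simplified copying substring (Java 7u6+ strategy), on a plain char[].
// copyingSubstring is a hypothetical helper, not the JDK's source code.
class CopyingSubstring {
    static String copyingSubstring(char[] value, int beginIndex, int endIndex) {
        // 1. Check boundaries, as String.substring does
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(
                "begin " + beginIndex + ", end " + endIndex + ", length " + value.length);
        }
        // 2. Calculate the substring length
        int subLen = endIndex - beginIndex;
        // 3 + 4. This constructor copies subLen chars into a new array: O(subLen)
        return new String(value, beginIndex, subLen);
    }

    public static void main(String[] args) {
        char[] data = "Hello, World!".toCharArray();
        String sub = copyingSubstring(data, 7, 12);
        data[7] = 'w';           // change the source after slicing
        System.out.println(sub); // World (the copy is independent of data)
    }
}
```

Mutating the source array after the call does not affect the result, which is exactly the property that rules out the old memory leak.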
How it worked before Java 7u6
Before Java 7u6, the String object contained:
- `char[] value`: reference to the character array
- `int offset`: start of the string in the array
- `int count`: string length
substring() created a new String with the same value, but different offset and count.
Plus: O(1) — instant, no copying. Minus: Memory leak — a small substring holds the huge parent array.
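To make the old layout concrete, here is a hypothetical `SharedSlice` class that mimics the pre-7u6 `value`/`offset`/`count` fields. It is a model for illustration, not the real `java.lang.String`: slicing is O(1), and every slice keeps the parent's whole array reachable.

```java
// Hypothetical model of the pre-7u6 String layout; not java.lang.String itself.
final class SharedSlice {
    private final char[] value; // shared with the parent: never copied
    private final int offset;   // where this slice starts in value
    private final int count;    // how many chars belong to this slice

    SharedSlice(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // O(1): no copying, just a new offset/count over the same array
    SharedSlice substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        return new SharedSlice(value, offset + beginIndex, endIndex - beginIndex);
    }

    int length() { return count; }

    // Size of the array this slice keeps reachable (the source of the leak)
    int retainedChars() { return value.length; }

    @Override
    public String toString() { return new String(value, offset, count); }

    public static void main(String[] args) {
        SharedSlice huge = new SharedSlice("x".repeat(1_000_000)); // String.repeat: Java 11+
        SharedSlice small = huge.substring(0, 5);
        huge = null; // the parent slice object is unreachable now...
        System.out.println(small.length());        // 5
        System.out.println(small.retainedChars()); // 1000000: the whole parent array is still alive
    }
}
```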
Typical mistakes
- Mistake: calling `substring(0, 5)` on a 3-character string. Solution: check the bounds or clamp with `Math.min`.
- Mistake: expecting `substring()` to modify the original. Solution: `substring()` returns a new string; the original doesn't change.
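A minimal sketch of both fixes, assuming you prefer clamping out-of-range indices over an exception (a design choice; failing fast is often the better default). `safeSubstring` is a hypothetical helper name.

```java
// Bounds-clamping wrapper; safeSubstring is a hypothetical helper name.
class SafeSubstring {
    static String safeSubstring(String s, int beginIndex, int endIndex) {
        // Clamp end into [0, length], then begin into [0, end]
        int end = Math.max(0, Math.min(endIndex, s.length()));
        int begin = Math.max(0, Math.min(beginIndex, end));
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(safeSubstring(s, 0, 5)); // abc (no exception)

        s.substring(1);               // result ignored: s itself does not change
        System.out.println(s);        // abc
        String tail = s.substring(1); // keep the returned string instead
        System.out.println(tail);     // bc
    }
}
```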
🔴 Senior Level
Internal Implementation
Before Java 7u6 (shared array):
// JDK 6
String(int offset, int count, char[] value) {
this.value = value; // Shared reference!
this.offset = offset;
this.count = count;
}
String substring(int beginIndex, int endIndex) {
// Check boundaries
// Create new String with same char[] value
return new String(offset + beginIndex, endIndex - beginIndex, value);
}
Java 7u6 — Java 8 (copying):
// JDK 7u6+
String substring(int beginIndex, int endIndex) {
// Check boundaries
int subLen = endIndex - beginIndex;
// Copy data to new array
return new String(value, beginIndex, subLen);
// new String(...) calls Arrays.copyOfRange
}
Java 9+ (Compact Strings):
// JDK 9+ — byte[] instead of char[]
String substring(int beginIndex, int endIndex) {
// Check boundaries
int subLen = endIndex - beginIndex;
// Copy bytes considering coder (LATIN1/UTF16)
return isLatin1()
? StringLatin1.newString(value, beginIndex, subLen)
: StringUTF16.newString(value, beginIndex, subLen);
}
Architectural Trade-offs
Old approach (shared array):
- Pros: O(1), zero-copy, memory savings with many substrings
- Cons: Memory leak — 5-character substring holds 10MB parent
New approach (copying):
- Pros: Predictable memory, original can be GC’d
- Cons: O(n) — data copying, more allocations
Edge Cases
- Memory Leak (Java 6):
  String huge = readLargeFile(); // 100MB (readLargeFile() is a hypothetical helper)
  String small = huge.substring(0, 10); // 10 chars
  huge = null; // "Deleted" huge
  // BUT: small.value still references the 100MB array!
  Workaround in Java 6: create the substring as `new String(huge.substring(0, 10))` before dropping `huge`; this forces a copy.
- IndexOutOfBoundsException:
  "abc".substring(0, 5); // Throws StringIndexOutOfBoundsException
- Empty substring:
  "abc".substring(2, 2); // "" (empty string, not null!)
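The edge cases can be checked directly on a modern JDK with a short sketch (`SubstringEdgeCases` and `tryRange` are hypothetical names). It catches `IndexOutOfBoundsException` and reports the concrete subclass, which for `String` is `StringIndexOutOfBoundsException`.

```java
// Quick check of the substring() edge cases on a modern JDK.
class SubstringEdgeCases {
    // Returns the thrown exception's class name, or "none" if substring succeeds
    static String tryRange(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return "none";
        } catch (IndexOutOfBoundsException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRange("abc", 0, 5)); // StringIndexOutOfBoundsException

        String empty = "abc".substring(2, 2);
        System.out.println(empty.isEmpty());       // true: an empty string, not null

        // The Java 6 workaround still compiles, but today it only adds a second copy
        String forced = new String("abc".substring(0, 2));
        System.out.println(forced);                // ab
    }
}
```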
Performance
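| Operation | Java 6 (shared) | Java 7u6-8 (copying) | Java 9+ (compact) |
| --- | --- | --- | --- |
| `substring(0, 10)` | O(1), no character data copied | O(n), ~64 bytes | O(n), ~56 bytes (Latin-1) |
| 1M-character substring | O(1), no character data copied | O(n), ~2 MB allocated | O(n), ~1 MB (Latin-1) |
| Memory leak risk | High | None | None |

Sizes are estimates for 64-bit HotSpot with compressed oops: a String object takes ~24 bytes, an array header takes 16 bytes, and every object is padded to a multiple of 8 bytes. Java 7u6-8: 24 + (16 + 10 × 2 = 36, padded to 40) ≈ 64 bytes. Java 9+ Latin-1: 24 + (16 + 10 = 26, padded to 32) ≈ 56 bytes. In Java 6 the substring allocates only a new String object, but it keeps the parent's entire char[] reachable: for a 10,000-character parent that is ~20,000 bytes (10,000 chars × 2 bytes) that cannot be garbage-collected.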
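These byte counts are back-of-the-envelope arithmetic. The sketch below reproduces it, assuming 64-bit HotSpot with compressed oops (24-byte `String` object, 16-byte array header, 8-byte alignment); other JVMs or flags give different numbers, so measure with JOL when it matters. `SubstringFootprintEstimate` and its constants are hypothetical.

```java
// Footprint estimate for a copied substring. ASSUMPTIONS: 64-bit HotSpot,
// compressed oops, 24-byte String object, 16-byte array header, 8-byte alignment.
class SubstringFootprintEstimate {
    static final int STRING_OBJECT_BYTES = 24; // header + value ref + hash (+ coder), padded
    static final int ARRAY_HEADER_BYTES = 16;  // object header + length field

    // Round up to the next multiple of 8 (object alignment)
    static long align8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    // bytesPerChar: 2 for char[] (Java 7u6-8) or UTF-16, 1 for Latin-1 byte[] (Java 9+)
    static long estimate(int chars, int bytesPerChar) {
        long array = align8(ARRAY_HEADER_BYTES + (long) chars * bytesPerChar);
        return STRING_OBJECT_BYTES + array;
    }

    public static void main(String[] args) {
        System.out.println(estimate(10, 2)); // 64: Java 7u6-8, 24 + align8(16 + 20)
        System.out.println(estimate(10, 1)); // 56: Java 9+ Latin-1, 24 + align8(16 + 10)
        System.out.println(estimate(5, 1));  // 48: Java 9+ Latin-1, 24 + align8(16 + 5)
    }
}
```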
Production Experience
Scenario: Log parsing (Java 6):
- Extracting `requestId` (36 chars) from a 10 MB log line
- 100K requests → 100K substrings → each keeps its 10 MB parent line alive → OOM
- Fix: `new String(line.substring(0, 36))` forces a copy
Scenario: Migration Java 8 → Java 17:
- In Java 8, `substring()` copied a `char[]` (2 bytes/char)
- In Java 17, `substring()` copies a `byte[]` (1 byte/char for Latin-1)
- Result: roughly half the memory for the character data of Latin-1 substrings
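A sketch of the log-parsing case on a modern JDK. The line format (a 36-character UUID request ID at the start of the line) and the `RequestIdExtractor` name are hypothetical.

```java
import java.util.UUID;

// Hypothetical log format: a 36-char UUID request ID, a space, then the payload.
class RequestIdExtractor {
    static final int REQUEST_ID_LENGTH = 36; // canonical UUID string length

    static String extractRequestId(String line) {
        if (line.length() < REQUEST_ID_LENGTH) {
            throw new IllegalArgumentException("line too short for a request ID");
        }
        // Java 7u6+: copies 36 chars, so the (possibly huge) line can be collected.
        // On Java 6 this would need new String(...) to avoid pinning the whole line.
        return line.substring(0, REQUEST_ID_LENGTH);
    }

    public static void main(String[] args) {
        String id = UUID.randomUUID().toString();
        String line = id + " " + "payload ".repeat(100_000); // large line (String.repeat: Java 11+)
        String extracted = extractRequestId(line);
        System.out.println(extracted.equals(id)); // true
    }
}
```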
Monitoring
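// JOL (jol-core on the classpath) shows the actual substring footprint
import org.openjdk.jol.info.GraphLayout;
class SubstringFootprint {
    public static void main(String[] args) {
        String huge = "A".repeat(10000); // String.repeat requires Java 11+
        String sub = huge.substring(0, 5);
        System.out.println(GraphLayout.parseInstance(sub).toFootprint());
        // Java 11+: ~48 bytes (24-byte String + 24-byte byte[5], its own copy)
        // Java 6 equivalent: ~20000 bytes (sub would share the parent's 10,000-char array)
    }
}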
Best Practices for Highload
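- In modern Java, `substring()` is safe: it copies the data and never pins the parent's array
- For zero-copy parsing, work with `CharSequence`, `CharBuffer`, or a custom `StringView` class
- If you need a small piece of a huge text and the parent is no longer needed, no extra step is required: the substring does not reference the parent's array, so the GC can reclaim the parent once nothing else refers to it
- Consider `text.substring(...).intern()` for frequently repeated substrings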
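A minimal zero-copy sketch using `java.nio.CharBuffer`: `CharBuffer.wrap(CharSequence, int, int)` returns a read-only buffer that reads from the original text instead of copying it. The request-line format and the `ZeroCopySlices` name are made up for illustration; a custom view class would follow the same idea.

```java
import java.nio.CharBuffer;

// Zero-copy slicing: CharBuffer.wrap(text, start, end) is a read-only view over text.
class ZeroCopySlices {
    // Counts occurrences of ch inside [start, end) of text without creating a substring
    static int countInRange(CharSequence text, int start, int end, char ch) {
        CharBuffer view = CharBuffer.wrap(text, start, end); // view, no copy
        int n = 0;
        while (view.hasRemaining()) {
            if (view.get() == ch) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String line = "GET /api/users/42 200";
        CharBuffer path = CharBuffer.wrap(line, 4, 17);     // view of "/api/users/42"
        System.out.println(path.length());                  // 13
        System.out.println(countInRange(line, 4, 17, '/')); // 3
        // Materialize a String only when one is really needed (this copies)
        String copy = path.toString();
        System.out.println(copy);                           // /api/users/42
    }
}
```

The trade-off is the mirror image of Java 6: the view is O(1) and allocation-free, but it keeps the whole source text reachable for as long as the view lives.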
🎯 Interview Cheat Sheet
Must know:
- `substring(begin, end)` returns the substring from `begin` to `end` (exclusive)
- Before Java 7u6: `substring()` shared the parent's `char[]`: O(1), but caused memory leaks
- Java 7u6+: `substring()` copies data: O(n), but safe
- Java 9+: copies a `byte[]` according to the coder (Latin-1/UTF-16)
- Memory leak in Java 6: a small substring held the huge parent array
- Workaround in Java 6: `new String(substring())` forces a copy
Frequent follow-up questions:
- Why was `substring()` changed in Java 7 (update 6)? The old version caused hidden memory leaks: a 5-character substring could keep a 10 MB parent alive.
- What's the complexity of `substring()` now? O(n), because it copies data; not O(1) as before.
- Is `new String(substring())` needed in modern Java? No, it's an extra allocation: `substring()` already copies.
- What happens when `substring()` goes beyond the string bounds? It throws `StringIndexOutOfBoundsException` (a subclass of `IndexOutOfBoundsException`).
Red flags (DON’T say):
- ❌ "`substring()` works in O(1)": true only in Java 6; now it is O(n)
- ❌ "`new String(substring())` is an optimization": it was a necessity in Java 6, now it is redundant
- ❌ "`substring()` modifies the original": String is immutable, so the original never changes
- ❌ "Memory leak from `substring()` is a current problem": fixed in Java 7u6
Related topics:
- [[14. Why substring() Implementation Was Changed in Java 7]]
- [[4. Why String is Immutable]]
- [[19. What are Compact Strings in Java 9+]]
- [[20. How to Find Out How Much Memory a String Occupies]]