Question 13 · Section 12

What substring() Does and How It Worked Before Java 7


🟢 Junior Level

The substring() method returns part of a string — a substring from a specified index to the end or to another index.

Example:

String text = "Hello, World!";
String sub = text.substring(0, 5); // "Hello"
String sub2 = text.substring(7);   // "World!"

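Important detail: in old Java versions (before 7u6), substring() took a shortcut. It didn't copy any characters; it pointed at the same character array as the original string. As a result, a short substring could keep a huge original array in memory long after the original string itself was no longer used.

In modern Java, substring() copies the needed characters into a new array (if you ask for the whole string, it simply returns the same object). This is safe and predictable.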


🟡 Middle Level

How substring() works now (Java 7u6+)

String text = "Hello, World!";
String sub = text.substring(7, 12); // "World"

Modern implementation:

  1. Calculates substring length
  2. Creates a new byte[] array (Java 9+) or char[] array (Java 7u6 to 8)
  3. Copies only the needed data
  4. Returns a new String object

Complexity: O(n), where n is the length of the substring, because its characters are copied.
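The four steps above can be modeled with a small toy class. This is a simplified sketch, not the JDK source: `ToyString`, its `LATIN1`/`UTF16` constants and the high-byte-first UTF-16 layout are illustrative assumptions, and it skips JDK details such as re-compressing a UTF-16 slice that turns out to contain only Latin-1 characters.

```java
import java.util.Arrays;

// Toy model of a Java 9+ style String: a byte[] plus a coder flag.
// Hypothetical class for illustration only; it is not the JDK source.
final class ToyString {
    static final byte LATIN1 = 0; // 1 byte per char
    static final byte UTF16 = 1;  // 2 bytes per char

    final byte[] value;
    final byte coder;

    private ToyString(byte[] value, byte coder) {
        this.value = value;
        this.coder = coder;
    }

    // Encodes s as LATIN1 if every char fits in one byte, otherwise as UTF16
    static ToyString of(String s) {
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        byte[] v = new byte[latin1 ? s.length() : s.length() * 2];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (latin1) {
                v[i] = (byte) c;
            } else {
                v[2 * i] = (byte) (c >> 8); // high byte first (arbitrary order)
                v[2 * i + 1] = (byte) c;
            }
        }
        return new ToyString(v, latin1 ? LATIN1 : UTF16);
    }

    int length() {
        return value.length >> coder; // LATIN1: bytes, UTF16: bytes / 2
    }

    ToyString substring(int beginIndex, int endIndex) {
        int length = length();
        // Validate bounds first, as String.substring() does
        if (beginIndex < 0 || endIndex > length || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(
                "begin " + beginIndex + ", end " + endIndex + ", length " + length);
        }
        // Whole string requested: nothing to copy
        if (beginIndex == 0 && endIndex == length) {
            return this;
        }
        // Steps 1-3: the slice is (endIndex - beginIndex) chars; copy only its bytes
        byte[] copy = Arrays.copyOfRange(value, beginIndex << coder, endIndex << coder);
        // Step 4: a new object that owns its own array
        return new ToyString(copy, coder);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < length(); i++) {
            sb.append(coder == LATIN1
                ? (char) (value[i] & 0xFF)
                : (char) (((value[2 * i] & 0xFF) << 8) | (value[2 * i + 1] & 0xFF)));
        }
        return sb.toString();
    }
}
```

For example, `ToyString.of("Hello, World!").substring(7, 12)` yields a 5-byte LATIN1 array holding "World", while the parent's 13-byte array is left untouched and can be collected independently.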

How it worked before Java 7u6

Before Java 7u6, the String object contained:

  • char[] value — reference to character array
  • int offset — start of string in array
  • int count — string length

substring() created a new String with the same value, but different offset and count.

  • Pro: O(1), instant, no copying
  • Con: memory leak, because a small substring keeps the huge parent array reachable
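The old offset/count layout can be imitated outside the JDK to see the sharing directly. This is a hypothetical model (`SharedString` and `trimmedCopy()` are illustrative names), not the JDK 6 source; `trimmedCopy()` mimics what the Java 6 `new String(sub)` workaround achieved.

```java
import java.util.Arrays;

// Toy model of the pre-7u6 layout: value + offset + count.
// Hypothetical class for illustration; not the JDK 6 source.
final class SharedString {
    final char[] value;  // possibly shared with other SharedString objects
    final int offset;    // where this string starts inside value
    final int count;     // how many chars belong to this string

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // O(1): no characters are copied; the new object points into the same array
    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        return new SharedString(value, offset + beginIndex, endIndex - beginIndex);
    }

    // Mimics the Java 6 workaround new String(sub): copy only the visible chars
    SharedString trimmedCopy() {
        char[] copy = Arrays.copyOfRange(value, offset, offset + count);
        return new SharedString(copy, 0, count);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

With `parent = new SharedString("Hello, World!")`, the result of `parent.substring(7, 12)` prints "World" yet still references the full 13-char array; only `trimmedCopy()` gives it a private 5-char array.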

Typical mistakes

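  1. Mistake: calling substring(0, 5) on a 3-character string, which throws StringIndexOutOfBoundsException. Solution: check the bounds first or clamp the end index, e.g. s.substring(0, Math.min(5, s.length())).

  2. Mistake: expecting substring() to modify the original. Solution: substring() returns a new string and the original never changes, so assign the result to a variable.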
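The Math.min fix generalizes into a clamping helper. `SafeSubstring.safeSubstring` is a hypothetical name, and clamping silently shortens the result instead of reporting bad indices, so it only fits cases where a truncated string is acceptable.

```java
// Hypothetical helper: clamps both indices into [0, s.length()] instead of throwing.
final class SafeSubstring {
    static String safeSubstring(String s, int beginIndex, int endIndex) {
        int len = s.length();
        int begin = Math.max(0, Math.min(beginIndex, len)); // keep begin inside the string
        int end = Math.max(begin, Math.min(endIndex, len)); // keep end inside and >= begin
        return s.substring(begin, end); // the original s is never modified
    }
}
```

Here `safeSubstring("abc", 0, 5)` returns "abc" and `safeSubstring("abc", 5, 10)` returns "", where plain substring() would throw.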


🔴 Senior Level

Internal Implementation

Before Java 7u6 (shared array):

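// JDK 6 (simplified excerpt)
// Package-private constructor: reuses the given array, copies nothing
String(int offset, int count, char[] value) {
    this.value = value;     // shared reference!
    this.offset = offset;
    this.count = count;
}

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) throw new StringIndexOutOfBoundsException(beginIndex);
    if (endIndex > count) throw new StringIndexOutOfBoundsException(endIndex);
    if (beginIndex > endIndex) throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    // Whole string: return this; otherwise a new String over the same char[] value
    return ((beginIndex == 0) && (endIndex == count)) ? this
        : new String(offset + beginIndex, endIndex - beginIndex, value);
}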

Java 7u6 — Java 8 (copying):

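// JDK 7u6 to JDK 8 (simplified)
public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) throw new StringIndexOutOfBoundsException(beginIndex);
    if (endIndex > value.length) throw new StringIndexOutOfBoundsException(endIndex);
    int subLen = endIndex - beginIndex;
    if (subLen < 0) throw new StringIndexOutOfBoundsException(subLen);
    // Whole string: return this; otherwise copy into a new array.
    // new String(char[], int, int) calls Arrays.copyOfRange internally.
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}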

Java 9+ (Compact Strings):

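// JDK 9+ (simplified): byte[] instead of char[]
public String substring(int beginIndex, int endIndex) {
    int length = length();
    checkBoundsBeginEnd(beginIndex, endIndex, length); // throws StringIndexOutOfBoundsException
    int subLen = endIndex - beginIndex;
    if (beginIndex == 0 && endIndex == length) {
        return this; // whole string: no copy
    }
    // Copy bytes according to the coder: 1 byte/char (LATIN1) or 2 bytes/char (UTF16)
    return isLatin1()
        ? StringLatin1.newString(value, beginIndex, subLen)
        : StringUTF16.newString(value, beginIndex, subLen);
}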

Architectural Trade-offs

Old approach (shared array):

  • Pros: O(1), zero-copy, memory savings with many substrings
  • Cons: hidden memory leaks (a 5-character substring can keep a 10MB parent array alive)

New approach (copying):

  • Pros: Predictable memory, original can be GC’d
  • Cons: O(n) — data copying, more allocations

Edge Cases

  1. Memory Leak (Java 6):
    String huge = readLargeFile(); // 100MB
    String small = huge.substring(0, 10); // 10 chars
    huge = null; // "Deleted" huge
    // BUT: small.value still references the 100MB array!
    

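    Workaround in Java 6: new String(huge.substring(0, 10)). The String(String) constructor copies only the substring's characters into a new, right-sized array, so the parent array can be garbage-collected.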

  2. IndexOutOfBoundsException:
    "abc".substring(0, 5); // Throws IndexOutOfBoundsException (StringIndexOutOfBoundsException in OpenJDK)
    
  3. Empty substring:
    "abc".substring(2, 2); // "" — empty string (not null!)
    

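The edge cases can be checked against java.lang.String directly. `SubstringEdgeCases.tryRange` is a hypothetical helper; it catches IndexOutOfBoundsException because that is what the Javadoc promises, while current OpenJDK builds throw the subclass StringIndexOutOfBoundsException.

```java
// Runs the edge cases above against java.lang.String.
// SubstringEdgeCases and tryRange are hypothetical names.
final class SubstringEdgeCases {
    // Returns the quoted substring, or the simple name of the exception thrown
    static String tryRange(String s, int beginIndex, int endIndex) {
        try {
            return "\"" + s.substring(beginIndex, endIndex) + "\"";
        } catch (IndexOutOfBoundsException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```

`tryRange("abc", 2, 2)` gives `""` (an empty string, never null), while `tryRange("abc", 0, 5)` and `tryRange("abc", 2, 1)` both report an index exception.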
Performance

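| Operation | Java 6 (shared char[]) | Java 7u6 to 8 (copying, char[]) | Java 9+ (copying, compact byte[]) |
| --- | --- | --- | --- |
| substring(0, 10) | O(1), no characters copied | O(n), ~64 bytes | O(n), ~56 bytes (Latin-1) |
| 1M-character substring | O(1), no characters copied | O(n), ~2 MB allocated | O(n), ~1 MB allocated (Latin-1) |
| Memory leak risk | High | None | None |

Sizes are for the new String object plus its new array, assuming a 64-bit HotSpot JVM with compressed oops and 8-byte alignment. Java 7u6 to 8: String (24) + char[10] (16-byte header + 20 bytes of data = 36, padded to 40) = 64 bytes. Java 9+: String (24) + byte[10] (16 + 10 = 26, padded to 32) = 56 bytes. In Java 6 the substring allocates only a small String object but keeps the parent's char[] reachable, so a tool that follows references (such as JOL) reports the whole parent array: ~20,000 bytes for a 10,000-character parent (10,000 chars × 2 bytes), not the size of the substring object itself.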
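The byte counts above follow from simple alignment arithmetic. The helper below is a hypothetical estimator, not a JDK or JOL API, and it hard-codes the stated assumptions (64-bit HotSpot, compressed oops, 8-byte alignment, 24-byte String object, 16-byte array header); a real measurement tool should be preferred when exact numbers matter.

```java
// Rough estimate of a String object plus its backing array, in bytes.
// Hypothetical helper; assumes 64-bit HotSpot, compressed oops,
// 8-byte alignment, a 24-byte String object and a 16-byte array header.
final class StringSizeEstimate {
    static long align8(long bytes) {
        return (bytes + 7) & ~7L; // round up to the next multiple of 8
    }

    // Java 7u6 to 8: char[] backing array, always 2 bytes per char
    static long java8(int chars) {
        return 24 + align8(16 + 2L * chars);
    }

    // Java 9+ with Compact Strings: byte[], 1 byte/char if all chars are Latin-1, else 2
    static long java9(String s) {
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        long bytesPerChar = latin1 ? 1 : 2;
        return 24 + align8(16 + bytesPerChar * s.length());
    }
}
```

Under these assumptions a 10-char substring costs 64 bytes on Java 8 and 56 bytes on Java 9+, and a 1M-char Latin-1 substring drops from about 2 MB to about 1 MB, which is where the "-50% for character data" figure comes from.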

Production Experience

Scenario: Log parsing (Java 6):

  • Extracting requestId (36 chars) from 10MB log line
  • 100K requests → 100K substrings → each holds 10MB → OOM
  • Fix: new String(line.substring(0, 36)) — forced copy

Scenario: Migration Java 8 → Java 17:

  • In Java 8 substring() copied char[] (2 bytes/char)
  • In Java 17 substring() copies byte[] (1 byte/char for Latin1)
  • Result: roughly 50% less memory for the character data of Latin-1 substrings (object headers stay the same size)

Monitoring

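// JOL (org.openjdk.jol:jol-core): measure what a substring actually keeps reachable
import org.openjdk.jol.info.GraphLayout;

String huge = "A".repeat(10000);   // String.repeat() requires Java 11+
String sub = huge.substring(0, 5);
System.out.println(GraphLayout.parseInstance(sub).toFootprint());
// Java 11+: ~48 bytes = String (24) + its own byte[5] (16 + 5, padded to 24)
// Java 6 (same idea, built without repeat()): ~20,000 bytes, because the
// footprint includes the parent's shared char[10000]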

Best Practices for Highload

  • In modern Java, substring() is safe: it copies the requested characters and never pins the parent array
  • For zero-copy parsing: work with CharSequence, CharBuffer, or a custom StringView
  • If you need a small piece of a huge text and the parent is no longer needed: just call substring(); once the parent is unreachable, the GC can reclaim its array
  • For frequently repeated substrings, consider text.substring(...).intern() to keep a single shared instance
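A "custom StringView" can be as small as a CharSequence over a region of a String. The class below is a hypothetical sketch (not a JDK type); the JDK's own `CharBuffer.wrap(text, start, end)` offers a similar read-only view. Note the trade-off: a view keeps its source alive exactly as the Java 6 substring() did, so it suits short-lived parsing, with toString() used to take a real copy for anything stored long-term.

```java
// Hypothetical zero-copy view over a region of a String (not a JDK class).
// It keeps the whole source alive, like the Java 6 substring() did.
final class StringView implements CharSequence {
    private final String source;
    private final int offset;
    private final int length;

    StringView(String source, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > source.length()) {
            throw new IndexOutOfBoundsException(
                "offset " + offset + ", length " + length + ", source " + source.length());
        }
        this.source = source;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        return source.charAt(offset + index);
    }

    // O(1): a narrower view over the same source, no copying
    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("start " + start + ", end " + end);
        }
        return new StringView(source, offset + start, end - start);
    }

    // Copies only when a real String is needed (e.g. to store it long-term)
    @Override
    public String toString() {
        return source.substring(offset, offset + length);
    }
}
```

For instance, a view of "/api?id=42" inside a request line can be narrowed to "id=42" with subSequence() without allocating any character data until toString() is called.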

🎯 Interview Cheat Sheet

Must know:

  • substring(begin, end) returns substring from begin to end (exclusive)
  • Before Java 7u6: substring() shared parent’s char[] — O(1), but caused memory leak
  • Java 7u6+: substring() copies data — O(n), but safe
  • Java 9+: copies byte[] considering coder (Latin-1/UTF-16)
  • Memory leak in Java 6: small substring held huge parent array
  • Workaround in Java 6: new String(substring()) — forced copy

Frequent follow-up questions:

  • Why was substring() changed in Java 7? — Old version caused hidden memory leaks: 5-character substring held 10MB parent.
  • What’s the complexity of substring() now? — O(n) — copies data. Not O(1) as before.
  • Is new String(substring()) needed in modern Java? — No, it’s an extra allocation. substring() already copies.
  • What happens with substring() beyond string bounds? It throws IndexOutOfBoundsException (in OpenJDK, the subclass StringIndexOutOfBoundsException).

Red flags (DON’T say):

  • ❌ “substring() works in O(1)” — only in Java 6, now O(n)
  • ❌ “new String(substring()) — optimization” — was a necessity in Java 6, now redundant
  • ❌ “substring() modifies original” — String is immutable, always returns a new string
  • ❌ “Memory leak from substring() — current problem” — fixed in Java 7u6

Related topics:

  • [[14. Why substring() Implementation Was Changed in Java 7]]
  • [[4. Why String is Immutable]]
  • [[19. What are Compact Strings in Java 9+]]
  • [[20. How to Find Out How Much Memory a String Occupies]]