How split() Method Works
The split() method breaks a string into an array of substrings by a given delimiter.
🟢 Junior Level
Simple example:
String data = "apple,banana,cherry";
String[] fruits = data.split(",");
// ["apple", "banana", "cherry"]
Important: The delimiter is a regular expression, not just a string! Some characters need escaping:
String ip = "192.168.1.1";
String[] parts = ip.split("\\."); // Dot needs escaping!
// ["192", "168", "1", "1"]
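To make the regex-versus-literal distinction concrete, here is a minimal sketch. The class name SplitBasics and the helper splitLiteral are illustrative names, not part of the JDK; the helper wraps the delimiter in Pattern.quote so that it is treated as plain text.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SplitBasics {
    // Hypothetical helper: quotes the delimiter so regex metacharacters lose their meaning.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String ip = "192.168.1.1";
        // Unescaped dot matches every character, so all pieces are empty and get dropped.
        System.out.println(ip.split(".").length);                    // 0
        System.out.println(Arrays.toString(ip.split("\\.")));        // [192, 168, 1, 1]
        System.out.println(Arrays.toString(splitLiteral(ip, ".")));  // [192, 168, 1, 1]
    }
}
```

Manual escaping is shorter for a fixed delimiter; quoting is likely the safer choice when the delimiter comes from configuration or user input.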
Trailing empty strings are removed by default:
"a,b,,,".split(","); // ["a", "b"] (trailing empties removed)
"a,b,,,".split(",", -1); // ["a", "b", "", "", ""] (all preserved)
🟡 Middle Level
Two versions of the method
String[] split(String regex) // limit = 0
String[] split(String regex, int limit) // full control
limit parameter:
| Limit | Behavior | Example: "a,b,c,,".split(",", limit) |
| --- | --- | --- |
| 0 (default) | Max splitting, trailing empty strings removed | ["a", "b", "c"] |
| > 0 | At most limit elements; the unsplit remainder goes into the last element | ["a", "b,c,,"] (limit=2) |
| < 0 | Max splitting, trailing empty strings preserved | ["a", "b", "c", "", ""] |
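The limit rules can be checked with a small runnable sketch. SplitLimitDemo and its show helper are illustrative names; show only quotes each element so that empty strings remain visible in the output.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class SplitLimitDemo {
    // Renders each element in quotes so empty strings stay visible.
    static String show(String[] parts) {
        return Arrays.stream(parts)
                .map(p -> "\"" + p + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        String s = "a,b,c,,";
        System.out.println(show(s.split(",")));      // ["a", "b", "c"]
        System.out.println(show(s.split(",", 0)));   // ["a", "b", "c"]
        System.out.println(show(s.split(",", 2)));   // ["a", "b,c,,"]
        System.out.println(show(s.split(",", 4)));   // ["a", "b", "c", ","]
        System.out.println(show(s.split(",", -1)));  // ["a", "b", "c", "", ""]
    }
}
```

Note that a positive limit does not trim anything: whatever is left after limit - 1 splits lands in the last element unchanged.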
Fast Path optimization
split() does NOT always use the heavy regex engine. If the delimiter is a single character that is not a regex metacharacter, or a backslash followed by a single non-alphanumeric character (such as "\\."), a direct character search is used instead:
// Fast Path: no regex compilation
"hello world".split(" ");
// Regex engine: Pattern/Matcher compilation
"hello world".split("\\s+");
Metacharacters that break Fast Path when used unescaped: . $ | ( ) [ { ^ ? * + \
Typical mistakes
- Mistake: split("."), where the dot means "any character" in regex. Solution: split("\\.") or split(Pattern.quote(".")).
- Mistake: expecting trailing empty strings to be kept by default. Solution: use split(",", -1) to preserve them.
- Mistake: calling split() in a loop with the same non-trivial regex. Solution: compile the Pattern once, e.g. Pattern.compile("\\s+"), and call its split(str). A plain "," already takes the Fast Path and needs no pre-compilation.
🔴 Senior Level
Internal Implementation
OpenJDK, String.split() (abridged):
public String[] split(String regex, int limit) {
char ch = 0;
// Fast Path: single character, not regex meta
if (((regex.value.length == 1 &&
".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
(regex.length() == 2 &&
regex.charAt(0) == '\\' &&
(((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
((ch-'a')|('z'-ch)) < 0 &&
((ch-'A')|('Z'-ch)) < 0)) &&
(ch < Character.MIN_HIGH_SURROGATE ||
ch > Character.MAX_LOW_SURROGATE)) {
// The bit tricks above check that ch is NOT a digit, a lowercase letter,
// or an uppercase letter, guaranteeing that \ch is not a known shorthand (\d, \w etc.)
// FAST PATH: direct indexOf search over the string's internal array
int off = 0;
int next = indexOf(ch, off);
// ... manual splitting without Pattern/Matcher
}
// SLOW PATH: via Pattern
return Pattern.compile(regex).split(this, limit);
}
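The condition in the excerpt is dense, so here is a more readable restatement of it as a standalone predicate. FastPathCheck.usesFastPath is a hypothetical helper written for this note, not a JDK API; it assumes the condition shown above, and other JDK versions may differ in detail.

```java
final class FastPathCheck {
    // Same metacharacter list as in the excerpt above.
    private static final String META = ".$|()[{^?*+\\";

    // Hypothetical helper: returns true when the excerpt's fast-path condition would hold.
    static boolean usesFastPath(String regex) {
        char ch;
        if (regex.length() == 1) {
            // Case 1: a single character that is not a regex metacharacter.
            ch = regex.charAt(0);
            if (META.indexOf(ch) != -1) return false;
        } else if (regex.length() == 2 && regex.charAt(0) == '\\') {
            // Case 2: a backslash followed by a character that is not an ASCII letter or digit.
            ch = regex.charAt(1);
            boolean alnum = (ch >= '0' && ch <= '9')
                    || (ch >= 'a' && ch <= 'z')
                    || (ch >= 'A' && ch <= 'Z');
            if (alnum) return false;
        } else {
            return false;
        }
        // Surrogate halves are excluded in both cases.
        return ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE;
    }

    public static void main(String[] args) {
        for (String r : new String[]{",", ".", "\\.", "\\s", "\\s+", "]", "{"}) {
            System.out.println(r + " -> " + usesFastPath(r));
        }
    }
}
```

Under this reading, "," and "\\." qualify, while ".", "\\s" and "\\s+" fall through to Pattern.compile.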
Architectural Trade-offs
Fast Path:
- Pros: No Pattern/Matcher allocation, ~5-10ns per call
- Cons: Works only for simplest delimiters
Regex Engine (Pattern/Matcher):
- Pros: Full power of regular expressions
- Cons: Regex compilation (~1-5μs), Pattern + Matcher + results allocations
Edge Cases
- Empty input:
"".split(","); // [""] (array with one empty string)
",".split(","); // [] (empty array: limit=0 removes trailing empties)
",".split(",", -1); // ["", ""] (two empty strings)
- Regex with lookahead/lookbehind:
"a1b2c3".split("(?=\\d)"); // ["a", "1b", "2c", "3"] (split before each digit)
- Trailing empty strings:
"a,,b".split(","); // ["a", "", "b"]
"a,,b,,,".split(","); // ["a", "", "b"] (trailing removed)
"a,,b,,,".split(",", -1); // ["a", "", "b", "", "", ""]
Performance
| Scenario | Fast Path | Regex Engine | Pre-compiled Pattern |
| --- | --- | --- | --- |
| split(",") 1M times | ~50ms | ~500ms | ~80ms |
| split("\\\|") 1M times | ~60ms | ~500ms | ~80ms |
| split("\\s+") 1M times | N/A | ~800ms | ~120ms |
| Regex compile overhead | 0 | ~1-5μs | 0 (once) |
Production Experience
Scenario 1: CSV parsing (10M lines):
// BAD: the delimiter is a real regex, so it is recompiled on every line
for (String line : lines) {
    String[] fields = line.split("\\s*,\\s*"); // 10M regex compilations!
}
// GOOD: pre-compiled Pattern (declared once as a class field)
private static final Pattern COMMA_SEP = Pattern.compile("\\s*,\\s*");
for (String line : lines) {
    String[] fields = COMMA_SEP.split(line);
}
// BETTER, when a bare comma is enough: Fast Path (single char, not meta)
for (String line : lines) {
    String[] fields = line.split(","); // Fast Path will trigger, no Pattern is compiled
}
Scenario 2: Log file parsing with regex delimiter:
- line.split("\\s\\|\\s"): not Fast Path, every call compiles the regex
- Fix: private static final Pattern SEP = Pattern.compile("\\s\\|\\s");
- Result: -80% CPU on parsing
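A minimal sketch of that fix in context follows. The class name LogLineParser and the sample log line are assumptions made for illustration; only the SEP pattern comes from the scenario above.

```java
import java.util.regex.Pattern;

final class LogLineParser {
    // Compiled once when the class is loaded and reused for every line.
    private static final Pattern SEP = Pattern.compile("\\s\\|\\s");

    static String[] parse(String line) {
        return SEP.split(line);
    }

    public static void main(String[] args) {
        // Hypothetical log format: timestamp | level | message
        for (String field : parse("2024-01-01 | INFO | service started")) {
            System.out.println(field);
        }
    }
}
```

Pattern instances are immutable and safe to share between threads, so a static final field is a reasonable home for one.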
Monitoring
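// JMH benchmark (requires the JMH dependency; run through the JMH runner)
import java.util.regex.Pattern;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class SplitBenchmark {
    private static final Pattern COMMA = Pattern.compile(",");
    private String input = "apple,banana,cherry"; // sample input, illustrative only

    @Benchmark
    public String[] testSplit() {
        return input.split(",");
    }

    @Benchmark
    public String[] testPrecompiled() {
        return COMMA.split(input);
    }
}
// Print the JIT compilation log (this shows JIT activity, not regex compilation)
java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions ...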
Best Practices for Highload
- For single-character delimiters (not meta): split(","), which takes the Fast Path
- For regex delimiters: compile the Pattern once with Pattern.compile(regex) and reuse it via pattern.split(str)
- In hot paths: consider a manual implementation via indexOf() to minimize allocations
- For CSV/TSV: specialized libraries (OpenCSV, Apache Commons CSV)
- For ultra-low-latency: zero-copy parsing via CharSequence wrappers
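As an illustration of the manual indexOf() approach from the list above, here is a minimal sketch. IndexOfSplitter is a hypothetical class; it keeps all empty fields, so its result matches split(delimiter, -1) for a single-character delimiter. It still allocates one substring per field, so treat it as a starting point and not as a tuned implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexOfSplitter {
    // Splits on a single character, keeping leading, inner and trailing empty fields.
    static List<String> split(String input, char delimiter) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int next;
        while ((next = input.indexOf(delimiter, start)) != -1) {
            parts.add(input.substring(start, next));
            start = next + 1;
        }
        // Whatever follows the last delimiter (possibly an empty string).
        parts.add(input.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("a,,b,,,", ',').size()); // 6
        System.out.println(split("apple,banana,cherry", ','));
    }
}
```

Returning a List avoids the final array copy; callers that need String[] can convert with toArray(new String[0]).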
🎯 Interview Cheat Sheet
Must know:
- split(regex) splits a string by a regular expression and returns an array
- Fast Path: for single-character non-meta delimiters, direct search without the regex engine
- limit parameter: 0 means trailing empties removed, < 0 means preserved, > 0 means max elements
- Regex metacharacters need escaping: . $ | ( ) [ ] { } ^ ? * + \
- Pattern.quote(".") is a safe way to escape for split()
- Compiling a regex on every loop iteration is an antipattern; use a pre-compiled Pattern
Frequent follow-up questions:
- Why doesn't split(".") work? Dot in regex = "any character". Need split("\\.").
- What does split(",", -1) do? Preserves empty strings at the end. By default (limit=0) they're removed.
- What is Fast Path in split()? If the delimiter is a single non-meta character, direct search is used without Pattern/Matcher.
- How to optimize split() in a loop? For regex delimiters, use a pre-compiled Pattern, e.g. private static final Pattern WS = Pattern.compile("\\s+"). A plain "," already takes the Fast Path.
Red flags (DONβT say):
- ❌ "split() takes a plain string, not regex": it takes a regex, so a dot will break the logic
- ❌ "split() always preserves empty strings": by default (limit=0) it removes trailing empty strings
- ❌ "You can compile regex in a loop without consequences": 10M compilations = seconds of CPU
- ❌ "split() is the only way to split a string": there's indexOf(), StringTokenizer, specialized parsers
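For the last point, a short sketch of one alternative, StringTokenizer, may help in an interview answer. TokenizerVsSplit is an illustrative class name. Keep in mind that StringTokenizer treats its delimiter argument as a set of plain characters and silently skips empty tokens, and that its Javadoc describes it as a legacy class whose use is discouraged in new code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class TokenizerVsSplit {
    // Collects tokens using StringTokenizer; delimiters are plain characters, not a regex.
    static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a,,b", ","));     // [a, b]
        System.out.println("a,,b".split(",").length);  // 3
    }
}
```

The different handling of the empty field between the two commas is the main behavioral difference to remember.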
Related topics:
- [[16. Difference Between replace() vs replaceAll()]]
- [[8. How Java Compiler Optimizes String Concatenation]]
- [[7. What Happens When Concatenating Strings with + Operator]]