How to Split a String in Java

From WikiHTP

To split a String on a particular delimiting character or a regular expression, use the String.split() method, which has the following signature:

public String[] split(String regex)

Note that the delimiting character or regular expression is removed from the resulting String array.

Using a Delimiting Character

String lineFromCsvFile = "Mickey;Bolton;12345;121216";
String[] dataCells = lineFromCsvFile.split(";");
// Result is dataCells = { "Mickey", "Bolton", "12345", "121216"};

Using a Regular Expression

String lineFromInput = "What    do you need    from me?";
String[] words = lineFromInput.split("\\s+"); // one or more whitespace characters
// Result is words = {"What", "do", "you", "need", "from", "me?"};

Directly Split a String Literal

String[] firstNames = "Mickey, Frank, Alicia, Tom".split(", ");
// Result is firstNames = {"Mickey", "Frank", "Alicia", "Tom"};

Warning: Do not forget that the parameter is always treated as a regular expression.

"aaa.bbb".split("."); // This returns an empty array

In the previous example, . is treated as the regular expression wildcard that matches any character. Every character is therefore a delimiter, every resulting substring is empty, and because trailing empty strings are removed, the result is an empty array.
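As a minimal sketch of the fix, the dot can be escaped so that it is matched literally. The class name DotSplitDemo and the helper splitOnDot are illustrative names, not part of the Java API.

```java
import java.util.Arrays;

class DotSplitDemo {
    // Hypothetical helper: splits on a literal dot by escaping it in the regex.
    static String[] splitOnDot(String input) {
        return input.split("\\.");
    }

    public static void main(String[] args) {
        // Unescaped: every character is a delimiter, so nothing remains.
        System.out.println("aaa.bbb".split(".").length);             // prints 0
        // Escaped: only the literal dot is a delimiter.
        System.out.println(Arrays.toString(splitOnDot("aaa.bbb")));  // prints [aaa, bbb]
    }
}
```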

Splitting based on a delimiter which is a regex meta-character

The following characters are considered special (also known as meta-characters) in regex:

  < > - = ! ( ) [ ] { } \ ^ $ | ? * + .

To split a string on one of the above delimiters, you need to either escape it with \\ or use Pattern.quote():

Using Pattern.quote():

 // requires: import java.util.regex.Pattern;
 String s = "a|b|c";
 String regex = Pattern.quote("|"); // quotes the delimiter so it is matched literally
 String[] arr = s.split(regex);

Escaping the special characters:

 String s = "a|b|c";
 String[] arr = s.split("\\|");
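Pattern.quote() is especially convenient when the delimiter is held in a variable or is longer than one character, since nothing has to be escaped by hand. The sketch below assumes a helper named splitLiteral, which is a hypothetical name introduced only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplitDemo {
    // Hypothetical helper: treats the delimiter as plain text, never as a regex.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("a|b|c", "|")));     // prints [a, b, c]
        // ".*" would match everything as a regex; quoted, it is just two characters.
        System.out.println(Arrays.toString(splitLiteral("1.*2.*3", ".*")));  // prints [1, 2, 3]
    }
}
```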

Split removes trailing empty strings

By default, split(delimiter) removes trailing empty strings from the result array. To turn this behavior off, use the overloaded version split(delimiter, limit) with limit set to a negative value, for example:

String[] split = data.split("\\|", -1);

split(regex) returns the same result as split(regex, 0).

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.

If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is negative, then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
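The three cases can be compared side by side. This is a small sketch using an assumed sample input "a|b|c||"; the class name SplitLimitDemo and the helper splitWithLimit are illustrative only.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: splits on a literal pipe with the given limit.
    static String[] splitWithLimit(String data, int limit) {
        return data.split("\\|", limit);
    }

    public static void main(String[] args) {
        String data = "a|b|c||"; // sample input, assumed for illustration

        // limit 0 (same as split(regex)): trailing empty strings are discarded
        System.out.println(Arrays.toString(splitWithLimit(data, 0)));   // prints [a, b, c]
        // negative limit: trailing empty strings are kept
        System.out.println(Arrays.toString(splitWithLimit(data, -1)));  // prints [a, b, c, , ]
        // limit 2: the pattern is applied once; the rest stays in the last entry
        System.out.println(Arrays.toString(splitWithLimit(data, 2)));   // prints [a, b|c||]
    }
}
```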

Splitting with a StringTokenizer

Besides the split() method, Strings can also be split using a StringTokenizer.

StringTokenizer is even more restrictive than String.split(), and also a bit harder to use. It is essentially designed for pulling out tokens delimited by a fixed set of characters (given as a String). Each character will act as a separator. Because of this restriction, it is often cited as being about twice as fast as String.split(), although the actual difference depends on the input and the Java version.

The default set of delimiters is the whitespace characters " \t\n\r\f" (space, tab, newline, carriage return, and form feed). The following example prints each word on a separate line.

String str = "the lazy fox jumped over the brown fence";
StringTokenizer tokenizer = new StringTokenizer(str);
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}

This will print out:

the
lazy
fox
jumped
over
the
brown
fence

You can also supply a different set of delimiter characters:

String str = "jumped over";
// In this case the characters `u` and `e` are used as delimiters
StringTokenizer tokenizer = new StringTokenizer(str, "ue");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}

This will print out:

j
mp
d ov
r
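One practical difference from split() is worth seeing in code: StringTokenizer never returns empty tokens, while split() keeps empty strings that appear between two adjacent delimiters. The sketch below uses an assumed sample input "a,,b"; the tokenize helper is a hypothetical name introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerVsSplitDemo {
    // Hypothetical helper: collects all tokens from a StringTokenizer into a list.
    static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "a,,b"; // sample input with an empty field in the middle

        System.out.println(tokenize(line, ","));               // prints [a, b]
        System.out.println(Arrays.toString(line.split(",")));  // prints [a, , b]
    }
}
```

If empty fields carry meaning, as they often do in delimited data, split() is likely the safer choice.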

About This Tutorial

This page was last edited on 28 January 2019, at 06:36.