How do I split a string in Java and keep the delimiters?

Time: 2023-01-29 09:10:21

I have this string (Java 1.5):

:alpha;beta:gamma;delta

I need to get an array:

{":alpha", ";beta", ":gamma", ";delta"}

What is the most convenient way to do it in Java?

6 solutions

#1


26  

str.split("(?=[:;])")

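On Java 7 and earlier (including 1.5), this gives you the desired array, only with an empty first item. From Java 8 on, split no longer produces an empty leading string for a zero-width match at the start of the input, so the result is already the desired array. Alternatively: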

str.split("(?=\\b[:;])")

This will give the array without the empty first item.

  • The key here is (?=X), a zero-width positive lookahead (a non-capturing construct): it matches the position just before X without consuming X, so the delimiter stays in the token that follows (see the java.util.regex.Pattern docs).
  • [:;] means "either ; or :".
  • \b is a word boundary. It keeps the first : from being treated as a split point: that : is at the beginning of the sequence, with no word character before it, so there is no boundary there.
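
To see the lookahead split in action, here is a minimal sketch. The class name LookaheadSplitDemo and the method splitKeepingDelimiters are made up for illustration, and it assumes every delimiter after the first one directly follows a word character, as in the question's input:

```java
import java.util.Arrays;

final class LookaheadSplitDemo {
    // Splits just before every ':' or ';' that follows a word character,
    // so each delimiter stays attached to the token after it.
    static String[] splitKeepingDelimiters(String input) {
        return input.split("(?=\\b[:;])");
    }

    public static void main(String[] args) {
        String[] parts = splitKeepingDelimiters(":alpha;beta:gamma;delta");
        System.out.println(Arrays.toString(parts));
        // prints [:alpha, ;beta, :gamma, ;delta]
    }
}
```

Because \b rejects the position before the leading :, the result has no empty first element regardless of the Java version.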

#2


4  

To keep the separators, you can use a StringTokenizer:

new StringTokenizer(":alpha;beta:gamma;delta", ":;", true)

That would yield the separators as tokens.

To have them as part of your tokens, you could use String#split with lookahead.
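
If you would rather stay with StringTokenizer, the delimiter tokens can be glued onto the word that follows each one. This is only a rough sketch: the class name TokenizerSplitDemo and the method splitKeepingDelimiters are hypothetical, and it assumes consecutive delimiters should all be attached to the next word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class TokenizerSplitDemo {
    static List<String> splitKeepingDelimiters(String input, String delimiters) {
        List<String> result = new ArrayList<String>();
        // returnDelims = true: each delimiter comes back as its own one-character token
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters, true);
        String pending = ""; // delimiter(s) waiting for the word that follows
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            boolean isDelimiter = token.length() == 1 && delimiters.indexOf(token) >= 0;
            if (isDelimiter) {
                pending += token;
            } else {
                result.add(pending + token);
                pending = "";
            }
        }
        if (pending.length() > 0) {
            result.add(pending); // trailing delimiter with no word after it
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitKeepingDelimiters(":alpha;beta:gamma;delta", ":;"));
        // prints [:alpha, ;beta, :gamma, ;delta]
    }
}
```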

#3


1  

You can do this with the Pattern and Matcher classes from java.util.regex.

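    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public static String[] mysplit(String text)
    {
        List<String> s = new ArrayList<String>();
        // Each match is one delimiter (: or ;) plus the word characters after it
        Matcher m = Pattern.compile("(:|;)\\w+").matcher(text);
        while (m.find()) {
            s.add(m.group());
        }
        return s.toArray(new String[s.size()]);
    }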

#4


1  

/**
 * @param list an empty String list, used internally to collect the tokens.
 * @param str  String which has to be processed.
 * @return Split String array with delimiters.
 */
public String[] split(ArrayList<String> list, String str) {
  for (int i = str.length() - 1; i >= 0; i--) {
     if (!Character.isLetterOrDigit(str.charAt(i))) {
        // Recurse on the prefix first so the tokens end up in their original order
        split(list, str.substring(0, i));
        list.add(str.substring(i));
        break;
     }
  }
  return list.toArray(new String[list.size()]);
}

#5


0  

This should work with Java 1.5 (Pattern.quote was introduced in Java 1.5).

// Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
// Split the string on delimiter, but don't delete the delimiter
private String[] splitStringOnDelimiter(String text, String delimiter, String safeSequence){
    // A temporary delimiter must be added as Java split method deletes the delimiter

    // for safeSequence use something that doesn't occur in your texts
    // quoteReplacement keeps $ and \ in the replacement from being interpreted by replaceAll
    text=text.replaceAll(Pattern.quote(delimiter), Matcher.quoteReplacement(safeSequence+delimiter));
    return text.split(Pattern.quote(safeSequence));
}

If the text starts with the delimiter, the first element of the result is an empty string. This variant drops it:

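private String[] splitStringOnDelimiter(String text, String delimiter, String safeSequence){
    text=text.replaceAll(Pattern.quote(delimiter), Matcher.quoteReplacement(safeSequence+delimiter));
    String[] tempArray = text.split(Pattern.quote(safeSequence));
    // Only drop the first element when it is the empty string left by a leading delimiter
    if (tempArray.length == 0 || tempArray[0].length() != 0) {
        return tempArray;
    }
    String[] returnArray = new String[tempArray.length-1];
    System.arraycopy(tempArray, 1, returnArray, 0, returnArray.length);
    return returnArray;
}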

E.g., here "a" is the delimiter:

splitStringOnDelimiter("-asd-asd-g----10-9asdas jadd", "a", "<>")

You get this:

1.: -
2.: asd-
3.: asd-g----10-9
4.: asd
5.: as j
6.: add

If you in fact want this:

1.: -a
2.: sd-a
3.: sd-g----10-9a
4.: sda
5.: s ja
6.: dd

You switch:

safeSequence+delimiter

with

delimiter+safeSequence
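
The temporary safeSequence can be avoided altogether by combining a lookahead or lookbehind with Pattern.quote. The sketch below is only an illustration: the class name QuotedLookaroundSplitDemo and the methods splitBefore and splitAfter are hypothetical helpers, not part of the code above. On Java 7 and earlier, splitBefore returns an empty first element when the text starts with the delimiter.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class QuotedLookaroundSplitDemo {
    // Splits just before each literal delimiter, so it starts the next piece.
    static String[] splitBefore(String text, String delimiter) {
        return text.split("(?=" + Pattern.quote(delimiter) + ")");
    }

    // Splits just after each literal delimiter, so it ends the previous piece.
    static String[] splitAfter(String text, String delimiter) {
        return text.split("(?<=" + Pattern.quote(delimiter) + ")");
    }

    public static void main(String[] args) {
        String text = "-asd-asd-g----10-9asdas jadd";
        System.out.println(Arrays.toString(splitBefore(text, "a")));
        // prints [-, asd-, asd-g----10-9, asd, as j, add]
        System.out.println(Arrays.toString(splitAfter(text, "a")));
        // prints [-a, sd-a, sd-g----10-9a, sda, s ja, dd]
    }
}
```

Both patterns are zero-width, so nothing is removed from the text and no placeholder has to be chosen.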

#6


-1  

Assuming that you only have a finite set of separators before the words in your string (e.g. ;, : etc.), you can use the following technique. (Apologies for any syntax errors; it's been a while since I used Java.)

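String toSplit = ":alpha;beta:gamma;delta";
toSplit = toSplit.replace(":", "~:");
toSplit = toSplit.replace(";", "~;");
// repeat for all your possible separators; "~" must not occur in the text itself
String[] splitStrings = toSplit.split("~");
// splitStrings[0] is an empty string because the text starts with a separator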
