Java Regex help: split a string on spaces, "=>", and commas

Date: 2021-05-31 21:43:54

I need to split a string on any of the following sequences:

  • 1 or more spaces
  • 0 or more spaces, followed by a comma, followed by 0 or more spaces
  • 0 or more spaces, followed by "=>", followed by 0 or more spaces

I haven't worked with Java regexes before, so I'm a little confused. Thanks!

Example:
add r10,r12 => r10
store r10 => r1

3 answers

#1



Just create a regex matching any of your three cases and pass it into the split method:

string.split("\\s*(=>|,|\\s)\\s*");

Literally, the regex means:

  1. Zero or more whitespace characters (\\s*)
  2. Arrow, comma, or whitespace (=>|,|\\s)
  3. Zero or more whitespace characters (\\s*)
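As a quick check, here is a minimal sketch that applies this regex to the two example lines from the question. The class name `ArrowCommaSplit` and the `tokens` helper are hypothetical, for illustration only.

```java
import java.util.Arrays;

public class ArrowCommaSplit {

    // Optional whitespace, then "=>", "," or a single whitespace character,
    // then optional whitespace (backslashes doubled for a Java string literal).
    static final String DELIMS = "\\s*(=>|,|\\s)\\s*";

    // Hypothetical helper: split one line into its tokens.
    static String[] tokens(String line) {
        return line.split(DELIMS);
    }

    public static void main(String[] args) {
        // The two example lines from the question.
        System.out.println(Arrays.toString(tokens("add r10,r12 => r10"))); // [add, r10, r12, r10]
        System.out.println(Arrays.toString(tokens("store r10 => r1")));    // [store, r10, r1]
    }
}
```

Because the leading `\\s*` is greedy, the spaces around "=>" are absorbed into a single delimiter match, so no empty tokens appear between "r12" and "r10".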

If necessary, you can replace the whitespace class \\s (which matches spaces, tabs, line breaks, etc.) with a plain space character.

#2



Strictly translated

For simplicity, I'm going to interpret your indication of "space" ( ) as "any whitespace" (\s).

Translating your spec more or less "word for word" means delimiting on any of:

  • 1 or more spaces
    • \s+
  • 0 or more spaces (\s*), followed by a comma (,), followed by 0 or more spaces (\s*)
    • \s*,\s*
  • 0 or more spaces (\s*), followed by "=>" (=>), followed by 0 or more spaces (\s*)
    • \s*=>\s*

To match any of the above:

(\s+|\s*,\s*|\s*=>\s*)

Reduced form

However, your spec can be "reduced" to:

  • 0 or more spaces
    • \s*
  • followed by either a space, comma, or "=>"
    • (\s|,|=>)
  • followed by 0 or more spaces
    • \s*

Put it all together: \s*(\s|,|=>)\s*

The reduced form avoids some corner cases in which the strictly translated form produces unexpected empty "matches" (empty strings in the split result).
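To make that corner case concrete: on the question's first example line, the strict form tries its alternatives left to right, so `\s+` wins on the single space before "=>". The "=>" is then matched as a second, adjacent delimiter, leaving an empty token between them. The reduced form's leading `\s*` absorbs that space first, so " => " is one delimiter. A small sketch comparing the two (the class name is hypothetical):

```java
import java.util.Arrays;

public class StrictVsReduced {

    // Strict translation: \s+ is tried first and can match the lone space
    // before "=>", so "=>" becomes a separate, adjacent delimiter.
    static final String STRICT = "(\\s+|\\s*,\\s*|\\s*=>\\s*)";

    // Reduced form: the leading \s* consumes the space, then "=>" completes
    // the same match, so " => " is a single delimiter.
    static final String REDUCED = "\\s*(\\s|,|=>)\\s*";

    public static void main(String[] args) {
        String line = "add r10,r12 => r10"; // first example line from the question
        System.out.println(Arrays.toString(line.split(STRICT)));  // [add, r10, r12, , r10]
        System.out.println(Arrays.toString(line.split(REDUCED))); // [add, r10, r12, r10]
    }
}
```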

Code

Here's some code:

import java.util.regex.Pattern;

public class Temp {

    // Strictly translated form:
    //private static final String REGEX = "(\\s+|\\s*,\\s*|\\s*=>\\s*)";

    // "Reduced" form:
    private static final String REGEX = "\\s*(\\s|=>|,)\\s*";

    private static final String INPUT =
        "one two,three=>four , five   six   => seven,=>";

    public static void main(final String[] args) {
        final Pattern p = Pattern.compile(REGEX);
        final String[] items = p.split(INPUT);
        // Shorthand for above:
        // final String[] items = INPUT.split(REGEX);
        for(final String s : items) {
            System.out.println("Match: '"+s+"'");
        }
    }
}

Output:

Match: 'one'
Match: 'two'
Match: 'three'
Match: 'four'
Match: 'five'
Match: 'six'
Match: 'seven'
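One caveat not covered above: `String.split` drops trailing empty strings (which is why the trailing ",=>" in INPUT produces nothing), but it keeps a leading empty string when the pattern has a positive-width match at index 0. So a line that starts with whitespace yields an empty first element. A minimal sketch, assuming it is acceptable to trim each line before splitting (`LeadingWhitespace` and `tokens` are hypothetical names):

```java
import java.util.Arrays;

public class LeadingWhitespace {

    // Same "reduced" pattern as in the code above.
    static final String REGEX = "\\s*(\\s|=>|,)\\s*";

    // Hypothetical helper: trim() strips leading/trailing whitespace, so the
    // pattern can no longer match at index 0 of an indented line.
    static String[] tokens(String line) {
        return line.trim().split(REGEX);
    }

    public static void main(String[] args) {
        String indented = "   store r10 => r1";
        System.out.println(Arrays.toString(indented.split(REGEX))); // [, store, r10, r1]
        System.out.println(Arrays.toString(tokens(indented)));      // [store, r10, r1]
    }
}
```

Trimming only handles leading whitespace; a line that starts with "," or "=>" would still produce an empty first element.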

#3



String[] splitArray = subjectString.split(" *(,|=>| ) *");

This should do it.
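Because this pattern uses literal spaces rather than \s, it does not treat tabs as delimiters. A small sketch (the class name is hypothetical) contrasting it with the \s variant on a line that has a tab after the mnemonic:

```java
import java.util.Arrays;

public class SpaceOnlySplit {

    // Literal spaces only: tabs and other whitespace are NOT delimiters.
    static final String SPACE_ONLY = " *(,|=>| ) *";

    // Same shape, but with the \s whitespace class instead of a space.
    static final String ANY_WHITESPACE = "\\s*(,|=>|\\s)\\s*";

    public static void main(String[] args) {
        System.out.println(Arrays.toString("store r10 => r1".split(SPACE_ONLY))); // [store, r10, r1]

        String withTab = "store\tr10 => r1";
        // The tab stays inside the first token, giving 2 tokens instead of 3.
        System.out.println(withTab.split(SPACE_ONLY).length);     // 2
        System.out.println(withTab.split(ANY_WHITESPACE).length); // 3
    }
}
```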
