Splitting a string on multiple delimiters in Java [duplicate]

Date: 2023-02-14 18:13:40

I am splitting the string below on multiple delimiters. The delimiters are:

, . @ ? ! _ ' and white space etc. 

Below is my code:

String[] tokens = s.split("[!|?|,|.|_|'|@ |\\s]");

For input:

He is a very very good boy, isn't he?

Expected output after split is: 10 tokens

He
is
a
very
very
good
boy
isn
t
he

But I am getting the output below: 11 tokens

He
is
a
very
very
good
boy

isn
t
he

Because two delimiters (the comma and the space after it) are adjacent, the split produces an empty token between them, which gives 11 tokens. How can I get the expected output?

2 answers

#1


3  

You can add the + quantifier so that a run of consecutive delimiters is matched as a single separator, which avoids the empty string that adjacent delimiters otherwise produce:

s.split("[,.@?!_'\\s]+")
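As a rough check, the sketch below runs that pattern on the sentence from the question. The class and method names (SplitPlusDemo, tokenize) are made up for illustration and are not from the original post.

```java
import java.util.Arrays;

// Hypothetical demo class; only the regex comes from the answer above.
class SplitPlusDemo {

    // Splits on one or more consecutive delimiter characters.
    static String[] tokenize(String s) {
        return s.split("[,.@?!_'\\s]+");
    }

    public static void main(String[] args) {
        String s = "He is a very very good boy, isn't he?";
        String[] tokens = tokenize(s);

        // ", " after "boy" is consumed as one separator, so no empty token appears.
        System.out.println(tokens.length);           // 10
        System.out.println(Arrays.toString(tokens)); // [He, is, a, very, very, good, boy, isn, t, he]
    }
}
```

The trailing "?" does not add a token because split(String) drops trailing empty strings.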

NOTE: As I mentioned in a comment, a character class already acts as an OR over the characters it contains. There is no need to put | inside a character class to get alternation; inside the brackets, | is matched literally.
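A small sketch of that point, using a made-up input string and hypothetical class and method names: with | inside the brackets, the pipe character itself becomes a delimiter.

```java
import java.util.Arrays;

// Hypothetical demo class showing that | is a literal inside [...].
class PipeInClassDemo {

    // The | characters here are ordinary members of the character class.
    static String[] splitWithPipes(String s) {
        return s.split("[!|?|,]");
    }

    // Same delimiters (! ? ,) without the stray pipes.
    static String[] splitWithoutPipes(String s) {
        return s.split("[!?,]");
    }

    public static void main(String[] args) {
        String s = "a|b,c";
        System.out.println(Arrays.toString(splitWithPipes(s)));    // [a, b, c]
        System.out.println(Arrays.toString(splitWithoutPipes(s))); // [a|b, c]
    }
}
```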

#2


3  

To treat a run of consecutive delimiters as a single match, use the + quantifier:

s.split("[,.@?!_'\\s]+");

Another regex that you should consider using is:

s.split("[\\W_]+");

This treats any non-word character, plus the underscore, as a delimiter. Your question does not ask for this, but it also produces the output you expect.
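Here is a minimal sketch of how that broader pattern behaves, with hypothetical class and method names and two extra made-up inputs. One thing to watch for: if the input starts with a delimiter, split returns an empty first token.

```java
import java.util.Arrays;

// Hypothetical demo class; only the regex comes from the answer above.
class NonWordSplitDemo {

    // \W matches any character that is not a letter, digit, or underscore;
    // adding _ to the class makes the underscore a delimiter too.
    static String[] tokenize(String s) {
        return s.split("[\\W_]+");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("He is a very very good boy, isn't he?");
        System.out.println(tokens.length);                                  // 10

        System.out.println(Arrays.toString(tokenize("snake_case, here"))); // [snake, case, here]

        // A delimiter at the very start yields a leading empty string.
        System.out.println(tokenize("?leading").length);                   // 2
    }
}
```

If leading delimiters are possible in your input, you may want to skip empty tokens after splitting.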
