Split a string on spaces except when surrounded by single or double quotes

Time: 2022-09-15 13:32:33

I'm new to regular expressions and would appreciate your help. I'm trying to put together an expression that will split the example string using all spaces that are not surrounded by single or double quotes. My last attempt looks like this: (?!") and isn't quite working. It's splitting on the space before the quote.

Example input:

This is a string that "will be" highlighted when your 'regular expression' matches something.

Desired output:

This
is
a
string
that
will be
highlighted
when
your
regular expression
matches
something.

Note that "will be" and 'regular expression' retain the space between the words.

13 solutions

#1


209  

I don't understand why all the others are proposing such complex regular expressions or such long code. Essentially, you want to grab two kinds of things from your string: sequences of characters that aren't spaces or quotes, and sequences of characters that begin and end with a quote, with no quotes in between, for two kinds of quotes. You can easily match those things with this regular expression:

[^\s"']+|"([^"]*)"|'([^']*)'

I added the capturing groups because you don't want the quotes in the list.

This Java code builds the list, adding the capturing group if it matched to exclude the quotes, and adding the overall regex match if the capturing group didn't match (an unquoted word was matched).

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// subjectString holds the text to tokenize
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    if (regexMatcher.group(1) != null) {
        // Add double-quoted string without the quotes
        matchList.add(regexMatcher.group(1));
    } else if (regexMatcher.group(2) != null) {
        // Add single-quoted string without the quotes
        matchList.add(regexMatcher.group(2));
    } else {
        // Add unquoted word
        matchList.add(regexMatcher.group());
    }
}

If you don't mind having the quotes in the returned list, you can use much simpler code:

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("[^\\s\"']+|\"[^\"]*\"|'[^']*'");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
} 

#2


12  

There are several questions on Stack Overflow that cover this same question in various contexts using regular expressions. For instance:

UPDATE: Sample regex to handle single and double quoted strings. Ref: How can I split on a string except when inside quotes?

m/('.*?'|".*?"|\S+)/g 

Tested this with a quick Perl snippet and the output was as reproduced below. Also works for empty strings or whitespace-only strings if they are between quotes (not sure if that's desired or not).

This
is
a
string
that
"will be"
highlighted
when
your
'regular expression'
matches
something.

Note that this does include the quote characters themselves in the matched values, though you can remove that with a string replace, or modify the regex to not include them. I'll leave that as an exercise for the reader or another poster for now, as 2am is way too late to be messing with regular expressions anymore ;)
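
One way to do the exercise left open above, sketched in Java rather than Perl: match with the same alternation, then drop the surrounding quotes from any token that starts and ends with the same quote character. The class and method names (`QuoteStripDemo`, `tokens`) are made up for this illustration, and the sketch does not handle escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteStripDemo {
    // Same alternation as the Perl pattern: quoted runs first, then runs of non-space characters
    private static final Pattern TOKEN = Pattern.compile("'.*?'|\".*?\"|\\S+");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            String t = m.group();
            char first = t.charAt(0);
            // Only strip when the token both starts and ends with the same quote character
            boolean quoted = t.length() >= 2
                    && (first == '"' || first == '\'')
                    && t.charAt(t.length() - 1) == first;
            result.add(quoted ? t.substring(1, t.length() - 1) : t);
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
        for (String t : tokens(s)) {
            System.out.println(t);
        }
    }
}
```

Stripping after the match keeps the pattern itself unchanged, at the cost of one extra check per token.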

#3


5  

If you want to allow escaped quotes inside the string, you can use something like this:

(?:(['"])(.*?)(?<!\\)(?>\\\\)*\1|([^\s]+))

Quoted strings will be group 2, single unquoted words will be group 3.

You can try it on various strings here: http://www.fileformat.info/tool/regex.htm or http://gskinner.com/RegExr/

#4


3  

The regex from Jan Goyvaerts is the best solution I have found so far, but it also produces empty (null) matches for the capture groups, which he filters out in his program. These empty matches also show up in regex testers (e.g. rubular.com). If you turn the search around (look for the quoted parts first, then for the space-separated words), you can do it in one go with:

("[^"]*"|'[^']*'|[\S]+)+

#5


2  

(?<!\G".{0,99999})\s|(?<=\G".{0,99999}")\s

This will match the spaces not surrounded by double quotes. I have to use min,max {0,99999} because Java doesn't support * and + in lookbehind.

#6


1  

It'll probably be easier to search the string, grabbing each part, than to split it.

The reason is that you can have it split at the spaces before and after "will be", but I can't think of any way to tell a split to ignore the space in between.

(not actual Java)

// JavaScript-style sketch
const input = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";

// A double-quoted group (backslash escapes allowed inside) or a run of non-space characters
const regex = /"(?:\\.|[^"\\])+"|[^ ]+/;
const tokens = [];

let rest = input.trim();
while (rest.length > 0) {
    const m = rest.match(regex);
    if (m === null) break; // guard so the loop cannot spin forever
    tokens.push(m[0]);
    rest = rest.slice(m.index + m[0].length).trim(); // progress to next "word"
}

Also, capturing single quotes could lead to issues:

"Foo's Bar 'n Grill"

//=>

"Foo"
"s Bar "
"n"
"Grill"

#7


1  

String.split() is not helpful here because there is no way to distinguish between spaces within quotes (don't split) and those outside (split). Matcher.lookingAt() is probably what you need:

String str = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
str = str + " "; // add trailing space
int len = str.length();
Matcher m = Pattern.compile("((\"[^\"]+?\")|('[^']+?')|([^\\s]+?))\\s++").matcher(str);

for (int i = 0; i < len; i++)
{
    m.region(i, len);

    if (m.lookingAt())
    {
        String s = m.group(1);

        if ((s.startsWith("\"") && s.endsWith("\"")) ||
            (s.startsWith("'") && s.endsWith("'")))
        {
            s = s.substring(1, s.length() - 1);
        }

        System.out.println(i + ": \"" + s + "\"");
        i += (m.group(0).length() - 1);
    }
}

which produces the following output:

0: "This"
5: "is"
8: "a"
10: "string"
17: "that"
22: "will be"
32: "highlighted"
44: "when"
49: "your"
54: "regular expression"
75: "matches"
83: "something."

#8


1  

I liked Marcus's approach; however, I modified it to allow text adjacent to the quotes and to support both " and ' quote characters. For example, I needed a="some value" not to be split into [a=, "some value"].

"(?<!\\G\\S{0,99999}[\"'].{0,99999})\\s|(?<=\\G\\S{0,99999}\".{0,99999}\"\\S{0,99999})\\s|(?<=\\G\\S{0,99999}'.{0,99999}'\\S{0,99999})\\s"

#9


1  

A couple hopefully helpful tweaks on Jan's accepted answer:

(['"])((?:\\\1|.)+?)\1|([^\s"']+)
  • Allows escaped quotes within quoted strings
  • Avoids repeating the pattern for the single and double quote; this also simplifies adding more quoting symbols if needed (at the expense of one more capturing group)
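
A minimal Java sketch of how this pattern could be used: group 1 holds the opening quote, group 2 the quoted content, and group 3 an unquoted word. The names `EscapedQuoteDemo` and `tokenize` are hypothetical. Note that the backslashes of escaped quotes stay in the captured text, and because of the `+?` quantifier an empty pair of quotes does not produce an empty token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedQuoteDemo {
    // Regex: (['"])((?:\\\1|.)+?)\1|([^\s"']+)
    private static final Pattern TOKEN =
            Pattern.compile("(['\"])((?:\\\\\\1|.)+?)\\1|([^\\s\"']+)");

    static List<String> tokenize(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(2) != null) {
                // Quoted string, without the surrounding quotes
                result.add(m.group(2));
            } else {
                // Unquoted word
                result.add(m.group(3));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The input contains the characters: x "say \"hi\"" y
        for (String t : tokenize("x \"say \\\"hi\\\"\" y")) {
            System.out.println(t);
        }
    }
}
```

If the escaping backslashes should not appear in the result, they would have to be removed in a separate step after matching.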

#10


1  

Jan's approach is great but here's another one for the record.

If you actually wanted to split as mentioned in the title, keeping the quotes in "will be" and 'regular expression', then you could use this method, which comes straight from the question "Match (or replace) a pattern except in situations s1, s2, s3 etc".

The regex:

'[^']*'|\"[^\"]*\"|( )

The two alternatives on the left match complete 'quoted strings' and "double-quoted strings". We ignore these matches. The right side matches spaces and captures them in Group 1, and we know they are the right spaces because they were not matched by the expressions on the left. We replace those with SplitHere, then split on SplitHere. Again, this is for a true split case where you want "will be", not will be.

Here is a full working implementation (see the results on the online demo).

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Program {
    public static void main(String[] args) {
        String subject = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
        Pattern regex = Pattern.compile("'[^']*'|\"[^\"]*\"|( )");
        Matcher m = regex.matcher(subject);
        StringBuffer b = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) {
                // A space outside quotes: mark it as a split point
                m.appendReplacement(b, "SplitHere");
            } else {
                // A quoted string: copy it through unchanged; quoteReplacement keeps
                // any '$' or '\' in it from being read as replacement syntax
                m.appendReplacement(b, Matcher.quoteReplacement(m.group(0)));
            }
        }
        m.appendTail(b);
        String replaced = b.toString();
        String[] splits = replaced.split("SplitHere");
        for (String split : splits) System.out.println(split);
    } // end main
} // end Program
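
The sentinel text would misfire if the input itself contained SplitHere. A possible variation, offered only as a sketch, uses the same regex but cuts the pieces directly at the Group 1 matches instead of going through a placeholder. The names `QuoteAwareSplit` and `split` are made up for this example. Consecutive spaces yield empty strings in the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareSplit {
    // Quoted strings are matched (and skipped); only a space outside quotes lands in group 1
    private static final Pattern SPLITTER = Pattern.compile("'[^']*'|\"[^\"]*\"|( )");

    static List<String> split(String subject) {
        List<String> parts = new ArrayList<>();
        Matcher m = SPLITTER.matcher(subject);
        int start = 0; // index where the current piece begins
        while (m.find()) {
            if (m.group(1) != null) {
                // A space outside quotes: close the current piece here
                parts.add(subject.substring(start, m.start()));
                start = m.end();
            }
            // Otherwise a quoted string was matched; it stays inside the current piece
        }
        parts.add(subject.substring(start)); // remainder after the last split point
        return parts;
    }

    public static void main(String[] args) {
        String subject = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
        for (String part : split(subject)) {
            System.out.println(part);
        }
    }
}
```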

#11


0  

I'm reasonably certain this is not possible using regular expressions alone. Checking whether something is contained inside some other tag is a parsing operation. This seems like the same problem as trying to parse XML with a regex -- it can't be done correctly. You may be able to get your desired outcome by repeatedly applying a non-greedy, non-global regex that matches the quoted strings, then once you can't find anything else, split it at the spaces... that has a number of problems, including keeping track of the original order of all the substrings. Your best bet is to just write a really simple function that iterates over the string and pulls out the tokens you want.
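
A "really simple function" of that kind might look like the following Java sketch. It is one possible reading of the suggestion, not code from the answer: the names `QuoteScanner` and `tokenize` are hypothetical. It assumes a quote may appear anywhere in a word (so a="some value" becomes the single token a=some value) and treats an unterminated quote as running to the end of the input.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteScanner {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;          // 0 means "not inside quotes"; otherwise the open quote character
        boolean inToken = false; // true once the current token has started (even if still empty)

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;             // closing quote: leave quoted mode
                } else {
                    current.append(c);     // spaces are kept while inside quotes
                }
            } else if (c == '"' || c == '\'') {
                quote = c;                 // opening quote: enter quoted mode
                inToken = true;
            } else if (Character.isWhitespace(c)) {
                if (inToken) {             // whitespace outside quotes ends the token
                    tokens.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
            } else {
                current.append(c);
                inToken = true;
            }
        }
        if (inToken) {
            tokens.add(current.toString()); // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
        for (String t : tokenize(s)) {
            System.out.println(t);
        }
    }
}
```

Because the scanner works left to right in a single pass, the original order of the tokens is preserved without any bookkeeping.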

#12


0  

You can also try this:

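    String str = "This is a string that \"will be\" highlighted when your 'regular expression' matches something";
    // Assumes balanced quotes: after splitting on quote characters,
    // the odd-indexed parts are the ones that were inside quotes
    String[] ss = str.split("\"|'");
    for (int i = 0; i < ss.length; i++) {
        if ((i % 2) == 0) { // even: outside quotes, split on spaces
            for (String pp1 : ss[i].trim().split(" +")) {
                if (!pp1.isEmpty()) { // skip the empty piece left by an empty segment
                    System.out.println(pp1);
                }
            }
        } else { // odd: inside quotes, keep as-is
            System.out.println(ss[i]);
        }
    }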

#13


0  

If you are using C#, you can use

string input= "This is a string that \"will be\" highlighted when your 'regular expression' matches <something random>";

List<string> list1 = 
                Regex.Matches(input, @"(?<match>\w+)|\""(?<match>[\w\s]*)""|'(?<match>[\w\s]*)'|<(?<match>[\w\s]*)>").Cast<Match>().Select(m => m.Groups["match"].Value).ToList();

foreach(var v in list1)
   Console.WriteLine(v);

I specifically added "|<(?<match>[\w\s]*)>" to highlight that you can specify any characters to group phrases. (In this case I am using < > to group.)

Output is:

This
is
a
string
that
will be
highlighted
when
your
regular expression 
matches
something random
