Regular expressions: extracting all words outside quotes

Time: 2022-09-13 09:48:54

Using regular expressions, how can I extract all text inside double quotes, and all words outside the quotes, from a string like this:

01AB "SET 001" IN SET "BACK" 09SS 76 "01 IN" SET

The first regular expression should extract all text inside double quotes:

SET 001
BACK
01 IN

The second expression should extract all other words in the string:

01AB
IN
SET
09SS
76
SET

For the first case, "(.*?)" works fine. How can I extract all the words outside the quotes?

5 answers

#1


5  

Try this expression:

(?:^|")([^"]*)(?:$|")

The captured groups exclude the quotation marks, because the quotes are matched inside the non-capturing groups (?: ... ). Each captured group holds one chunk of text outside the quotes. Of course, you need to escape the double quotes to use the pattern in C# code.

If the target string starts and/or ends in a quoted value, this expression will match empty groups as well (for the initial and for the trailing quote).
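
A minimal sketch of how this pattern could be applied, assuming C# 9 or later for top-level statements. The names input, outsideQuotes and words are illustrative and not part of the answer. Each captured chunk still contains several words, so it is split on whitespace afterwards:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "01AB \"SET 001\" IN SET \"BACK\" 09SS 76 \"01 IN\" SET";

// Verbatim string: "" stands for one literal double quote in the pattern.
var outsideQuotes = new Regex(@"(?:^|"")([^""]*)(?:$|"")");

// Group 1 of each match is one unquoted chunk, e.g. " IN SET ".
// Splitting on whitespace yields single words; RemoveEmptyEntries
// also discards the empty groups produced by a leading or trailing quote.
string[] words = outsideQuotes.Matches(input)
    .Cast<Match>()
    .SelectMany(m => m.Groups[1].Value.Split(
        new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
    .ToArray();

Console.WriteLine(string.Join(", ", words));
// prints: 01AB, IN, SET, 09SS, 76, SET
```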

#2


4  

Try this regex:

\"[^\"]*\"

Use Regex.Matches for texts in double quotes, and use Regex.Split for all other words:

var strInput = "01AB \"SET 001\" IN SET \"BACK\" 09SS 76 \"01 IN\" SET";
var otherWords = Regex.Split(strInput, "\"[^\"]*\"");
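
Note that Regex.Split returns the chunks between the quoted runs (such as " IN SET "), not single words. A hedged sketch of both halves of this approach follows, assuming C# 9 or later; the names quotedPattern and quoted, and the extra whitespace split, are additions for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string strInput = "01AB \"SET 001\" IN SET \"BACK\" 09SS 76 \"01 IN\" SET";
const string quotedPattern = "\"[^\"]*\"";

// Regex.Matches returns each quoted run with its quotes; Trim('"') strips them.
string[] quoted = Regex.Matches(strInput, quotedPattern)
    .Cast<Match>()
    .Select(m => m.Value.Trim('"'))
    .ToArray();

// Regex.Split returns the text between quoted runs, so every chunk
// is split again on spaces to get the individual words.
string[] otherWords = Regex.Split(strInput, quotedPattern)
    .SelectMany(chunk => chunk.Split(
        new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
    .ToArray();

Console.WriteLine(string.Join(" | ", quoted));      // SET 001 | BACK | 01 IN
Console.WriteLine(string.Join(" | ", otherWords));  // 01AB | IN | SET | 09SS | 76 | SET
```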

#3


2  

Maybe you can try replacing the quoted text with an empty string, like this:

Regex r = new Regex("\".*?\"", RegexOptions.CultureInvariant | RegexOptions.Compiled | RegexOptions.Singleline);
string p = "01AB \"SET 001\" IN SET \"BACK\" 09SS 76 \"01 IN\" SET";

Console.Write(r.Replace(p, "").Replace("  ", " "));

#4


1  

You need to negate the pattern in your first expression.

(?!pattern)

Check out this link.
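
The answer does not spell out a full pattern, so the following is only one possible reading of it, assuming C# 9 or later and balanced quotes in the input. The pattern and the names input, wordOutsideQuotes and words are assumptions. A word is accepted only when it is not followed by an odd number of quotes, since an odd count means the word sits inside a quoted run:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "01AB \"SET 001\" IN SET \"BACK\" 09SS 76 \"01 IN\" SET";

// [^\s"]+ matches a run of non-space, non-quote characters (a word).
// The negative lookahead (?!...) rejects the word when an odd number of
// quotes remains before the end of the string.
var wordOutsideQuotes = new Regex(
    @"[^\s""]+(?![^""]*""(?:[^""]*""[^""]*"")*[^""]*$)");

string[] words = wordOutsideQuotes.Matches(input)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", words));
// prints: 01AB, IN, SET, 09SS, 76, SET
```

Because the lookahead rescans the rest of the string for every candidate word, this is likely slower on long inputs than the Split-based answers.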

#5


1  

If you need all chunks of the string, both quoted and unquoted, there is a simpler way: split the source string with Regex.Split:

// The capturing group makes Regex.Split keep the quoted chunks in its output.
static Regex QuotedTextRegex = new Regex(@"("".*?"")", RegexOptions.IgnoreCase | RegexOptions.Compiled);

var result = QuotedTextRegex
    .Split(sourceString)
    .Select(v => new
    {
        value = v,
        isQuoted = v.Length > 0 && v[0] == '\"'
    });
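
A self-contained sketch of this approach, assuming C# 9 or later. The trimming, the filtering of empty chunks, and the names parts, Text and IsQuoted are additions for illustration. It relies on Regex.Split including the text of capturing groups in the returned array:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string sourceString = "01AB \"SET 001\" IN SET \"BACK\" 09SS 76 \"01 IN\" SET";

// The capturing group keeps each quoted delimiter in the Split result.
var quotedTextRegex = new Regex(@"("".*?"")");

var parts = quotedTextRegex.Split(sourceString)
    .Select(v => v.Trim())                 // drop the spaces around each chunk
    .Where(v => v.Length > 0)              // skip empty chunks
    .Select(v => new { Text = v.Trim('"'), IsQuoted = v[0] == '"' })
    .ToList();

foreach (var part in parts)
    Console.WriteLine((part.IsQuoted ? "quoted: " : "plain: ") + part.Text);
```

The unquoted chunks (for example "IN SET") still hold several words each, so they would need a further split on whitespace if single words are required.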
