Is there a way to match a double quote between two double quotes?

Time: 2022-09-15 16:18:49

I tried the following regex, but it matches all the double quotes:

(?>(?<=(")|))"(?(1)(?!"))

Here is a sample of the text:

"[\"my cars last night\",
\"Burger\",\"Decaf\" shirt\",
\"Mocha\",\"marshmallows\",
\"Coffee Mission\"]"

The pattern I want to match is the double quote between the double quotes on line 2.

2 answers

#1

As a general rule, I would say: no.

Given a string:

\"Burger\" \"Decaf\" shirt\"

How do you decide which \" is superfluous (non-matching)? Is it the one after Burger, the one after Decaf, or the one after shirt? Or the one before any of these words? I believe the choice is arbitrary.

That said, in your particular example it seems that you want every \" that is not adjacent to a comma or a square bracket.

These can be found with the following regexp:

(?<!^)(?<![,\[])\\"(?![,\]])

We start with \\" (backslash followed by double quote) in the center.

Then we use a negative lookahead to discard all matches that are followed by a comma or a closing square bracket.

Then we use a negative lookbehind to discard all matches that are preceded by a comma or an opening square bracket.

The regexp engine I used can't cope with alternation inside lookaround assertions. To work around this, I take advantage of the fact that lookarounds are zero-length matches: at the beginning of the expression I prepend a second negative lookbehind that matches the beginning of a line, so a \" at the start of a line is discarded as well.
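
To see this kind of restriction concretely, here is a small check using Python's `re` module as a stand-in (an assumption: the answer used Perl, and engines differ in what they allow). Python requires every lookbehind to be fixed-width, so a single lookbehind whose branches are zero-width (`^`) and one character wide (`[,\[]`) is rejected, while the two separate lookbehinds compile. The helper `compiles` is hypothetical.

```python
import re

# Single lookbehind with alternation: the branches are 0 and 1 characters wide.
COMBINED = r'(?<!^|[,\[])\\"(?![,\]])'
# The workaround described above: two separate, fixed-width lookbehinds.
SPLIT = r'(?<!^)(?<![,\[])\\"(?![,\]])'


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern, re.MULTILINE)
    except re.error:
        return False
    return True


print(compiles(COMBINED))  # False: look-behind requires fixed-width pattern
print(compiles(SPLIT))     # True
```

In Python's case the problem is the differing widths rather than alternation as such: `(?<!,|\[)` is accepted because both branches are one character wide.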

Proof (in Perl):

$ cat test
"[\"my cars last night\",
\"Burger\",\"Decaf\" shirt\",
\"Mocha\",\"marshmallows\",
\"Coffee Mission\"]"
$ perl -n -e '$_ =~ s/(?<!^)(?<![,\[])\\"(?![,\]])/|||/g; print $_' test
"[\"my cars last night\",
\"Burger\",\"Decaf||| shirt\",
\"Mocha\",\"marshmallows\",
\"Coffee Mission\"]"
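
For readers not using Perl, the same substitution can be sketched with Python's `re` module, which accepts this pattern as written. `re.MULTILINE` makes `^` match at the start of every line, which mirrors `perl -n` feeding the file one line at a time. The sample is the question's text; the function name `mark_stray_quotes` is illustrative, not part of any library.

```python
import re

# Same pattern as the Perl one-liner: a \" that is not at the start of a line,
# not preceded by ',' or '[', and not followed by ',' or ']'.
STRAY_QUOTE = re.compile(r'(?<!^)(?<![,\[])\\"(?![,\]])', re.MULTILINE)

# The sample text from the question (raw string, so each \" stays two characters).
SAMPLE = r'''"[\"my cars last night\",
\"Burger\",\"Decaf\" shirt\",
\"Mocha\",\"marshmallows\",
\"Coffee Mission\"]"'''


def mark_stray_quotes(text: str, marker: str = "|||") -> str:
    """Replace every matched backslash-quote pair with the marker, like s///g in Perl."""
    return STRAY_QUOTE.sub(marker, text)


print(mark_stray_quotes(SAMPLE))
```

Run on the sample, only the \" after Decaf is replaced, as in the Perl output.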

#2

Let's assume that the format of your string must be like this:

["item1", "item2", ... "itemN"]

The way to know whether a double quote is a closing double quote is to check whether it is followed by a comma or a closing square bracket. To find a double quote enclosed by double quotes, you must match all well-formatted items from the beginning until you reach an unexpected quote.

Example that finds the first enclosed quote (if there is one):

(?:"[^"]*",\s*)*+"[^"]*\K"

But this only works for a single enclosed quote in the whole string, so it isn't useful if you want to find all of them.

To find all enclosed quotes:

(?:\G(?!\A)|(?:\A[^"]*|[^"]*",\s*)(?:"[^"]*",\s*)*+")[^"]*\K"(?!\s*[\],])
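
Python's built-in `re` module has no `\K` or `\G`, so the PCRE patterns above do not carry over directly. As an alternative (a swapped-in technique, not a translation of the regex), the same rule can be applied with a small left-to-right scan: the first quote of an item opens it, and a later quote closes it only if the next non-whitespace character is a comma or `]`; any other quote is an enclosed one. This sketch assumes the `["item1", "item2", ... "itemN"]` layout with unescaped quotes; the function name and the sample (the question's text with the backslashes and the outer quotes removed) are illustrative.

```python
import re

# A quote closes an item only if the next non-whitespace char is ',' or ']'.
CLOSER_CONTEXT = re.compile(r"\s*[,\]]")


def enclosed_quote_positions(text: str) -> list[int]:
    """Return the indexes of double quotes that sit inside a quoted item."""
    positions = []
    inside_item = False
    for i, ch in enumerate(text):
        if ch != '"':
            continue
        if not inside_item:
            inside_item = True          # opening quote of an item
        elif CLOSER_CONTEXT.match(text, i + 1):
            inside_item = False         # closing quote of the item
        else:
            positions.append(i)         # unexpected quote enclosed in the item
    return positions


sample = '["my cars last night",\n"Burger","Decaf" shirt",\n"Mocha","marshmallows",\n"Coffee Mission"]'
for pos in enclosed_quote_positions(sample):
    print(pos, repr(sample[pos - 5:pos + 1]))
```

It shares the regex's blind spot: a stray quote that happens to be followed by a comma or `]` is indistinguishable from a closing quote.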
