Python: regex to extract the content between two pieces of text

Time: 2022-09-13 16:23:44

I want a Python regex that can pull out the contents between script[" and "], but there are other "]" occurrences in the string, which worries me.

expected: {bunch of javascript here. [\"apple\"] test}

my attempt:

javascript\[\"(.*)"]

target string:

//url//script["{bunch of javascript here. [\"apple\"] test}"]|//*[@attribute="eggs"]

2 solutions

#1


1  

You can't match nested brackets with the re module since it doesn't have the recursion feature to do that. However, in your example you can skip the innermost square brackets if you choose to ignore all brackets enclosed between double quotes.

Try something like this:

p = re.compile(r'script\["([^\\"]*(?:\\.[^\\"]*)*)"]', re.S)

Note: I assumed here that the predicate only tests the "text" content of the script node (and not an attribute, an item position, or an axis).
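
As a quick check, the pattern above can be run against the target string from the question. This is only a sketch: the names `target`, `content`, and `greedy` are illustrative, and the greedy pattern `script\["(.*)"]` is an adaptation of the asker's attempt (with `javascript` shortened to `script` so that it can match at all), included purely for comparison.

```python
import re

# Target string from the question. A raw string keeps the backslashes literal.
target = r'//url//script["{bunch of javascript here. [\"apple\"] test}"]|//*[@attribute="eggs"]'

# Pattern from the answer: the quoted body is a run of characters that are
# neither a backslash nor a double quote, interleaved with backslash-escaped
# pairs such as \" so that an escaped quote never ends the match.
p = re.compile(r'script\["([^\\"]*(?:\\.[^\\"]*)*)"]', re.S)

m = p.search(target)
content = m.group(1) if m else None
print(content)

# For comparison (adapted from the question's attempt): a greedy .* runs on
# to the LAST "] in the string, so it swallows the second XPath expression.
greedy = re.search(r'script\["(.*)"]', target).group(1)
print(greedy)
```

The first print shows exactly the expected content from the question, while the greedy version keeps going until the `"]` that closes `[@attribute="eggs"]`.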

#2


0  

It's very hard to understand exactly what you want to achieve because of the way the question is written. However, if you are looking for the first instance of "] after a }, then try this:

\["([^}]+}.*?)"\]

This would also work:

 \["(.*?}.*?)"\]
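
Both patterns in this answer can be tried on the question's target string as follows. This is a hedged sketch: `other` is a hypothetical input that is not from the question, added only to show that stopping at the first "] after the first } can cut the match short when the content contains an earlier }.

```python
import re

# Target string from the question. A raw string keeps the backslashes literal.
target = r'//url//script["{bunch of javascript here. [\"apple\"] test}"]|//*[@attribute="eggs"]'

# The two patterns from this answer.
first = re.compile(r'\["([^}]+}.*?)"\]')
second = re.compile(r'\["(.*?}.*?)"\]')

result_first = first.search(target).group(1)
result_second = second.search(target).group(1)
print(result_first)
print(result_second)

# Hypothetical input (an assumption, not from the question): the content has
# a } BEFORE the escaped quotes, so the lazy .*? stops at the "] inside it.
other = r'script["{a} [\"x\"] b}"]'
truncated = first.search(other).group(1)
print(truncated)
```

On the question's string both patterns return the expected content. On the hypothetical `other` string the match ends inside the escaped `[\"x\"]` part, so these patterns are only safe when no } appears before an inner "].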
