Can't figure out how to use a regex to get a pattern contained in an HTML tag [duplicate]

Date: 2022-09-22 15:08:12

I just started learning about regexes and can't figure out how to extract Gizmo from this HTML tag:

<meta content="Gizmo" property="og:title" />

So far I only have (?<Name>meta content=), which is basically nothing, and I don't know where to go from there.

1 Answer

#1


It's well known that you shouldn't use regex to parse HTML (it's been said a million times); you should use an HTML parser instead.
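
As a rough illustration of the parser route, here is a minimal sketch in Python using the standard library's html.parser module. The answer doesn't name a language or a parser, so the choice of Python and the MetaContentParser class name are assumptions made for this example.

```python
from html.parser import HTMLParser


class MetaContentParser(HTMLParser):
    """Hypothetical helper: collects the content attribute of matching <meta> tags."""

    def __init__(self, wanted_property):
        super().__init__()
        self.wanted_property = wanted_property
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names arrive lowercased.
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("property") == self.wanted_property:
            self.values.append(attributes.get("content"))


html = '<meta content="Gizmo" property="og:title" />'
parser = MetaContentParser("og:title")
parser.feed(html)
parser.close()
print(parser.values)  # ['Gizmo']
```

Unlike a regex, this doesn't depend on the order of the attributes or on which quote style the page uses.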

That said, if you still want to use a regex for this, you are pretty close. You can use:

(?<Name>meta content=".*?")

By the way, if you want to grab just the word Gizmo, you also need a capturing group within your Name group:

(?<Name>meta content="(.*?)")
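
To see what each group holds, here is a small sketch in Python. Note the assumption: Python's re module writes named groups as (?P<Name>...) instead of the (?<Name>...) syntax used above, so the pattern is adapted accordingly. The variable names are illustrative only.

```python
import re

html = '<meta content="Gizmo" property="og:title" />'

# Same pattern as above, with Python's (?P<Name>...) spelling for the named group.
pattern = re.compile(r'(?P<Name>meta content="(.*?)")')
match = pattern.search(html)

# The named group is group 1 and spans the whole 'meta content="..."' text.
whole = match.group("Name")
# The nested, unnamed group is group 2 and holds only the attribute value.
inner = match.group(2)

print(whole)  # meta content="Gizmo"
print(inner)  # Gizmo
```

The lazy quantifier .*? matters here: it stops at the first closing quote, so the match doesn't run on into the property attribute.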

Alternatively, if you don't care about capturing the meta content= prefix and just want the value of the content attribute, you can use:

content="(?<Name>.*?)"
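
A quick sketch of this simpler pattern in Python, again assuming the (?P<Name>...) spelling that Python's re module requires for named groups:

```python
import re

html = '<meta content="Gizmo" property="og:title" />'

# The named group now wraps only the text between the quotes.
match = re.search(r'content="(?P<Name>.*?)"', html)

# re.search returns None when nothing matches, so guard before reading the group.
title = match.group("Name") if match else None
print(title)  # Gizmo
```

This pattern matches any content="..." attribute, not just the one on the og:title tag, and it assumes double quotes, so it is best suited to input whose shape you already know.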
