C# Regex: ignore optional tags surrounding text

Time: 2023-02-08 11:13:08

Problem:

Let's say I have the following string:

<p><span style=\"font-weight:bold;\">Description:</span>Thomas is currently developing a enterprise resource management course for Pluralsight </p>

I am trying to use Regex.Replace to remove <span style=\"font-weight:bold;\">Description:</span>

Often the start and end tags will not be present, so both must be optional. They also won't always be spans. The only thing I can guarantee is that the word "Description:" will be present.

What I've tried:

This was as close as I could get:

(?:<.*>)?Description:(?:<\/.*>)?

Unfortunately, the leading optional group also grabs the opening <p> tag. I need to make sure it never matches more than one start tag or one end tag.

Also, when I use it in a Regex.Replace call:

Regex.Replace(text, @"(?:<.*>)?Description:(?:<\\/.*>)?", "")

I get back:

</span>Thomas is currently developing a enterprise resource management course for Pluralsight </p>

The closing </span> tag is still there, although it should have been removed, and the opening <p> tag is missing...
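
A small reproduction makes that output easier to explain. Two things seem to be going on. First, `.*` is greedy, so `<.*>` starts at the very first `<` (the `<p>`) and runs to the last `>` that is still followed by `Description:`. Second, inside a verbatim string `@"..."` a backslash is not an escape character for C#, so `\\/` reaches the regex engine as-is and matches a literal backslash followed by `/`; the closing group therefore never matches `</span>` and is simply skipped. The sketch assumes C# 9 or later (top-level statements) and reuses only the question's input and pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question (the quotes in the style attribute are C#-escaped).
string text = "<p><span style=\"font-weight:bold;\">Description:</span>Thomas is currently developing a enterprise resource management course for Pluralsight </p>";

// The attempt from the question, unchanged: note the \\/ inside the verbatim string.
string attempt = @"(?:<.*>)?Description:(?:<\\/.*>)?";

// The greedy <.*> swallows "<p><span ...>"; the closing group matches nothing.
Match m = Regex.Match(text, attempt);
Console.WriteLine(m.Value);
// <p><span style="font-weight:bold;">Description:

string replaced = Regex.Replace(text, attempt, "");
Console.WriteLine(replaced);
// </span>Thomas is currently developing a enterprise resource management course for Pluralsight </p>

// Fixing only the escaping (\/ instead of \\/) makes things worse: the greedy .*
// in the closing group now runs to the final '>', so the entire string is matched.
string escapeFixed = @"(?:<.*>)?Description:(?:<\/.*>)?";
string emptied = Regex.Replace(text, escapeFixed, "");
Console.WriteLine(emptied.Length);
// 0
```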

EDIT: Although this is similar to the thread @kblok posted, I only want to remove the nearest surrounding tag, if it's present. That thread is about removing all surrounding tags, hence my problem with the <p> tag being removed.

2 solutions

#1

Assuming you don't need to worry about quoted angle brackets, you could use

(?:<[^<]*>)?Description:(?:<\/[^<]*>)?
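
As a sketch (C# 9+ top-level statements; the sample input is the one from the question), this is how the pattern behaves. Because `[^<]*` cannot run past the next `<`, the opening group cannot reach back into `<p>` and the closing group cannot reach forward into `</p>`.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "<p><span style=\"font-weight:bold;\">Description:</span>Thomas is currently developing a enterprise resource management course for Pluralsight </p>";

// [^<]* stops before the next '<', so neither optional group can span two tags.
string pattern = @"(?:<[^<]*>)?Description:(?:<\/[^<]*>)?";

string withTags = Regex.Replace(input, pattern, "");
Console.WriteLine(withTags);
// <p>Thomas is currently developing a enterprise resource management course for Pluralsight </p>

// With no surrounding tags, only the label itself is removed.
string noTags = Regex.Replace("Description:Thomas", pattern, "");
Console.WriteLine(noTags);
// Thomas
```

One thing to watch: `[^<]*` can still contain `>`, so a stray `>` in the text right after the closing tag would be pulled into the match.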

Improved pattern: it requires the start and end tag names to match, only removes the tags directly around Description:, and still removes Description: when no tags are present.

(?:(?<open><)(?<start>[^ >]+)[^<>]*>)?Description:\k<open>\/?\k<start>>|Description:
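
A sketch exercising the three cases this pattern is meant to handle (C# 9+ top-level statements; the second and third inputs are made-up examples, not from the question). As far as I can tell, in .NET a backreference to a group that did not participate fails to match, so the first alternative only succeeds when an opening tag was captured and a closing tag with the same name follows; otherwise the `|Description:` alternative removes just the label.

```csharp
using System;
using System.Text.RegularExpressions;

// "open" captures '<' and "start" captures the tag name; \k<start> then
// requires the closing tag to carry the same name.
string pattern = @"(?:(?<open><)(?<start>[^ >]+)[^<>]*>)?Description:\k<open>\/?\k<start>>|Description:";

string sample = "<p><span style=\"font-weight:bold;\">Description:</span>Thomas is currently developing a enterprise resource management course for Pluralsight </p>";
string matched = Regex.Replace(sample, pattern, "");
Console.WriteLine(matched);
// <p>Thomas is currently developing a enterprise resource management course for Pluralsight </p>

// Hypothetical input with no tags: the second alternative removes only the label.
string untagged = Regex.Replace("Description: Thomas", pattern, "");
Console.WriteLine(untagged);
//  Thomas

// Hypothetical input with mismatched tags: the tags are left alone.
string mismatched = Regex.Replace("<b>Description:</i>Thomas", pattern, "");
Console.WriteLine(mismatched);
// <b></i>Thomas
```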

#2

This pattern explicitly excludes the <p> tag (and closing tags) from the optional opening group:

(?:<(?!p>|/)[^<>]*>)?Description:(?:</[^<>]*>)?
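
A sketch of this pattern on the question's input plus two made-up inputs (C# 9+ top-level statements). The `(?!p>|/)` lookahead rejects an opening group that starts with `<p>` or with a closing tag; note that it only rules out a bare `<p>`, so a `<p>` with attributes can still be consumed, as the last case shows.

```csharp
using System;
using System.Text.RegularExpressions;

// After '<', the lookahead refuses "p>" (a bare <p>) and "/" (a closing tag).
string pattern = @"(?:<(?!p>|/)[^<>]*>)?Description:(?:</[^<>]*>)?";

string sample = "<p><span style=\"font-weight:bold;\">Description:</span>Thomas is currently developing a enterprise resource management course for Pluralsight </p>";
string fromSample = Regex.Replace(sample, pattern, "");
Console.WriteLine(fromSample);
// <p>Thomas is currently developing a enterprise resource management course for Pluralsight </p>

// Hypothetical input: a different inline tag is removed just as well.
string strong = Regex.Replace("<strong>Description:</strong> Thomas", pattern, "");
Console.WriteLine(strong);
//  Thomas

// Hypothetical input: <p class="intro"> is not a bare <p>, so the lookahead lets it through.
string paragraph = Regex.Replace("<p class=\"intro\">Description: Thomas</p>", pattern, "");
Console.WriteLine(paragraph);
//  Thomas</p>
```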

This one does the same, but is stricter about matching opening and closing tags. It also allows whitespace between the tags and Description:

(?:<(?!p>|/)(?<tag>[^ >]+)(?=[ >])[^<>]*>)?\s*Description:\s*(?:<\/\k<tag>[^<>]*>)?
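
A sketch showing the two additions, whitespace handling and tag-name matching through the `tag` group (C# 9+ top-level statements; inputs other than the question's sample are made up).

```csharp
using System;
using System.Text.RegularExpressions;

// "tag" captures the opening tag's name; \k<tag> only lets a closing tag with the same name match.
// \s* on both sides of the label absorbs surrounding whitespace.
string pattern = @"(?:<(?!p>|/)(?<tag>[^ >]+)(?=[ >])[^<>]*>)?\s*Description:\s*(?:<\/\k<tag>[^<>]*>)?";

string sample = "<p><span style=\"font-weight:bold;\">Description:</span>Thomas is currently developing a enterprise resource management course for Pluralsight </p>";
string fromSample = Regex.Replace(sample, pattern, "");
Console.WriteLine(fromSample);
// <p>Thomas is currently developing a enterprise resource management course for Pluralsight </p>

// Hypothetical input with whitespace around the label.
string spaced = Regex.Replace("<span> Description: </span>Thomas", pattern, "");
Console.WriteLine(spaced);
// Thomas

// Hypothetical input with mismatched tags: the closing group cannot match </i>, so it stays.
string mismatched = Regex.Replace("<b>Description:</i>Thomas", pattern, "");
Console.WriteLine(mismatched);
// </i>Thomas
```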

Considering VDWWD's warning, even this ugly pattern may be a bit naive once every possible HTML formatting variation is considered, but it should at least match well-formed, simple cases like the ones you've described.