How to remove all HTML tags except certain ones?

Time: 2022-11-11 22:44:35

I need to remove all HTML tags except in these cases:

  • it is a <sub> tag

  • it is preceded by {1 (or more) newline(s) + 4 (or more) spaces}

  • it is surrounded by "`" characters.

Here is an example:

var str = "something1
           <sub>
             something2
             <div class='myclass'>something3</div>
           </sub>
           <div class='myclass'>something4</div>
           something5

               <div class='myclass'>something6</div>
           <div class='myclass'>something7</div>
           `<div>something8</div>`
           something9";

Expected output:

/*   
something1
<sub>
  something2
  something3
</sub>
something4
something5

    <div class='myclass'>something6</div>
`<div>something8</div>`
something9
*/

Here is what I've tried so far:

/\n\s{0,3}<.*[^>]+|<sub>.*?<\/sub>|`.*?`/gm

1 solution

#1

This is possible with a regex substitution. Use this regex with the mg modifiers (only g is strictly required; m affects just ^ and $, which this pattern does not use):

(\n\n    .*|`[^`]+`|<\/?sub\b[^>]*>)|<[^>]+>

And use $1 as the substitution.

There are several parts to this. The capturing group matches everything you want to keep:

  • \n\n    .* An empty line, followed by a line that starts with 4 spaces.

  • `[^`]+` Anything in backticks.

  • <\/?sub\b[^>]*> A sub tag, opening or closing.

Any remaining HTML tag matches <[^>]+> outside the capturing group, so $1 is empty for it and the tag is discarded.
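As a minimal sketch of the substitution described above, the following JavaScript applies the pattern to the sample from the question. The helper name stripTags and the variable names are hypothetical, and the sample is written without the alignment indentation used in the question's string literal, which is an assumption about what the input really looks like.

```javascript
// Hypothetical helper: keeps <sub> tags, backtick-quoted spans, and lines
// indented by 4 spaces after a blank line; drops every other tag.
function stripTags(input) {
  // Group 1 captures what should be kept; the final alternative matches
  // any other tag and leaves group 1 unset.
  var pattern = /(\n\n    .*|`[^`]+`|<\/?sub\b[^>]*>)|<[^>]+>/g;
  // In JavaScript, "$1" expands to an empty string when group 1 did not
  // take part in the match, so unwanted tags are simply removed.
  return input.replace(pattern, "$1");
}

// Sample from the question, built line by line (assumed un-indented).
var sample = [
  "something1",
  "<sub>",
  "  something2",
  "  <div class='myclass'>something3</div>",
  "</sub>",
  "<div class='myclass'>something4</div>",
  "something5",
  "",
  "    <div class='myclass'>something6</div>",
  "<div class='myclass'>something7</div>",
  "`<div>something8</div>`",
  "something9"
].join("\n");

var result = stripTags(sample);
console.log(result);
```

Only the tags are removed, not their text content, so something3, something4 and something7 stay in the output as plain text.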
