How to remove an HTML element and its contents using RegEx

Time: 2022-11-28 09:21:50

I have a div I'd like to remove from some output. It looks like this:

<div id="ithis" class="cthis">Content here which includes other elements etc..) </div>

How can I remove this div and everything within it using PHP and regex?

Thank you.

3 Solutions

#1


13  

The simple answer is that you don't. You use one of PHP's many HTML parsers instead. Regexes are a flaky and error-prone way of manipulating HTML.

That being said, you can do this:

$html = preg_replace('!<div\s+id="ithis"\s+class="cthis">.*?</div>!is', '', $html);

But many things can go wrong with this. For example, if the div itself contains another div:

<div id="ithis" class="cthis">Content here which <div>includes</div> other elements etc..) </div>

you'll end up with:

 other elements etc..) </div>

as the regex will stop at the first </div>. And no, there's nothing you can really do to solve this problem consistently with regular expressions.
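
The failure is easy to reproduce. Here is a minimal sketch that runs the same pattern against the nested input shown above (the variable names are just for illustration):

```php
<?php
// The nested example from above: the target div contains another div.
$html = '<div id="ithis" class="cthis">Content here which <div>includes</div> other elements etc..) </div>';

// Same pattern as above. The lazy .*? stops at the FIRST </div> it can reach,
// which is the inner div's closing tag, not the outer one.
$result = preg_replace('!<div\s+id="ithis"\s+class="cthis">.*?</div>!is', '', $html);

echo $result; // prints " other elements etc..) </div>"
```

Switching to a greedy `.*` does not help either: it would run to the last </div> in the whole document and delete unrelated content.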

Done with a parser it looks more like this:

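libxml_use_internal_errors(true); // keep libxml quiet about imperfect real-world markup
$doc = new DOMDocument();
$doc->loadHTML($html);
$element = $doc->getElementById('ithis');
// getElementById() returns null when no element has that id
if ($element !== null) {
    $element->parentNode->removeChild($element);
}
$html = $doc->saveHTML();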
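
One thing to be aware of with the parser approach: loadHTML() treats its input as a whole document, so saveHTML() hands back a doctype and html/body wrappers even if you only passed in a fragment. Below is a rough sketch of one way around that, assuming the input is a fragment. The function name removeElementById and the wrapper id `__wrapper__` are made up for this example; they are not part of the DOM extension.

```php
<?php
// Hypothetical helper: removes the element with the given id from an HTML
// fragment and returns the fragment without added doctype/html/body wrappers.
function removeElementById(string $html, string $id): string
{
    // Collect libxml warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment in a container with a known id so it can be found again.
    // Note: non-ASCII input may additionally need an explicit encoding hint.
    $doc->loadHTML('<div id="__wrapper__">' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementById() returns null when nothing matches.
    $target = $doc->getElementById($id);
    if ($target !== null) {
        $target->parentNode->removeChild($target);
    }

    // Serialize only the wrapper's children, i.e. the original fragment.
    $wrapper = $doc->getElementById('__wrapper__');
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html  = '<p>before</p><div id="ithis" class="cthis">Content <div>includes</div> other</div><p>after</p>';
$clean = removeElementById($html, 'ithis');
```

Because the parser builds a tree, the nested div goes away together with its parent. The exact whitespace in the serialized result can differ between libxml versions, so don't rely on a byte-for-byte match.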

#2


1  

I don't know about PHP, but you can replace /<id.*?<\/id[^>]*>/ with nothing.

#3


0  

PHP is server side, and the output is coming from the server. Can't you just not output it? Or are you trying to hide it? If so, in a stylesheet, just say #ithis {display:none}.

If the string is returned by some PHP function that you haven't written AND you don't want to muck with that code, you'd have to write a very difficult regex to account for nested divs, varying syntax in the output, etc. I'd recommend using some parser (perhaps this Zend Framework component) to help you out. I've used it a few times for something similar. Although if you're not familiar with ZF at all, you may want to try something else.
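
To give a sense of what that "very difficult regex" looks like, here is a sketch that handles nested divs using PCRE recursion, which PHP's preg functions support. The group name `inner` is my own choice. It assumes the id is written exactly as id="ithis" with double quotes, that no attribute value contains ">", and that every div is properly closed; comments or scripts containing "<div" will also trip it up.

```php
<?php
$html = '<p>before</p><div id="ithis" class="cthis">Content here which <div>includes</div> other elements etc..) </div><p>after</p>';

// Opening tag: a <div ...> whose attributes include id="ithis".
// (?<inner>...) matches one "piece" of content: either a run of characters
// that does not start a <div or </div tag, or a complete nested div whose
// own content is matched by recursing into the same group via (?&inner).
$pattern = '~<div\b(?=[^>]*\bid="ithis")[^>]*>'
         . '(?<inner>(?:(?!</?div\b).)++|<div\b[^>]*>(?&inner)*</div>)*'
         . '</div>~is';

// preg_replace() returns null on failure (e.g. backtrack limit hit on large input).
$result = preg_replace($pattern, '', $html);

echo $result; // prints "<p>before</p><p>after</p>"
```

Even this only covers balanced, tidy markup, which is why a parser is still the safer choice.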
