How do I remove redundant tags from HTML code using PHP?

Time: 2022-11-22 19:54:46

I'm parsing some messy HTML code with PHP in which there are some redundant
tags and I would like to clean them up a bit. For instance:

<br>

<br /><br /> 


<br>

How would I replace something like that with the following, using preg_replace()?

<br /><br />

Newlines, spaces, and the differences between <br>, <br/>, and <br /> would all have to be accounted for.

Edit: Basically I'd like to replace every instance of three or more successive breaks with just two.

5 Answers

#1


6  

Here is something you can use. The first line finds every run of two or more <br> tags (of any variant, with optional whitespace between them) and replaces it with a well-formed <br /><br />.

I also included the second line, which normalizes the remaining <br> tags to <br />, if you want that too.

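function clean($txt)
{
    // Collapse each run of two or more <br> variants (optionally separated by whitespace) into exactly two.
    $txt = preg_replace('~(<br\s*/?>\s*){2,}~i', '<br /><br />', $txt);
    // Normalize every remaining <br> variant, plus any whitespace after it, to <br />.
    $txt = preg_replace('~<br\s*/?>\s*~i', '<br />', $txt);
    return $txt;
}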
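
As a quick check, here is a minimal sketch that runs the same two replacements on input shaped like the sample in the question. The function name `clean_breaks` and the `$messy` string are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper restating the two-step approach: collapse runs, then normalize.
function clean_breaks(string $html): string
{
    // Two or more <br> variants, separated by optional whitespace, become exactly two.
    $html = preg_replace('~(?:<br\s*/?>\s*){2,}~i', '<br /><br />', $html);
    // Every remaining <br> variant (and whitespace after it) is rewritten as <br />.
    return preg_replace('~<br\s*/?>\s*~i', '<br />', $html);
}

// Sample input assumed for illustration, modelled on the question.
$messy = "one<br>\n\n<br /><br /> \n\n\n<BR>\ntwo<br/>three";
echo clean_breaks($messy), "\n"; // prints: one<br /><br />two<br />three
```

Note that the run of four breaks ends up as two, while the lone <br/> is only normalized to <br />.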

#2


5  

This should work, using a minimum quantifier:

preg_replace('/(<br[\s]?[\/]?>[\s]*){3,}/', '<br /><br />', $multibreaks);

It should match appalling <br><br /><br/><br> constructions too.
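
One thing to be aware of: the pattern above has no `i` flag, so it is case-sensitive, and `[\s]?` allows at most one whitespace character before the slash. A hedged variant that relaxes both might look like the sketch below; the function name `limit_breaks` is an assumption made for illustration.

```php
<?php
// Hypothetical variant: case-insensitive, any amount of whitespace inside the tag.
function limit_breaks(string $html): string
{
    // Three or more consecutive breaks collapse to two; one or two are left untouched.
    return preg_replace('~(?:<br\s*/?>\s*){3,}~i', '<br /><br />', $html);
}

echo limit_breaks("a<br><br />b"), "\n";                 // prints: a<br><br />b
echo limit_breaks("a<BR>\n<br  />\n<br/>\n<br>b"), "\n"; // prints: a<br /><br />b
```

Any whitespace that follows a collapsed run is consumed along with it.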

#3


3  

This will remove all breaks, even if they're in uppercase:

preg_replace('/<br[^>]*>/i', '', $string);
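
A caveat worth checking: this pattern only requires the tag name to start with "br", so it would also strip other tags that begin with those letters. The sketch below compares it with a variant that adds a word boundary; the helper name `strip_breaks` and the `<brand>` tag are made up for illustration.

```php
<?php
// Hypothetical helper: remove break tags only, using a word boundary after "br".
function strip_breaks(string $html): string
{
    return preg_replace('/<br\b[^>]*>/i', '', $html);
}

$html = 'x<BR class="gap">y<brand>z</brand>';       // <brand> is a made-up tag
echo preg_replace('/<br[^>]*>/i', '', $html), "\n"; // prints: xyz</brand>
echo strip_breaks($html), "\n";                     // prints: xy<brand>z</brand>
```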

#4


0  

Try with:

preg_replace('/<br\s*\/?>/', '', $inputString);

#5


0  

Use str_replace; it's much better for simple replacements, and you can also pass an array instead of a single search value.

$newcode = str_replace("<br>", "", $messycode);
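
To illustrate the array form mentioned above, here is a small sketch. The helper name `remove_literal_breaks` and the sample string are assumptions; `str_ireplace` is the case-insensitive counterpart of `str_replace`. Because both functions match exact strings only, this removes the listed variants but cannot collapse runs of breaks separated by whitespace.

```php
<?php
// Hypothetical helper: remove the three common literal spellings of the break tag.
function remove_literal_breaks(string $html): string
{
    // str_ireplace ignores case, so <BR> is removed as well.
    return str_ireplace(['<br>', '<br/>', '<br />'], '', $html);
}

$messycode = "a<br>b<br/>c<br />d<BR>e";
echo str_replace(['<br>', '<br/>', '<br />'], '', $messycode), "\n"; // prints: abcd<BR>e
echo remove_literal_breaks($messycode), "\n";                        // prints: abcde
```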
