How to use a regexp to get the content of a div that contains other HTML tags

I have a div that contains other HTML tags along with text.

I want to extract only the text from this div, including the text inside all of its nested HTML tags.

<div class="rpr-help m-chm">
                <div class="header">
                    <h2 class="h6">Repair Help</h2>
                </div><!-- /end .header -->
                <div class="inner m-bsc">
                    <ul>


                        <li><a href="#videol">Repair Video</a></li>

                        <li><a href="#qa1">Repair Q&amp;A</a></li>
                    </ul>
                </div>

                    <div>
                    <br>
                    <span class="h4">Cross Reference Information</span><br>
                    <p>Part Number 285753A (AP3963893) replaces  1195967, 280152, 285140, 285743, 285753, 3352470, 3363664, 3364002, 3364003, 62672, 62693, 661560, 80008, 8559748, AH1485646, EA1485646, PS1485646.
                    <br>
                    </p>
                    </div>

            </div>

Here is my regexp:

preg_match_all("/<div class=\"rpr-help m-chm\">(.*)<\/.*>/s", $urlcontent, $description);

It works fine when I assign this complete div to the $urlcontent variable.

But when I fetch the data from a real URL, like $urlcontent = "www.test.com/test.html";, it returns the complete web page source.

How can I get the inner content of <div class="rpr-help m-chm">?

Is any correction required in my regexp?

Any help would be appreciated. Thanks.

2 Answers

#1

It's not possible to parse HTML/XHTML with regex. Source

You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML.

Depending on the language you use, please consider using a third-party library for HTML parsing.
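Since the question uses PHP, here is a minimal sketch of the parser-based approach using PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of a regex. The helper name getDivText and the shortened sample markup are assumptions made for illustration; they do not come from the original post.

```php
<?php
// Hypothetical helper (not from the original post): returns the text of the
// first <div> whose class attribute is exactly $class, or null if none exists.
function getDivText(string $html, string $class): ?string
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes $class contains no double quotes.
    $nodes = $xpath->query('//div[@class="' . $class . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // textContent joins the text of all descendant nodes; collapse whitespace runs.
    return trim(preg_replace('/\s+/', ' ', $nodes->item(0)->textContent));
}

// Shortened version of the markup from the question, plus an unrelated div.
$html = <<<HTML
<html><body>
<div class="rpr-help m-chm">
    <div class="header"><h2 class="h6">Repair Help</h2></div>
    <div>
        <span class="h4">Cross Reference Information</span><br>
        <p>Part Number 285753A (AP3963893) replaces 1195967, 280152.</p>
    </div>
</div>
<div class="other">ignored</div>
</body></html>
HTML;

echo getDivText($html, 'rpr-help m-chm'), "\n";
```

The XPath query matches the class attribute string exactly, so it would need adjusting if the classes appear in a different order on the real page. Because the parser tracks nesting, the inner closing div tags do not cut the result short.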

#2

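Use this function:

    // Returns the text between the first $tagStart and the next $tagEnd,
    // or null if $tagStart is not found. It stops at the first $tagEnd,
    // so a div with nested divs is cut off at the first inner </div>.
    function GetclassContent($tagStart, $tagEnd, $content)
    {
        $first_step = explode($tagStart, $content, 2);
        if (!isset($first_step[1])) {
            return null;
        }
        $second_step = explode($tagEnd, $first_step[1], 2);
        return $second_step[0];
    }

Steps to use the above function:

    // file_get_contents() needs the scheme; without it the string is treated as a local path.
    $website = "http://www.test.com/test.html";
    $content = file_get_contents($website);
    // The tags must match the page source exactly (no extra spaces).
    $tagStart = '<div class="rpr-help m-chm">';
    $tagEnd = "</div>";
    $RequiredContent = GetclassContent($tagStart, $tagEnd, $content);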
