How do I get the first level of DOM elements with PHP's DOMDocument?

Date: 2022-10-27 10:22:47

Here is an example with code that does not work, taken from this Q&A: http://*.com/questions/1540302/how-to-get-nodes-in-first-level-using-php-domdocument

<?php
$str=<<< EOD
<div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div>
EOD;

$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXpath($doc);
$entries = $xpath->query("/");
foreach ($entries as $entry) {
    var_dump($entry->firstChild->nodeValue);
}
?>

Thanks, Yosef

1 solution

#1



The first level of elements below the root node can be accessed with

$dom->documentElement->childNodes

The childNodes property contains a DOMNodeList, which you can iterate with foreach.

See DOMDocument::documentElement

This is a convenience attribute that allows direct access to the child node that is the document element of the document.

and DOMNode::childNodes

A DOMNodeList that contains all children of this node. If there are no children, this is an empty DOMNodeList.

Since childNodes is a property of DOMNode, any class extending DOMNode (which is most of the classes in DOM) has this property. So to get the first level of nodes below any DOMElement, access that DOMElement's childNodes property.
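
As a minimal sketch of that point, the snippet below reads childNodes first from the document element and then from one of its children. The XML string and the variable names are made up for illustration; well-formed XML without whitespace is used here so that no HTML skeleton or whitespace text nodes get in the way.

```php
<?php
// Hypothetical sample document, loaded as XML rather than HTML.
$dom = new DOMDocument;
$dom->loadXML('<root><a><a1/></a><b/><c/></root>');

// First level below the document element.
$topLevel = [];
foreach ($dom->documentElement->childNodes as $node) {
    $topLevel[] = $node->nodeName;
}

// The same property works on any DOMElement, here the first child <a>.
$nested = [];
foreach ($dom->documentElement->firstChild->childNodes as $node) {
    $nested[] = $node->nodeName;
}

echo implode(', ', $topLevel), PHP_EOL; // a, b, c
echo implode(', ', $nested), PHP_EOL;   // a1
```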


Note that if you use DOMDocument::loadHTML() on invalid HTML or partial documents, the HTML parser module will add an HTML skeleton with html and body tags, so in the DOM tree, the HTML in your example will be

<!DOCTYPE html … ">
<html><body><div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div></body></html>

which you have to take into account when traversing or using XPath. Consequently, using

$dom = new DOMDocument;
$dom->loadHTML($str);
foreach ($dom->documentElement->childNodes as $node) {
    echo $node->nodeName; // body
}

will only iterate the <body> DOMElement node. Knowing that libxml will add the skeleton, you will have to iterate over the childNodes of the <body> element to get the div elements from your example code, e.g.

$dom->getElementsByTagName('body')->item(0)->childNodes

However, doing so will also take into account any whitespace nodes, so you either have to make sure to set preserveWhiteSpace to false or query for the right element nodeType if you only want to get DOMElement nodes, e.g.

foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $node) {
    if ($node->nodeType === XML_ELEMENT_NODE) {
        echo $node->nodeName;
    }
}

or use XPath

$dom = new DOMDocument;
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);
// "*" matches element nodes only, so whitespace text nodes are skipped
foreach ($xpath->query('/html/body/*') as $node) {
    echo $node->nodeName;
}
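
To tie the pieces together, here is a self-contained sketch that runs both approaches against the HTML from the question and collects the id of each first-level div. The variable names ($byNodeType, $byXPath) and the call to libxml_use_internal_errors() are additions for illustration, not part of the original answer.

```php
<?php
// HTML fragment from the question.
$str = <<<EOD
<div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div>
EOD;

// Collect parser messages internally instead of emitting warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($str);

// libxml wraps the fragment in <html><body>, so start from <body>.
$body = $dom->getElementsByTagName('body')->item(0);

// Approach 1: walk childNodes and keep only element nodes.
$byNodeType = [];
foreach ($body->childNodes as $node) {
    if ($node->nodeType === XML_ELEMENT_NODE) {
        $byNodeType[] = $node->getAttribute('id');
    }
}

// Approach 2: "*" in XPath matches element nodes only.
$xpath = new DOMXPath($dom);
$byXPath = [];
foreach ($xpath->query('/html/body/*') as $node) {
    $byXPath[] = $node->getAttribute('id');
}

echo implode(', ', $byNodeType), PHP_EOL; // header, content, footer
echo implode(', ', $byXPath), PHP_EOL;    // header, content, footer
```

Both loops return only the three top-level divs; sidebar and info are one level deeper and are not included.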
