How do I visit every non-text node with the XML DOM API?

Time: 2022-06-20 06:07:34

I am new to XML and the DOM. I guess I need to use the DOM API to go through every non-text node once and output the node name.

Say I have this example XML from W3C:

<bookstore>

<book category="cooking">
 <title lang="en">Everyday Italian</title>
 <author>Giada De Laurentiis</author>
 <year>2005</year>
 <price>30.00</price>
 <page pagenumber="550"/>
</book>

<book category="children">
 <title lang="en">Harry Potter</title>
 <author>J K. Rowling</author>
 <year>2005</year>
 <price>29.99</price>
 <page pagenumber="500"/>
</book>
</bookstore>

I need to find nodes such as <page pagenumber="500"/>, which is a non-text node.

How can I do that? Pseudo-code would be fine too. Thanks.

Can I write something like this?

while (x.nodeValue == NULL) {
    read the next node?
}

I should make myself clear: no assumptions about the document. This should work on any XML document as long as it contains a non-text node. I guess every node should be visited in order, top-down and left to right. :(
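
A minimal sketch of the top-down, left-to-right walk described above, assuming Python's xml.dom.minidom as the DOM implementation (the question does not name a platform). On the pseudo-code: in the DOM, nodeValue is null (None in Python) for element nodes and holds the character data for text nodes, so checking nodeType is the more direct way to tell them apart. The generator name element_names is hypothetical.

```python
from xml.dom import minidom

# The bookstore sample from the question.
SAMPLE = """<bookstore>

<book category="cooking">
 <title lang="en">Everyday Italian</title>
 <author>Giada De Laurentiis</author>
 <year>2005</year>
 <price>30.00</price>
 <page pagenumber="550"/>
</book>

<book category="children">
 <title lang="en">Harry Potter</title>
 <author>J K. Rowling</author>
 <year>2005</year>
 <price>29.99</price>
 <page pagenumber="500"/>
</book>
</bookstore>"""


def element_names(node):
    """Yield element names in document order: each parent before its
    children, siblings left to right (a depth-first pre-order walk)."""
    if node.nodeType == node.ELEMENT_NODE:
        yield node.nodeName
    # Text nodes have no children, so the recursion simply stops there.
    for child in node.childNodes:
        yield from element_names(child)


doc = minidom.parseString(SAMPLE)
for name in element_names(doc.documentElement):
    print(name)
```

Every element is reported before its children and siblings come out in the order they appear, which is exactly document order; the whitespace and text nodes are visited but not printed.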

4 solutions

#1


3  

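The XPath expression //*[not(text())]
will select all elements that have no text node of their own, i.e. the non-text nodes.
In the given example, bookstore and book also count as non-text nodes, since they have no text of their own even though their children do. Strictly, the newlines and indentation between their children are whitespace-only text nodes, so in the indented sample they match only if that whitespace is stripped when parsing.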
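Python's standard library has no full XPath 1.0 engine, so this sketch emulates //*[not(text())] with xml.dom.minidom, assuming the question's sample. The hypothetical ignore_whitespace flag shows the caveat above: with whitespace text kept (as XPath sees it by default), only the two page elements match; bookstore and book join them once whitespace-only text is ignored. The helper names are illustrative, not from any library.

```python
from xml.dom import minidom

SAMPLE = """<bookstore>

<book category="cooking">
 <title lang="en">Everyday Italian</title>
 <author>Giada De Laurentiis</author>
 <year>2005</year>
 <price>30.00</price>
 <page pagenumber="550"/>
</book>

<book category="children">
 <title lang="en">Harry Potter</title>
 <author>J K. Rowling</author>
 <year>2005</year>
 <price>29.99</price>
 <page pagenumber="500"/>
</book>
</bookstore>"""


def has_text_child(element, ignore_whitespace):
    """True if the element has a text child, i.e. what text() would match."""
    for child in element.childNodes:
        # XPath treats CDATA sections as ordinary text nodes.
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            if ignore_whitespace and not child.data.strip():
                continue  # skip indentation and newlines between elements
            return True
    return False


def elements_without_text(doc, ignore_whitespace=False):
    """Rough equivalent of //*[not(text())], returned in document order."""
    return [el.tagName for el in doc.getElementsByTagName("*")
            if not has_text_child(el, ignore_whitespace)]


doc = minidom.parseString(SAMPLE)
print(elements_without_text(doc))
print(elements_without_text(doc, ignore_whitespace=True))
```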

#2


2  

Your question basically seems to be: given an XML document, how do I find child nodes that do not have any text content?

A simple XPath expression such as:

/bookstore/book/*[count(child::text()) = 0]

or

/bookstore/book/*[not(text())]

will do it for you. Applying this XPath expression to the sample document returns a node-set containing both page elements. As you can see, you do not need to know the name of the page element beforehand, or even the names of all possible child elements of the book element.

To explain: you need to query for the child elements of the book element that do not contain ANY text child nodes. The child::* step (abbreviated as *) selects all element children of the current node, and the text() node test selects text-node children, so the predicate [not(text())] keeps only those elements that have no text-node children.
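To make the step-by-step meaning concrete, here is a sketch that evaluates /bookstore/book/*[count(child::text()) = 0] by hand with Python's xml.dom.minidom, assuming the question's sample. Each loop corresponds to one location step; the helpers child_elements and text_children are hypothetical stand-ins for child::* and child::text().

```python
from xml.dom import minidom

SAMPLE = """<bookstore>

<book category="cooking">
 <title lang="en">Everyday Italian</title>
 <author>Giada De Laurentiis</author>
 <year>2005</year>
 <price>30.00</price>
 <page pagenumber="550"/>
</book>

<book category="children">
 <title lang="en">Harry Potter</title>
 <author>J K. Rowling</author>
 <year>2005</year>
 <price>29.99</price>
 <page pagenumber="500"/>
</book>
</bookstore>"""


def child_elements(node):
    # child::* : element children only; text and comment nodes are skipped.
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def text_children(node):
    # child::text() : text (and CDATA) children only.
    return [c for c in node.childNodes
            if c.nodeType in (c.TEXT_NODE, c.CDATA_SECTION_NODE)]


doc = minidom.parseString(SAMPLE)
root = doc.documentElement
matches = []
if root.tagName == "bookstore":                  # /bookstore
    for book in child_elements(root):
        if book.tagName != "book":               # /book
            continue
        for el in child_elements(book):          # /*
            if len(text_children(el)) == 0:      # [count(child::text()) = 0]
                matches.append(el)

for el in matches:
    print(el.tagName, el.getAttribute("pagenumber"))
```

Only the two page elements survive the predicate, as the answer states; title, author, year and price each have one text child.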

Edit: Note that if you want to query for non-text nodes in any XML document (as per your latest edit to the question), you should choose the answer provided by nils_gate. My answer was given prior to your edit and illustrates the concept, rather than providing a generic solution.

#3


1  

What do you know about the node you need to find? If you know exactly that it's:

  • A page element

  • It has a pagenumber attribute with value 500

then XPath is the way forward (assuming it's available on your platform - you haven't specified beyond "DOM"; most DOM implementations include XPath as far as I've seen).

In this case you'd use an XPath of:

//page[@pagenumber='500']
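If the platform happens to be Python, the standard xml.etree.ElementTree module (not a DOM API, but part of the standard library) understands a limited XPath subset that covers this expression once it is written relative to the root element, as .//page[@pagenumber='500']. A minimal sketch, assuming the question's sample:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<bookstore>

<book category="cooking">
 <title lang="en">Everyday Italian</title>
 <author>Giada De Laurentiis</author>
 <year>2005</year>
 <price>30.00</price>
 <page pagenumber="550"/>
</book>

<book category="children">
 <title lang="en">Harry Potter</title>
 <author>J K. Rowling</author>
 <year>2005</year>
 <price>29.99</price>
 <page pagenumber="500"/>
</book>
</bookstore>"""

root = ET.fromstring(SAMPLE)  # the <bookstore> element
# ".//" searches every descendant of root, standing in for XPath's "//".
pages = root.findall(".//page[@pagenumber='500']")
for page in pages:
    print(page.tag, page.attrib)
```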

If you can't use XPath, please explain which DOM API you're using and we can try to come up with the best solution. Basically you'll probably end up iterating over every element node, checking whether its name is page and then checking whether it has an appropriate pagenumber attribute value.
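Without XPath, the iteration described above might look like this sketch, assuming Python's xml.dom.minidom and the question's sample. It uses an explicit stack instead of recursion, pushing children in reverse so they are still visited left to right; find_pages is a hypothetical name.

```python
from xml.dom import minidom

SAMPLE = """<bookstore>

<book category="cooking">
 <title lang="en">Everyday Italian</title>
 <author>Giada De Laurentiis</author>
 <year>2005</year>
 <price>30.00</price>
 <page pagenumber="550"/>
</book>

<book category="children">
 <title lang="en">Harry Potter</title>
 <author>J K. Rowling</author>
 <year>2005</year>
 <price>29.99</price>
 <page pagenumber="500"/>
</book>
</bookstore>"""


def find_pages(doc, pagenumber):
    """Visit every element node and keep <page> elements whose pagenumber
    attribute equals the given string (no XPath involved)."""
    found = []
    stack = [doc.documentElement]
    while stack:
        node = stack.pop()
        if node.nodeType != node.ELEMENT_NODE:
            continue  # text nodes are skipped
        # getAttribute returns "" when the attribute is missing.
        if node.tagName == "page" and node.getAttribute("pagenumber") == pagenumber:
            found.append(node)
        # Reverse so the leftmost child is popped (visited) first.
        stack.extend(reversed(node.childNodes))
    return found


doc = minidom.parseString(SAMPLE)
for page in find_pages(doc, "500"):
    print(page.toxml())
```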

#4


1  

Looks like you'll need an XPath. The W3Schools site has a good reference, but assuming the page node always appears under a book node, the XPath /bookstore/book/page will return a node-set containing every page node. /bookstore/book/page[@pagenumber='500'] will get each page node whose pagenumber attribute has the value 500.

The // syntax finds the node anywhere in the document without regard to structure. This can be easier, but it is usually slower, especially with large documents, because the processor may have to examine every descendant. If your document has a known structure, it's best to use the explicit XPath.
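A sketch comparing the two styles with Python's xml.etree.ElementTree XPath subset, assuming the question's sample. Paths are written relative to the root bookstore element, so ./book/page stands in for /bookstore/book/page and .//page for //page. On this small sample both forms return the same page elements; no timings are measured here, the difference lies only in how much of the tree gets searched.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<bookstore>

<book category="cooking">
 <title lang="en">Everyday Italian</title>
 <author>Giada De Laurentiis</author>
 <year>2005</year>
 <price>30.00</price>
 <page pagenumber="550"/>
</book>

<book category="children">
 <title lang="en">Harry Potter</title>
 <author>J K. Rowling</author>
 <year>2005</year>
 <price>29.99</price>
 <page pagenumber="500"/>
</book>
</bookstore>"""

root = ET.fromstring(SAMPLE)  # root is the <bookstore> element

# Explicit path: only <page> children of <book> children of the root.
explicit = root.findall("./book/page")
explicit_500 = root.findall("./book/page[@pagenumber='500']")

# Descendant search: every element below the root is considered.
anywhere = root.findall(".//page")

print([p.get("pagenumber") for p in explicit])
print([p.get("pagenumber") for p in explicit_500])
print(explicit == anywhere)  # same Element objects in the same order
```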