Nokogiri not parsing XML in Ruby - xmlns issue?

Given the following Ruby code:

require 'nokogiri'

xml = "<?xml version='1.0' encoding='UTF-8'?>
<ProgramList xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns='http://publisher.webservices.affili.net/'>
  <TotalRecords>145</TotalRecords>
  <Programs>
    <ProgramSummary>
      <ProgramID>6540</ProgramID>
      <Title>Matalan</Title>
      <Limitations>A bit of text
      </Limitations>
      <URL>http://www.matalan.co.uk</URL>
      <ScreenshotURL>http://www.matalan.co.uk/</ScreenshotURL>
      <LaunchDate>2009-11-02T00:00:00</LaunchDate>
      <Status>1</Status>
    </ProgramSummary>
    <ProgramSummary>
      <ProgramID>11787</ProgramID>
      <Title>Club 18-30</Title>
      <Limitations/>
      <URL>http://www.club18-30.com/</URL>
      <ScreenshotURL>http://www.club18-30.com</ScreenshotURL>
      <LaunchDate>2013-05-16T00:00:00</LaunchDate>
      <Status>1</Status>
    </ProgramSummary>
  </Programs>
</ProgramList>"

doc = Nokogiri::XML(xml)
p doc.xpath("//Programs")

gives:

=> []

This is not what I expected.

On further investigation, if I remove xmlns='http://publisher.webservices.affili.net/' from the initial <ProgramList> tag, I get the expected output.

Indeed, if I change xmlns='http://publisher.webservices.affili.net/' to xmlns:anything='http://publisher.webservices.affili.net/', I also get the expected output.

So my question is what is going on here? Is this malformed XML? And what is the best strategy for dealing with it?

While it's hardcoded in this example, the XML is (will be) coming from a web service.

Update

I realise I can use the remove_namespaces! method, but the Nokogiri docs say that doing this "...probably is not a good thing in general". I'm also interested in why it's happening and what the 'correct' XML should be.

1 solution

#1

The xmlns='http://publisher.webservices.affili.net/' indicates the default namespace for all elements under the one where it appears (including the element itself). That means that all elements that don’t otherwise have an explicit namespace fall under this namespace.

XPath queries don’t have default namespaces (at least in XPath 1.0), so any unprefixed name in a query refers to an element in no namespace.

In your code, you want to find Programs elements in the http://publisher.webservices.affili.net/ namespace (since that is the default namespace), but your XPath query looks for Programs elements in no namespace.

To explicitly specify the namespace in the query, you can do something like this:

doc.xpath("//pub:Programs", "pub" => "http://publisher.webservices.affili.net/")

Nokogiri makes this a little easier for namespaces declared on the root element (as in this case): it registers them for you under the same prefixes. It also registers the default namespace under the xmlns prefix, so you can also do:

doc.xpath("//xmlns:Programs")

which will give you the same result.
