How do I parse namespaces with a SAX parser?

Date: 2021-11-12 00:50:37

Using a Twitter search URL, e.g. http://search.twitter.com/search.rss?q=android, returns RSS containing an item that looks like:

<item>
      <title>@UberTwiter still waiting for @ubertwitter  android app!!!</title>
      <link>http://twitter.com/meals69/statuses/21158076391</link>
      <description>still waiting for an app!!!</description>
      <pubDate>Sat, 14 Aug 2010 15:33:44 +0000</pubDate>
      <guid>http://twitter.com/meals69/statuses/21158076391</guid>
      <author>Some Twitter User</author>
      <media:content type="image/jpg" height="48" width="48" url="http://a1.twimg.com/profile_images/756343289/me2_normal.jpg"/>
      <google:image_link>http://a1.twimg.com/profile_images/756343289/me2_normal.jpg</google:image_link>
      <twitter:metadata>
        <twitter:result_type>recent</twitter:result_type>
</twitter:metadata>
</item>

Pretty simple. My code parses out everything (title, link, description, pubDate, etc.) without any problems. However, I'm getting null on:


<google:image_link>

I'm using Java to parse the RSS feed. Do I have to handle a prefixed name such as google:image_link differently from a simple local name?

This is the bit of code that parses out Link, Description, pubDate, etc:


@Override
    public void endElement(String uri, String localName, String name)
            throws SAXException {
        super.endElement(uri, localName, name);
        if (this.currentMessage != null){
            if (localName.equalsIgnoreCase(TITLE)){
                currentMessage.setTitle(builder.toString());
            } else if (localName.equalsIgnoreCase(LINK)){
                currentMessage.setLink(builder.toString());
            } else if (localName.equalsIgnoreCase(DESCRIPTION)){
                currentMessage.setDescription(builder.toString());
            } else if (localName.equalsIgnoreCase(PUB_DATE)){
                currentMessage.setDate(builder.toString());
            } else if (localName.equalsIgnoreCase(GUID)){
                currentMessage.setGuid(builder.toString());
            } else if (uri.equalsIgnoreCase(AVATAR)){
                currentMessage.setAvatar(builder.toString());
            } else if (localName.equalsIgnoreCase(ITEM)){
                messages.add(currentMessage);
            } 
            builder.setLength(0);   
        }
    }

startDocument looks like:


@Override
    public void startDocument() throws SAXException {
        super.startDocument();
        messages = new ArrayList<Message>();
        builder = new StringBuilder();

    }

startElement looks like:


@Override
    public void startElement(String uri, String localName, String name,
            Attributes attributes) throws SAXException {
        super.startElement(uri, localName, name, attributes);
        if (localName.equalsIgnoreCase(ITEM)){
            this.currentMessage = new Message();
        } 
    }

Tony


5 Answers

#1


1  

An element like <google:image_link> has the local name image_link belonging to the google namespace. You need to ensure that the XML parsing framework is aware of namespaces, and you'd then need to find this element using the appropriate namespace.


For example, a few SAX1 interfaces in package org.xml.sax have been deprecated and replaced by SAX2 counterparts that include namespace support (e.g. the SAX1 Parser is deprecated and replaced by the SAX2 XMLReader). Consult the documentation on how to specify the namespace URI or the qualified (prefixed) qName.
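As a rough illustration of the advice above, here is a minimal sketch that turns on namespace awareness through the JAXP SAXParserFactory and then matches the element by namespace URI plus local name. The URI http://example.com/ns/google is a placeholder (the question does not show what the google prefix is bound to), and the class and method names are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class NamespaceAwareSketch {
    // Placeholder URI: the real binding of the "google" prefix is not shown in the question.
    static final String GOOGLE_NS = "http://example.com/ns/google";

    static final String SAMPLE =
        "<rss xmlns:google='" + GOOGLE_NS + "'><channel><item>"
        + "<title>hello</title>"
        + "<google:image_link>http://example.com/me.jpg</google:image_link>"
        + "</item></channel></rss>";

    /** Returns the text of the first image_link element in GOOGLE_NS, or null if there is none. */
    static String findImageLink(String xml) {
        final String[] result = new String[1];
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0); // start collecting text for the new element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Match on namespace URI + local name, not on the prefix.
                if (result[0] == null && GOOGLE_NS.equals(uri) && "image_link".equals(localName)) {
                    result[0] = text.toString();
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // the JAXP default is false
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(findImageLink(SAMPLE));
    }
}
```

In the handler from the question, the equivalent change would be to compare both uri and localName for the avatar element instead of comparing uri against the tag name.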


#2


1  

From the sample it is not actually clear which namespace the 'google' prefix is bound to. The previous answer is slightly incorrect: the element is NOT in a "google" namespace; rather, it is in whatever namespace the prefix "google" is bound to. You therefore have to match on the namespace (identified by its URI), not on the prefix. SAX does have a confusing way of reporting local name / namespace / prefix combinations, and what it reports depends on whether namespace processing is even enabled.
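To make the URI-versus-prefix point concrete, the sketch below counts image_link elements by namespace URI and runs it on three small documents: one using the google prefix, one binding a different prefix to the same URI, and one binding the google prefix to a different URI. All URIs and names here are assumptions chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class PrefixIndependenceSketch {
    // Placeholder URI, assumed for illustration only.
    static final String NS = "http://example.com/ns/google";

    static final String GOOGLE_PREFIX =
        "<item xmlns:google='" + NS + "'><google:image_link>a</google:image_link></item>";
    static final String OTHER_PREFIX =
        "<item xmlns:g='" + NS + "'><g:image_link>a</g:image_link></item>";
    static final String OTHER_NAMESPACE =
        "<item xmlns:google='http://example.com/ns/other'>"
        + "<google:image_link>a</google:image_link></item>";

    /** Counts elements named image_link in namespace NS, whatever prefix the document uses. */
    static int countImageLinks(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (NS.equals(uri) && "image_link".equals(localName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countImageLinks(GOOGLE_PREFIX));
        System.out.println(countImageLinks(OTHER_PREFIX));
        System.out.println(countImageLinks(OTHER_NAMESPACE));
    }
}
```

The second document matches even though the prefix differs, and the third does not match even though the prefix is the same, which is why the URI is the thing to compare.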

You could also consider alternative XML processing libraries and APIs; while SAX implementations are performant, there are alternatives that are as fast and more convenient. StAX (javax.xml.stream.*) implementations like Woodstox (and even the default one that JDK 1.6 comes with) are fast and a bit more convenient. The StaxMate library, which builds on top of StAX, is much simpler to use for both reading and writing, and is as fast as SAX implementations like Xerces. The StAX API also has less baggage with respect to namespace handling, so it is easier to see the actual namespace of an element.
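For comparison, here is a small sketch of the same lookup with the StAX cursor API that ships with the JDK (javax.xml.stream). It is only an illustration of the idea, not a drop-in replacement for the handler in the question; the namespace URI is again a placeholder and the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxSketch {
    // Placeholder URI, assumed for illustration only.
    static final String NS = "http://example.com/ns/google";

    static final String SAMPLE =
        "<rss xmlns:google='" + NS + "'><channel><item>"
        + "<title>hello</title>"
        + "<google:image_link>http://example.com/me.jpg</google:image_link>"
        + "</item></channel></rss>";

    /** Returns the text of the first image_link element in NS, or null if there is none. */
    static String findImageLink(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // The reader exposes namespace URI and local name directly on each start tag.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && NS.equals(reader.getNamespaceURI())
                        && "image_link".equals(reader.getLocalName())) {
                    return reader.getElementText();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findImageLink(SAMPLE));
    }
}
```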

#3


0  

As user polygenelubricants said, the parser generally needs to be namespace-aware to handle elements that belong to a prefixed namespace (like that <google:image_link> element).

This needs to be set as a "parser feature", which AFAIK can be done in a few different ways. The XMLReader interface itself has a setFeature() method that can be used to set features on a particular parser, but you can also use the same method on the SAXParserFactory class so that the factory generates parsers with those features already on or off. The SAX2 standard feature flags should be listed on the SAX project's website, and at least some of them are also listed in the Java API documentation for package org.xml.sax.
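A minimal sketch of the XMLReader route described above: it obtains a reader from JAXP, switches on the standard SAX2 feature http://xml.org/sax/features/namespaces with setFeature(), and lists each element as "{namespaceURI}localName". The sample document, the placeholder URI, and the class name are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class FeatureFlagSketch {
    // Standard SAX2 feature id for namespace processing.
    static final String NAMESPACES_FEATURE = "http://xml.org/sax/features/namespaces";
    // Placeholder URI, assumed for illustration only.
    static final String NS = "http://example.com/ns/google";

    static final String SAMPLE =
        "<item xmlns:google='" + NS + "'>"
        + "<title>hello</title>"
        + "<google:image_link>http://example.com/me.jpg</google:image_link>"
        + "</item>";

    /** Lists every element as "{namespaceURI}localName" with the feature switched on explicitly. */
    static List<String> expandedNames(String xml) {
        final List<String> names = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACES_FEATURE, true); // must be set before parsing starts
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    names.add("{" + uri + "}" + localName);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(expandedNames(SAMPLE));
    }
}
```

With JAXP, calling setNamespaceAware(true) on the factory is the more common way to get the same effect; the explicit feature id is mainly useful when working with an XMLReader directly.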

For simple documents you can try to take a shortcut. If you don't actually care about namespaces and about element names as a URI + local-name combination, and you can trust that the elements you are looking for (and only these) always have a certain prefix and that no elements from other namespaces share the same local name, then you might solve your problem simply by using the qName parameter of startElement() instead of localName (or vice versa), or by adding or dropping the prefix in the tag name string you compare against.
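The shortcut could look roughly like the sketch below, which assumes a parser with namespace processing switched off (the JAXP factory default on a standard JDK) and simply compares the qualified name, prefix included. It inherits all the caveats listed above: it breaks as soon as a feed binds a different prefix. The names used are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class QNameShortcutSketch {
    static final String SAMPLE =
        "<item xmlns:google='http://example.com/ns/google'>"
        + "<title>hello</title>"
        + "<google:image_link>http://example.com/me.jpg</google:image_link>"
        + "</item>";

    /** Returns the text of the first element whose qualified name is exactly google:image_link. */
    static String findByQName(String xml) {
        final String[] result = new String[1];
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Compares the raw prefixed name; the namespace URI is ignored entirely.
                if (result[0] == null && "google:image_link".equals(qName)) {
                    result[0] = text.toString();
                }
            }
        };
        try {
            // No setNamespaceAware(true): the qualified name is what the parser reports here.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(findByQName(SAMPLE));
    }
}
```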

According to the Java API documentation, the contents of the namespace URI, qName and localName parameters are actually optional, and AFAIK they might be empty for this reason (the ContentHandler documentation describes an unavailable value as an empty string rather than null). Which of them are empty depends on the aforementioned "parser features" that affect namespaces. I don't know whether the parameter that is empty can vary between elements in a namespace and elements without one; I haven't investigated that behaviour.
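One way to investigate this on a given parser is a small probe that records the three values exactly as the handler receives them, once with namespace awareness on and once with it off. The sketch below does that for the image_link element; what it prints can differ between parser implementations (for instance between a desktop JDK and Android), so no output is shown. The sample document and all names are assumptions.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class ReportedNamesSketch {
    // Placeholder URI, assumed for illustration only.
    static final String NS = "http://example.com/ns/google";

    static final String SAMPLE =
        "<item xmlns:google='" + NS + "'>"
        + "<google:image_link>http://example.com/me.jpg</google:image_link>"
        + "</item>";

    /** Returns {uri, localName, qName} as reported for the first image_link element. */
    static String[] probe(String xml, boolean namespaceAware) {
        final String[] seen = new String[3];
        final boolean[] found = {false};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Either name may be the one that is filled in, so check both.
                boolean isTarget = "image_link".equals(localName)
                    || (qName != null && qName.endsWith("image_link"));
                if (isTarget && !found[0]) {
                    found[0] = true;
                    seen[0] = uri;
                    seen[1] = localName;
                    seen[2] = qName;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("aware:     " + Arrays.toString(probe(SAMPLE, true)));
        System.out.println("not aware: " + Arrays.toString(probe(SAMPLE, false)));
    }
}
```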

PS. XML is case sensitive, so ideally you should not ignore case when comparing tag names.
- First post, yay!

#4


0  

This might help someone using the Android SAX util. I was trying geo:lat to get the lat element from the geo namespace.

Sample XML:


<item> 
 <title>My Item title</title> 
 <geo:lat>40.720741</geo:lat> 
</item>

First attempt returned null:


item.getChild("geo:lat");

As suggested above, I found passing the namespace URI to the getChild method worked.


item.getChild("http://www.w3.org/2003/01/geo/wgs84_pos#", "lat");

#5


0  

Using the startPrefixMapping method of my XML handler, I was able to parse out the text of a namespaced element.

I placed several calls to this method beneath my handler instantiation.


GoogleReader xmlhandler = new GoogleReader();
xmlhandler.startPrefixMapping("dc", "http://purl.org/dc/elements/1.1/");

where dc is the namespace prefix, as in <dc:author>some text</dc:author>
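For context, startPrefixMapping is a SAX2 ContentHandler callback that a namespace-aware parser normally invokes itself whenever it meets an xmlns declaration. The sketch below is a different use of it from the snippet above: it overrides the callback to record which URI each prefix is bound to, which is one way to find out what to match on. The sample document and the class name are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class PrefixMappingSketch {
    static final String SAMPLE =
        "<rss xmlns:dc='http://purl.org/dc/elements/1.1/'>"
        + "<item><dc:author>some text</dc:author></item>"
        + "</rss>";

    /** Returns the prefix-to-URI bindings the parser reports while reading the document. */
    static Map<String, String> collectPrefixes(String xml) {
        final Map<String, String> prefixes = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                // Called by the parser for each namespace declaration it encounters.
                prefixes.put(prefix, uri);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // prefix mappings are reported with namespace processing on
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        System.out.println(collectPrefixes(SAMPLE));
    }
}
```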
