Reading a collection of extension elements from an RSS feed with Universal Feed Parser

Date: 2022-12-22 01:56:10

Is there any way to read a collection of extension elements with Universal Feed Parser?


This is just a short snippet from Kuler RSS feed:


<channel>
  <item>
    <!-- snip: regular RSS elements -->
    <kuler:themeItem>
      <kuler:themeID>123456</kuler:themeID>
      <!-- snip -->
      <kuler:themeSwatches>
        <kuler:swatch>
          <kuler:swatchHexColor>FFFFFF</kuler:swatchHexColor>
          <!-- snip -->
        </kuler:swatch>
        <kuler:swatch>
          <kuler:swatchHexColor>000000</kuler:swatchHexColor>
          <!-- snip -->
        </kuler:swatch>
      </kuler:themeSwatches>
    </kuler:themeItem>
  </item>
</channel>

I tried the following:


>>> feed = feedparser.parse(url)
>>> feed.channel.title
u'kuler highest rated themes'
>>> feed.entries[0].title
u'Foobar'
>>> feed.entries[0].kuler_themeid
u'123456'
>>> feed.entries[0].kuler_swatch
u''

feed.entries[0].kuler_swatchhexcolor returns only the last kuler:swatchHexColor. Is there any way to retrieve all of the elements with feedparser?


I have already worked around the issue using minidom, but I would like to use Universal Feed Parser if possible (due to its very simple API). Can it be extended? I haven't found anything about that in the documentation, so if someone has more experience with the library, please advise me.


1 Answer

#1 (3 votes)
Universal Feed Parser is really nice for most feeds, but for extended feeds, you might wanna try something called BeautifulSoup. It's an XML/HTML/XHTML parsing library originally designed for screen scraping; it turns out it's also brilliant for this sort of thing. The documentation is pretty good, and it's got a self-explanatory API, so if you're thinking of using anything else, that's what I'd recommend.


I'd probably use it like this:


>>> import BeautifulSoup
>>> import urllib2

# Fetch the feed XML from the URL
>>> connection = urllib2.urlopen('http://kuler.adobe.com/path/to/rss.xml')
>>> html_data = connection.read()
>>> connection.close()

# Create and search the soup
>>> soup = BeautifulSoup.BeautifulSoup(html_data)
>>> themes = soup.findAll('kuler:themeitem') # Note: all lower-case element names

# Get the ID of the first theme
>>> themes[0].find('kuler:themeid').contents[0]
u'123456'

# Get an ordered list of the hex colors for the first theme
>>> themeswatches = themes[0].find('kuler:themeswatches')
>>> colors = [color.contents[0] for color in
... themeswatches.findAll('kuler:swatchhexcolor')]
>>> colors
[u'FFFFFF', u'000000']

So you can probably get the idea that this is a very cool library. This approach wouldn't be ideal for parsing arbitrary RSS feeds, but because the data comes from Adobe Kuler, you can be reasonably sure it won't vary enough to break your app (i.e. it's a trusted enough source).


Even worse is trying to parse Adobe's goddamn .ASE format. I tried writing a parser for it, and it got really horrible, really quickly. Ugh. So, yeah, the RSS feeds are probably the easiest way of interfacing with Kuler.

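The answer's code targets Python 2 and the old BeautifulSoup 3 package. A rough modern equivalent using Python 3 and BeautifulSoup 4 (`pip install beautifulsoup4`) is sketched below; the feed snippet is inlined rather than fetched, and `html.parser` is chosen because, like the old BeautifulSoup default, it lowercases tag names:

```python
from bs4 import BeautifulSoup

# The feed snippet from the question, inlined for illustration
xml_data = """<channel>
  <item>
    <kuler:themeItem>
      <kuler:themeID>123456</kuler:themeID>
      <kuler:themeSwatches>
        <kuler:swatch><kuler:swatchHexColor>FFFFFF</kuler:swatchHexColor></kuler:swatch>
        <kuler:swatch><kuler:swatchHexColor>000000</kuler:swatchHexColor></kuler:swatch>
      </kuler:themeSwatches>
    </kuler:themeItem>
  </item>
</channel>"""

# html.parser lowercases tag names, matching the old BeautifulSoup behavior,
# so the namespace-prefixed elements are searched in all lower case
soup = BeautifulSoup(xml_data, "html.parser")
themes = soup.find_all("kuler:themeitem")

theme_id = themes[0].find("kuler:themeid").get_text()
colors = [c.get_text() for c in themes[0].find_all("kuler:swatchhexcolor")]
print(theme_id, colors)  # 123456 ['FFFFFF', '000000']
```

To fetch a live feed, `urllib.request.urlopen` replaces Python 2's `urllib2.urlopen`.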
