Python: BeautifulSoup - getting an attribute value based on the name attribute

Date: 2022-11-27 15:36:14

I want to print an attribute value based on its name. Take, for example:

<META NAME="City" content="Austin">

I want to do something like this:

soup = BeautifulSoup(f)  # f is some HTML containing the above meta tag
for meta_tag in soup('meta'):
    if meta_tag['name'] == 'City':
        print(meta_tag['content'])

The above code gives a KeyError: 'name'. I believe this is because name is used by BeautifulSoup, so it can't be used as a keyword argument.

6 answers

#1


94  

It's pretty simple; use the following:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<META NAME="City" content="Austin">', 'html.parser')
>>> soup.find("meta", {"name": "City"})
<meta content="Austin" name="City"/>
>>> soup.find("meta", {"name": "City"})['content']
'Austin'
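
The asker's guess about name is partly right. It is not what raises the KeyError, but it does explain why the attribute is passed as a dictionary: the first parameter of find() and find_all() is itself called name and filters on the tag name. A minimal sketch, assuming bs4 with the built-in html.parser; the sample HTML and the variable names by_attrs and by_css are made up for illustration:

```python
from bs4 import BeautifulSoup

html = '<meta name="City" content="Austin"><meta charset="utf-8">'
soup = BeautifulSoup(html, "html.parser")

# The first positional parameter of find() is the tag name, so the
# HTML attribute called "name" is passed through attrs instead.
by_attrs = soup.find("meta", attrs={"name": "City"})

# A CSS attribute selector is another way to express the same filter.
by_css = soup.select_one('meta[name="City"]')

print(by_attrs["content"])
print(by_css["content"])
```

Both lookups return the same tag here; which one to use is mostly a matter of taste.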

Leave a comment if anything is not clear.

#2


14  

theharshest answered the question, but here is another way to do the same thing. Also, in your example you have NAME in caps, while in your code you have name in lowercase.

from bs4 import BeautifulSoup

s = '<div class="question" id="get attrs" name="python" x="something">Hello World</div>'
soup = BeautifulSoup(s, 'html.parser')

attributes_dictionary = soup.find('div').attrs
print(attributes_dictionary)
# prints: {'class': ['question'], 'id': 'get attrs', 'name': 'python', 'x': 'something'}

print(attributes_dictionary['class'][0])
# prints: question

print(soup.find('div').get_text())
# prints: Hello World
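
On the upper-case NAME point: the HTML parsers that Beautiful Soup wraps lower-case tag and attribute names while parsing, so the lower-case key used in the question's code is the right one. The sketch below also shows why class comes back as a list. It assumes bs4 with html.parser, and the markup is invented for illustration:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<META NAME="City" CONTENT="Austin" CLASS="a b">', "html.parser")
tag = soup.find("meta")

# Tag and attribute names are lower-cased during parsing; values are not.
print(tag.name)           # meta
print(sorted(tag.attrs))  # ['class', 'content', 'name']
print(tag["name"])        # City

# class is treated as a multi-valued attribute, so it is split into a list.
print(tag["class"])
```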

#3


6  

theharshest's answer is the best solution, but FYI, the problem you were encountering comes from the fact that a Tag object in Beautiful Soup acts like a Python dictionary. If you access tag['name'] on a tag that doesn't have a 'name' attribute, you'll get a KeyError.
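
To make that concrete, here is a small sketch of the question's loop with the lookup swapped for .get(), which returns None instead of raising when the attribute is missing. It assumes bs4 with html.parser; the extra charset tag and the cities list are added only to reproduce the situation:

```python
from bs4 import BeautifulSoup

html = """
<meta charset="utf-8">
<meta name="City" content="Austin">
"""
soup = BeautifulSoup(html, "html.parser")

cities = []
for meta_tag in soup.find_all("meta"):
    # meta_tag["name"] would raise KeyError on the charset tag;
    # .get() returns None when the attribute is missing.
    if meta_tag.get("name") == "City":
        cities.append(meta_tag["content"])

print(cities)  # ['Austin']
```

Tag.has_attr("name") is another way to guard the lookup if an explicit check reads better.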

#4


3  

The following works:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<META NAME="City" content="Austin">', 'html.parser')

metas = soup.find_all("meta")

for meta in metas:
    print(meta.attrs['content'], meta.attrs['name'])
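
Note that this loop indexes every meta tag, so it would still hit a KeyError on a page that has, say, a charset meta tag. One possible variation, sketched under the assumption of bs4 with html.parser, is to ask find_all for only the tags that carry a name attribute; the sample HTML and the named dictionary are illustrative:

```python
from bs4 import BeautifulSoup

html = (
    '<meta charset="utf-8">'
    '<meta name="City" content="Austin">'
    '<meta name="State" content="Texas">'
)
soup = BeautifulSoup(html, "html.parser")

# attrs={"name": True} keeps only <meta> tags that have a name attribute,
# whatever its value is.
named = {
    meta["name"]: meta.get("content")
    for meta in soup.find_all("meta", attrs={"name": True})
}
print(named)
```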

#5


2  

6 years late to the party, but I've been searching for how to extract an HTML element's attribute value, so for:

<span property="addressLocality">Ayr</span>

I want "addressLocality". I kept being directed back here, but the answers didn't really solve my problem.

How I managed to do it eventually:

>>> from bs4 import BeautifulSoup as bs
>>> soup = bs('<span property="addressLocality">Ayr</span>', 'html.parser')
>>> my_attributes = soup.find().attrs
>>> my_attributes
{'property': 'addressLocality'}

As it's a dict, you can then also use keys() and values():

>>> list(my_attributes.keys())
['property']
>>> list(my_attributes.values())
['addressLocality']
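
Since the attributes behave like a dictionary, they can also be walked pair by pair, and Tag.get() accepts a default for attributes that may be absent. A short sketch, assuming bs4 with html.parser; the extra lang attribute and the variable names are made up for illustration:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<span property="addressLocality" lang="en">Ayr</span>', "html.parser"
)
span = soup.find("span")

# .attrs is a dict, so items() yields (attribute name, value) pairs.
for attr_name, attr_value in span.attrs.items():
    print(attr_name, "=", attr_value)

# .get() takes a default for attributes that may not be present.
locality = span.get("property", "")
missing = span.get("itemprop", "n/a")
print(locality, missing)
```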

Hopefully it helps someone else!

#6


0  

One can also try this solution, which finds a value written inside a span in a table.

htmlContent:


<table>
    <tr>
        <th>
            ID
        </th>
        <th>
            Name
        </th>
    </tr>


    <tr>
        <td>
            <span name="spanId" class="spanclass">ID123</span>
        </td>

        <td>
            <span>Bonny</span>
        </td>
    </tr>
</table>

Python code:


from bs4 import BeautifulSoup

# htmlContent is a string holding the HTML shown above
soup = BeautifulSoup(htmlContent, "lxml")

tables = soup.find_all("table")

for table in tables:
    storeValueRows = table.find_all("tr")
    # .string keeps the surrounding whitespace, so strip it before comparing
    thValue = storeValueRows[0].find_all("th")[0].string.strip()

    if thValue == "ID":  # verify that this is the table I wanted
        # storeValueRows[1] is the second <tr> of the table, find_all("span")[0]
        # is its first <span>, and .string is the text inside that <span>
        value = storeValueRows[1].find_all("span")[0].string
        value = value.strip()  # remove whitespace from both ends of the string

        # find using an attribute:
        value = storeValueRows[1].find("span", {"name": "spanId"})['class']
        print(value)
        # this prints ['spanclass'], because class is a multi-valued attribute
        # and comes back as a list; use value[0] to get 'spanclass'
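
For comparison, the same lookup can be written with a CSS selector instead of indexing rows by position. This is only a sketch under a few assumptions: it uses bs4 with html.parser rather than lxml, a condensed copy of the table above, and illustrative names (html_content, span_text, span_classes):

```python
from bs4 import BeautifulSoup

html_content = """
<table>
  <tr><th> ID </th><th> Name </th></tr>
  <tr>
    <td><span name="spanId" class="spanclass"> ID123 </span></td>
    <td><span>Bonny</span></td>
  </tr>
</table>
"""
soup = BeautifulSoup(html_content, "html.parser")

# Select the <span> inside a table whose name attribute is "spanId".
span = soup.select_one('table span[name="spanId"]')

if span is not None:
    span_text = span.get_text(strip=True)  # text with surrounding whitespace removed
    span_classes = span.get("class", [])   # class is multi-valued, so this is a list
    print(span_text, span_classes)
```

The selector does not depend on the row order, at the cost of no longer checking the header cell first.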
