How to find the children of a node using Beautiful Soup

Time: 2022-11-29 12:05:36

I want to get all the <a> tags which are children of <li>

    <div>
    <li class="test">
        <a>link1</a>
        <ul> 
           <li>  
              <a>link2</a> 
           </li>
        </ul>
    </li>
    </div>
    

    I know how to find an element with a particular class, like this:

    soup.find("li", { "class" : "test" }) 
    

    But I don't know how to find all <a> tags that are children of <li class="test"> and no others. For example, I want to select:

    <a> link1 </a>
    

    6 answers

    #1


    49  

    Try this:

    li = soup.find('li', {'class': 'test'})
    children = li.findChildren()  # every tag nested under li, not only direct children
    for child in children:
        print(child)
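
Note that findChildren() is an older alias of find_all(), so with no arguments it returns every tag nested under the <li>, not just its direct children. A minimal sketch of the difference, assuming the question's HTML is parsed with the built-in html.parser (the variable names are illustrative only). It uses the .children iterator to keep only the direct <a> children:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# The HTML from the question.
html = """
<div>
<li class="test">
    <a>link1</a>
    <ul>
       <li>
          <a>link2</a>
       </li>
    </ul>
</li>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li", {"class": "test"})

# find_all() with no arguments walks the whole subtree under li.
all_tag_names = [tag.name for tag in li.find_all()]
print(all_tag_names)

# .children yields only direct children, including whitespace text nodes,
# so keep just the Tag objects named "a".
direct_links = [c for c in li.children if isinstance(c, Tag) and c.name == "a"]
print(direct_links)
```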
    

    #2


    77  

    There's a very small section in the docs that shows how to find/find_all direct children.

    http://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-recursive-argument

    In your case:

    soup.find("li", { "class" : "test" },recursive=False)
    soup.find_all("li", { "class" : "test" },recursive=False)
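
recursive=False restricts the search to direct children of the object the method is called on. In the question's HTML the <li> sits inside a <div>, so one way to apply it (a sketch, assuming html.parser and illustrative variable names) is to locate the <li> first and then search it non-recursively for <a>:

```python
from bs4 import BeautifulSoup

html = '<div><li class="test"><a>link1</a><ul><li><a>link2</a></li></ul></li></div>'
soup = BeautifulSoup(html, "html.parser")

# Called on the document itself, a non-recursive search only looks at
# top-level nodes, and here the only top-level node is the <div>.
top_level_li = soup.find("li", {"class": "test"}, recursive=False)
print(top_level_li)

# Locate the <li> with a normal (recursive) search first...
li = soup.find("li", {"class": "test"})

# ...then restrict the search for <a> to its direct children.
direct_links = li.find_all("a", recursive=False)
all_links = li.find_all("a")
print(direct_links)
print(all_links)
```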
    

    #3


    11  

    Try this:

    li = soup.find("li", { "class" : "test" })
    children = li.find_all("a") # returns a list of all <a> descendants of li, nested ones included
    

    Other reminders:

    The find method returns only the first matching element. The find_all method returns all matching descendant elements as a list.
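
To make the difference concrete, here is a small sketch (assuming html.parser; the variable names are illustrative). find behaves like find_all with limit=1, except that it returns the element itself, or None when nothing matches, rather than a list:

```python
from bs4 import BeautifulSoup

html = '<div><li class="test"><a>link1</a><ul><li><a>link2</a></li></ul></li></div>'
soup = BeautifulSoup(html, "html.parser")
li = soup.find("li", {"class": "test"})

first = li.find("a")                  # a single Tag
limited = li.find_all("a", limit=1)   # a list holding that same Tag
every = li.find_all("a")              # all matching descendants, in document order

missing_one = li.find("span")         # None when nothing matches
missing_all = li.find_all("span")     # empty list when nothing matches

print(first, limited, every, missing_one, missing_all)
```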

    #4


    8  

    Perhaps you want to do:

    soup.find("li", { "class" : "test" }).find('a')
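
Chaining like this returns the first <a> found anywhere under the <li>, and it raises AttributeError if no matching <li> exists, because the first find then returns None. A defensive sketch (first_link is a hypothetical helper name, not part of Beautiful Soup; html.parser is assumed):

```python
from bs4 import BeautifulSoup


def first_link(soup, css_class):
    """Return the first <a> under the first <li> with the given class, or None."""
    li = soup.find("li", {"class": css_class})
    if li is None:
        # No such <li>: avoid calling .find on None.
        return None
    return li.find("a")


html = '<div><li class="test"><a>link1</a><ul><li><a>link2</a></li></ul></li></div>'
soup = BeautifulSoup(html, "html.parser")

found = first_link(soup, "test")
not_found = first_link(soup, "no-such-class")
print(found, not_found)
```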
    

    #5


    4  

    Yet another method is to create a filter function that returns True for all desired tags:

    def my_filter(tag):
        return (tag.name == 'a' and
            tag.parent.name == 'li' and
            'test' in tag.parent.get('class', []))  # .get avoids a KeyError when the parent has no class

    Then just call find_all with the function as the argument:

    for a in soup(my_filter): # or soup.find_all(my_filter)
        print(a)
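
The same idea can be generalized so that the tag names and class are not hard-coded. The sketch below is one possible way to do it; direct_child_of is a hypothetical helper name, and html.parser is assumed. It reads the parent's class with .get so that a parent without a class attribute simply does not match:

```python
from bs4 import BeautifulSoup


def direct_child_of(child_name, parent_name, parent_class):
    """Build a filter matching <child_name> tags whose parent is
    <parent_name class="... parent_class ...">."""
    def matches(tag):
        parent = tag.parent
        return (
            tag.name == child_name
            and parent is not None
            and parent.name == parent_name
            # class is multi-valued, so .get returns a list (or the default)
            and parent_class in parent.get("class", [])
        )
    return matches


html = '<div><li class="test"><a>link1</a><ul><li><a>link2</a></li></ul></li></div>'
soup = BeautifulSoup(html, "html.parser")

links = soup.find_all(direct_child_of("a", "li", "test"))
for a in links:
    print(a)
```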
    

    #6


    1  

    "How to find all a which are children of <li class=test> but not any others?"

    Given the HTML below (I added another <a> to show the difference between select and select_one):

    <div>
      <li class="test">
        <a>link1</a>
        <ul>
          <li>
            <a>link2</a>
          </li>
        </ul>
        <a>link3</a>
      </li>
    </div>
    

    The solution is to use the child combinator (>) placed between two CSS selectors:

    >>> soup.select('li.test > a')
    [<a>link1</a>, <a>link3</a>]
    

    In case you want to find only the first matching child:

    >>> soup.select_one('li.test > a')
    <a>link1</a>
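
For a self-contained version of the session above, the following sketch assumes the built-in html.parser (variable names are illustrative). It also runs the descendant form of the selector, without the >, to show what the child combinator filters out:

```python
from bs4 import BeautifulSoup

html = """
<div>
  <li class="test">
    <a>link1</a>
    <ul>
      <li>
        <a>link2</a>
      </li>
    </ul>
    <a>link3</a>
  </li>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Child combinator: only <a> tags whose parent is li.test.
direct = soup.select("li.test > a")

# Descendant combinator (a space): any <a> nested under li.test.
nested = soup.select("li.test a")

# select_one returns the first match instead of a list.
first = soup.select_one("li.test > a")

print(direct)
print(nested)
print(first)
```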
    
