BeautifulSoup: how do I get the exact link by matching the exact tag content?

Time: 2022-10-27 08:15:51

I want to get the link that comes after "S-1", not the one after "S-1/A". I tried ".find_all(lambda tag: tag.name == 'td' and tag.get()==['S-1'])" and ".select('td.s-1')", but neither returned the link. I'd appreciate any help.


Here is the relevant page source:


    <tr>
        <td>ADVANCE FINANCIAL BANCORP</td>
        <td>S-1/A</td>
        <td>10/31/1996</td>
        <td><a id="two_column_main_content_rpt_filings_fil_view_0" href="/markets/ipos/filing.ashx?filingid=1567309" target="_blank">Filing</a>
        </td>
    </tr>

    <tr>
        <td>ADVANCE FINANCIAL BANCORP</td>
        <td>S-1</td>
        <td>9/27/1996</td>
        <td><a id="two_column_main_content_rpt_filings_fil_view_1" href="/markets/ipos/filing.ashx?filingid=921318" target="_blank">Filing</a>
        </td>
    </tr>

Here is a screenshot of the relevant page source:

[Screenshot: relevant page source]

Here is the link to the full page source:

https://www.nasdaq.com/markets/ipos/company/advance-financial-bancorp-5492-13046?tab=financials
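For reference, the two attempts in the question miss for different reasons: `Tag.get()` looks up an HTML attribute and takes the attribute name as its first argument, so calling it with no arguments fails and it never tests the cell's text; `td.s-1` is a CSS class selector, and these `<td>` cells have no `class` attribute. A minimal sketch of the lambda approach that compares the cell text exactly, run against the snippet above (wrapped in a `<table>` for parsing; `find_next('a')` assumes the row's link follows the matched cell in document order, as it does here):

```python
from bs4 import BeautifulSoup

# The two rows from the question's page source, wrapped in a <table>
html = """
<table>
<tr>
    <td>ADVANCE FINANCIAL BANCORP</td>
    <td>S-1/A</td>
    <td>10/31/1996</td>
    <td><a id="two_column_main_content_rpt_filings_fil_view_0" href="/markets/ipos/filing.ashx?filingid=1567309" target="_blank">Filing</a></td>
</tr>
<tr>
    <td>ADVANCE FINANCIAL BANCORP</td>
    <td>S-1</td>
    <td>9/27/1996</td>
    <td><a id="two_column_main_content_rpt_filings_fil_view_1" href="/markets/ipos/filing.ashx?filingid=921318" target="_blank">Filing</a></td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Compare the whole (stripped) cell text, so "S-1/A" does not match "S-1"
cells = soup.find_all(lambda tag: tag.name == "td" and tag.get_text(strip=True) == "S-1")

# The first <a> after the matched cell in document order is that row's link
link = cells[0].find_next("a")
print(link["href"])  # /markets/ipos/filing.ashx?filingid=921318
```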

1 Answer

#1


Try this:

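    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    def getlink(url):
        response = requests.get(url)
        response.raise_for_status()
        # 'html5lib' is a separate package: pip install html5lib
        mainpage = BeautifulSoup(response.text, 'html5lib')
        # On this page, the filings are in the second <table class="marginB10px">
        table = mainpage.find_all('table', attrs={"class": "marginB10px"})
        links = table[1].find_all('a')
        # ...and the second link in that table belongs to the "S-1" row
        return links[1].get('href')

    link = getlink('https://www.nasdaq.com/markets/ipos/company/advance-financial-bancorp-5492-13046?tab=financials')
    mainlink = 'https://www.nasdaq.com'
    # The href is relative, so resolve it against the site root
    link = urljoin(mainlink, link)
    print(link)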

output:

https://www.nasdaq.com/markets/ipos/filing.ashx?filingid=921318
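The answer relies on positions (`table[1]`, `links[1]`), which can break if the page gains another table or row. A sketch that keys on the exact "S-1" text instead: `find('td', string=...)` matches a tag only when its entire `.string` equals the value (so surrounding whitespace would prevent a match), and `find_parent('tr')` keeps the search inside that row. The function name `get_filing_link` and the sample HTML are illustrative; the function takes an HTML string so it can be tried without fetching the live page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.nasdaq.com"

def get_filing_link(html, form_type="S-1"):
    """Return the absolute filing URL for the row whose cell text is exactly form_type."""
    soup = BeautifulSoup(html, "html.parser")
    # string= matches a tag only when its .string equals form_type exactly
    cell = soup.find("td", string=form_type)
    if cell is None:
        return None
    # Stay inside the matched row so a neighbouring row's link is never picked
    row = cell.find_parent("tr")
    anchor = row.find("a", href=True)
    if anchor is None:
        return None
    # The href is relative, so resolve it against the site root
    return urljoin(BASE_URL, anchor["href"])

sample = """
<table>
<tr><td>ADVANCE FINANCIAL BANCORP</td><td>S-1/A</td><td>10/31/1996</td>
<td><a href="/markets/ipos/filing.ashx?filingid=1567309">Filing</a></td></tr>
<tr><td>ADVANCE FINANCIAL BANCORP</td><td>S-1</td><td>9/27/1996</td>
<td><a href="/markets/ipos/filing.ashx?filingid=921318">Filing</a></td></tr>
</table>
"""

print(get_filing_link(sample))
# https://www.nasdaq.com/markets/ipos/filing.ashx?filingid=921318
```

Passing `form_type="S-1/A"` returns the other row's link instead, so the same helper covers both cases.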
