Excel宏 - 使用Excel数据来抓取HTML页面

时间:2021-08-27 09:52:56

As with many other questions, I'm really new to using Excel macros and HTML.

和许多其他问题一样,我对使用Excel宏和HTML真的很陌生。

I'm building a table with entries of items from Dungeons and Dragons (DnD) that I want to get descriptions of from a particular web page: http://www.5esrd.com/gamemastering/magic-items/ . For example, An entry in my table would be Adamantine Armor, so I would want the macro to search the web page for that listing, scrape the description that's entered on the webpage and save that to the excel document. In this case, that specific description (as per the 5esrd webpage) is as follows:

我正在建立一个表格,其中包含来自龙与地下城(DnD)的项目,我希望从特定网页获取描述:http://www.5esrd.com/gamemastering/magic-items/。例如,我的表中的一个条目是Adamantine Armor,所以我希望宏在网页上搜索该列表,抓取在网页上输入的描述并将其保存到excel文档中。在这种情况下,该具体描述(根据5esrd网页)如下:

Armor (medium or heavy, but not hide), uncommon

护甲(中等或重型,但不隐藏),不常见

This suit of armor is reinforced with adamantine, one of the hardest substances in existence. While you’re wearing it, any critical hit against you becomes a normal hit.

这套盔甲用金刚石加强,金刚石是现存最难的物质之一。当你戴着它时,对你的任何重击都会成为正常现象。

On this page, all the items are links to pages that have them grouped by type (Sword, wand, etc.). I would like to iterate through my row of items in Excel, search for that item on the page and scrape its description into my excel table.

在此页面上,所有项目都是指向按类型(剑,魔杖等)分组的页面的链接。我想在Excel中遍历我的一行项目,在页面上搜索该项目并将其描述刮到我的Excel表格中。

I was following along this page: https://www.wiseowl.co.uk/blog/s393/scrape-website-html.htm, which scrapes data from *'s home page. It directly targets some of the tables that the web page uses to organize its entries. The DnD page I'm trying to use doesn't organize its entries into tables like this, so I'm a little lost with how to proceed.

我正在关注此页面:https://www.wiseowl.co.uk/blog/s393/scrape-website-html.htm,它从*的主页上删除数据。它直接针对网页用于组织其条目的一些表。我正在尝试使用的DnD页面不会将其条目组织到这样的表中,所以我对如何继续进行了一点点迷失。

If anyone could help point me in the right direction, I would be very grateful!

如果有人能帮助我指出正确的方向,我将非常感激!

1 个解决方案

#1


0  

This is the code I use for scraping a web page on our company site. I put all my data in column B, so make adjustments accordingly. This should get you started.

这是我用于在公司网站上抓取网页的代码。我将所有数据都放在B列中,因此请进行相应的调整。这应该让你开始。

Sub TestScrape()
'   SCRAPE A WEB PAGE
Dim ieDoc As New HTMLDocument 'ieDocDocument
Dim tdCollection As Object 'table that has the javascript attributes and contains the element I want to click
Dim AnchorLinks As Object
Dim tdElements As Object
Dim tdElement As Object
Dim AnchorLink As Object
Dim lRow As Long
Dim ie As InternetExplorer
Dim cls As IHTMLElementCollection

Set ie = New SHDocVw.InternetExplorer
 With ie
   .Visible = True
   .Left = 0
   .TheaterMode = True   '<<-- Comment this out if you don't want Theater Mode
   .Navigate url:="Enter your URL here"

   While .ReadyState < 4 Or .Busy:  DoEvents:  Wend

 End With

        Application.Wait Now + TimeSerial(0, 0, 4)


ieDoc.body.innerHTML = ie.Document.body.innerHTML

With ieDoc.body
    Set AnchorLinks = .getElementsByTagName("table")
    Set tdElements = .getElementsByTagName("td") '

    For Each AnchorLink In AnchorLinks
        Debug.Print AnchorLink.innerText
    Next AnchorLink

End With

lRow = 1
   For Each tdElement In tdElements
      Debug.Print tdElement.innerText
      Cells(lRow, 2).Value = tdElement.innerText
      lRow = lRow + 1
   Next
     Application.Wait (Now + TimeValue("0:00:1"))
 With ie
  .TheaterMode = False
  .Quit
 End With
Set ie = Nothing

End Sub

#1


0  

This is the code I use for scraping a web page on our company site. I put all my data in column B, so make adjustments accordingly. This should get you started.

这是我用于在公司网站上抓取网页的代码。我将所有数据都放在B列中,因此请进行相应的调整。这应该让你开始。

Sub TestScrape()
'   SCRAPE A WEB PAGE
Dim ieDoc As New HTMLDocument 'ieDocDocument
Dim tdCollection As Object 'table that has the javascript attributes and contains the element I want to click
Dim AnchorLinks As Object
Dim tdElements As Object
Dim tdElement As Object
Dim AnchorLink As Object
Dim lRow As Long
Dim ie As InternetExplorer
Dim cls As IHTMLElementCollection

Set ie = New SHDocVw.InternetExplorer
 With ie
   .Visible = True
   .Left = 0
   .TheaterMode = True   '<<-- Comment this out if you don't want Theater Mode
   .Navigate url:="Enter your URL here"

   While .ReadyState < 4 Or .Busy:  DoEvents:  Wend

 End With

        Application.Wait Now + TimeSerial(0, 0, 4)


ieDoc.body.innerHTML = ie.Document.body.innerHTML

With ieDoc.body
    Set AnchorLinks = .getElementsByTagName("table")
    Set tdElements = .getElementsByTagName("td") '

    For Each AnchorLink In AnchorLinks
        Debug.Print AnchorLink.innerText
    Next AnchorLink

End With

lRow = 1
   For Each tdElement In tdElements
      Debug.Print tdElement.innerText
      Cells(lRow, 2).Value = tdElement.innerText
      lRow = lRow + 1
   Next
     Application.Wait (Now + TimeValue("0:00:1"))
 With ie
  .TheaterMode = False
  .Quit
 End With
Set ie = Nothing

End Sub