How to automate searching an HTML query form and export the results to Excel

Time: 2023-01-15 13:22:57

A webpage I am interested in extracting data from has a table with multiple search fields. I can enter data in any of these fields, click the search button at the bottom of the table, and see the results for whatever I searched for.

I have multiple numbers I want to search for (around 300). Instead of searching for each one individually, is there a way to automate the searches and import the results for each number into an Excel worksheet?

Is it possible to do this with an Excel macro?

1 solution

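You can do this with the MSXML library (to send the HTTP requests) and the MSHTML library (to parse the returned HTML). This code should get you started.
Start by running this sub once to add references to both libraries. It only works if "Trust access to the VBA project object model" is enabled under File > Options > Trust Center > Trust Center Settings > Macro Settings; otherwise add "Microsoft HTML Object Library" and "Microsoft XML, v6.0" by hand under Tools > References in the VBA editor: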

Sub addReferences()
    ' Microsoft HTML Object Library (MSHTML), version 4.0
    ThisWorkbook.VBProject.References.AddFromGuid "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}", 4, 0
    ' Microsoft XML, v6.0 (MSXML2)
    ThisWorkbook.VBProject.References.AddFromGuid "{F5078F18-C551-11D3-89B9-0000F81FE221}", 6, 0
End Sub
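
Calling AddFromGuid for a library that is already referenced raises a run-time error, so running addReferences a second time fails. A minimal sketch of a variant that checks the existing references first is below. It assumes the same Trust Center setting is enabled; the names addReferencesIfMissing and hasReference are hypothetical, and ThisWorkbook is used so the references land in the workbook that holds the macro.

```vba
' Returns True if this workbook's VBA project already references the library with this GUID.
Function hasReference(ByVal libGuid As String) As Boolean
    Dim ref As Object   ' VBIDE.Reference, late-bound so no extra reference is needed
    For Each ref In ThisWorkbook.VBProject.References
        If StrComp(ref.GUID, libGuid, vbTextCompare) = 0 Then
            hasReference = True
            Exit Function
        End If
    Next ref
End Function

' Adds MSHTML 4.0 and MSXML 6.0 only when they are not referenced yet, so it is safe to rerun.
Sub addReferencesIfMissing()
    Const MSHTML_GUID As String = "{3050F1C5-98B5-11CF-BB82-00AA00BDCE0B}"
    Const MSXML6_GUID As String = "{F5078F18-C551-11D3-89B9-0000F81FE221}"
    If Not hasReference(MSHTML_GUID) Then
        ThisWorkbook.VBProject.References.AddFromGuid MSHTML_GUID, 4, 0
    End If
    If Not hasReference(MSXML6_GUID) Then
        ThisWorkbook.VBProject.References.AddFromGuid MSXML6_GUID, 6, 0
    End If
End Sub
```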

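Then edit the getCAGEValues sub to use your own CAGE codes and to save the resulting data (and any other data you want from the page) instead of showing it in a message box. getPage requests each code's details page directly through the CAGE query-string parameter, so the search form never has to be filled in or submitted: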

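Sub getCAGEValues()
    Dim oHTMLDoc As MSHTML.HTMLDocument
    Dim oSpan As MSHTML.IHTMLElement          ' getElementById returns an IHTMLElement
    Dim CAGECodes As Variant
    Dim CAGECode As Variant                   ' For Each over an array needs a Variant loop variable
    CAGECodes = Array("12345", "12346")       ' replace with your own CAGE codes
    For Each CAGECode In CAGECodes
        Set oHTMLDoc = getPage(CAGECode)
        ' id of the element that holds the company name on the details page
        Set oSpan = oHTMLDoc.getElementById("ctl00_cphMainPageBody_lblCompNameData")
        If oSpan Is Nothing Then
            Debug.Print CAGECode & ": company name not found"
        Else
            MsgBox oSpan.innerText            ' save the value however you want to
        End If
    Next CAGECode
End Sub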

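Function getPage(CAGECode As Variant) As MSHTML.HTMLDocument
    Dim oHttpRequest As MSXML2.XMLHTTP60
    Set oHttpRequest = New MSXML2.XMLHTTP60
    With oHttpRequest
        ' Synchronous GET of the details page for one CAGE code
        .Open "GET", "http://www.logisticsinformationservice.dla.mil/BINCS/details.aspx?CAGE=" & CAGECode, False
        ' Headers that ask for a fresh response instead of a cached copy
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        ' Stop with a clear message instead of parsing an error page
        If .Status <> 200 Then
            Err.Raise vbObjectError + 513, "getPage", "HTTP " & .Status & " for CAGE " & CAGECode
        End If
    End With
    ' Load the returned HTML into an MSHTML document so elements can be looked up by id
    Dim oHTMLDoc As MSHTML.HTMLDocument
    Set oHTMLDoc = New MSHTML.HTMLDocument
    oHTMLDoc.body.innerHTML = oHttpRequest.responseText
    Set getPage = oHTMLDoc
End Function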
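
Since the goal is to get around 300 results into a worksheet rather than into message boxes, here is a minimal sketch that reads the codes from column A of a sheet and writes the company name next to each one. It assumes a sheet named "Codes" with a header in row 1 and one code per row from A2 down; the sheet name, the layout, and the sub name exportCAGEValues are hypothetical. It reuses getPage from this answer and needs the MSHTML reference added above.

```vba
Sub exportCAGEValues()
    Dim ws As Worksheet
    Dim oHTMLDoc As MSHTML.HTMLDocument
    Dim oSpan As MSHTML.IHTMLElement
    Dim lastRow As Long, r As Long

    Set ws = ThisWorkbook.Worksheets("Codes")            ' hypothetical sheet name
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row ' last non-empty cell in column A
    ws.Range("B1").Value = "Company name"                ' header for the results column

    For r = 2 To lastRow
        Set oHTMLDoc = getPage(CStr(ws.Cells(r, "A").Value))
        Set oSpan = oHTMLDoc.getElementById("ctl00_cphMainPageBody_lblCompNameData")
        If oSpan Is Nothing Then
            ws.Cells(r, "B").Value = "(not found)"       ' page has no company-name element
        Else
            ws.Cells(r, "B").Value = oSpan.innerText
        End If
        DoEvents                                         ' keep Excel responsive during the loop
    Next r
End Sub
```

Format column A as Text before entering the codes so that any codes with leading zeros keep them. To capture more fields, repeat the getElementById lookup with the id of each additional element and write it to the next column.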
