How to get the content of an HTTPS web page?

Time: 2022-06-07 09:40:36

I want to get the content of a web page by running JavaScript code on Node.js. I want the content to be exactly the same as what I see in the browser.

This is the URL: https://www.realtor.ca/Residential/Single-Family/17219235/2103-1185-THE-HIGH-STREET-Coquitlam-British-Columbia-V3B0A9

I use the following code, but I get a 405 response.

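var fs = require('fs');
var request = require('request');
var link = 'https://www.realtor.ca/Residential/Single-Family/17219235/2103-1185-THE-HIGH-STREET-Coquitlam-British-Columbia-V3B0A9';

request(link, function (error, response, body) {
    // Network-level failure (DNS, TLS, connection reset): there is no body to save
    if (error) {
        return console.log('request failed:', error);
    }
    // Log the status first: a 405 means the body is an error page, not the listing
    console.log('status code:', response.statusCode);
    fs.writeFile("realestatedata.html", body, function (err) {
        if (err) {
            console.log('error in saving the file');
            return console.log(err);
        }
        console.log("The file was saved!");
    });
});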

The saved file does not match what I see in the browser.
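The `request` package used above has been deprecated since 2020. Below is a minimal sketch of the same download using only Node's built-in `https` module. It stops on network errors and refuses to save a non-2xx body, such as the 405 error page, as if it were the listing. `downloadPage` and `isSuccessStatus` are hypothetical names. Redirects are not followed. This sketch does not establish whether realtor.ca answers a plain Node client with 200.

```javascript
const fs = require('fs');
const https = require('https');
const { pipeline } = require('stream');

// True for 2xx status codes; a 405 or any other status means the body is an error page.
function isSuccessStatus(statusCode) {
  return statusCode >= 200 && statusCode < 300;
}

// Hypothetical helper: GET `url` and write the body to `file` only on a 2xx response.
// Redirects (3xx) are treated as failures in this sketch rather than followed.
function downloadPage(url, file) {
  return new Promise((resolve, reject) => {
    const req = https.get(url, (res) => {
      if (!isSuccessStatus(res.statusCode)) {
        res.resume(); // drain the unwanted body so the socket is released
        reject(new Error(`HTTP ${res.statusCode} for ${url}`));
        return;
      }
      // pipeline forwards errors from either stream and closes both on failure
      pipeline(res, fs.createWriteStream(file), (err) => {
        if (err) reject(err);
        else resolve(res.statusCode);
      });
    });
    req.on('error', reject); // network-level errors: DNS, TLS, connection reset
  });
}
```

Usage: `downloadPage(link, 'realestatedata.html').then(() => console.log('The file was saved!')).catch(console.error);`. The status check matters here because the original code writes whatever body comes back, so an error page silently ends up in the file.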

1 solution

#1

I think a full answer will be easier to follow, since my comment was truncated.

It seems the server does not support the method of the request you sent (405 Method Not Allowed: "The method specified in the Request-Line is not allowed for the resource identified by the Request-URI. The response MUST include an Allow header containing a list of valid methods for the requested resource."). Do you have more information about the HTTP response? Have you tried the following code instead of yours?

// Reuses the fs and request modules required in the question's code
request('https://www.realtor.ca/Residential/Single-Family/17219235/2103-1185-THE-HIGH-STREET-Coquitlam-British-Columbia-V3B0A9').pipe(fs.createWriteStream('realestatedata.html'));
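To get more information about the HTTP response, here is a minimal sketch using Node's built-in `https` module. It prints the status code, the `Allow` header (if the server sent one) and the remaining headers. `summarizeResponse` and `inspectResponse` are hypothetical names. If the `Allow` header does list `GET`, the 405 is probably caused by something other than the request method itself.

```javascript
const http = require('http');
const https = require('https');

// Formats the status line plus the Allow header, which the HTTP spec requires
// a server to send with a 405 Method Not Allowed response.
function summarizeResponse(statusCode, headers) {
  const reason = http.STATUS_CODES[statusCode] || 'Unknown status';
  const allow = headers['allow']; // Node lower-cases incoming header names
  return allow
    ? `${statusCode} ${reason}; Allow: ${allow}`
    : `${statusCode} ${reason}`;
}

// Hypothetical helper: send a plain GET to `url` and log what the server answered.
function inspectResponse(url) {
  https
    .get(url, (res) => {
      console.log(summarizeResponse(res.statusCode, res.headers));
      console.log(res.headers); // full header set for closer inspection
      res.resume(); // the body is not needed here
    })
    .on('error', (err) => console.error('request failed:', err.message));
}
```

Call it as `inspectResponse(link)` with the URL from the question.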

You could also have a look at the question "In Node.js / Express, how do I 'download' a page and gets its HTML?"

Note that, in any case, the page will not render the same way when you open only the HTML file, because it also needs many other resources (displaying the page triggers 110 requests). I think the following answer can help you download the whole page: https://*.com/a/34935427/1630604
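The HTML document is only the first of those requests. Below is a rough sketch for listing the extra resources that a saved page references directly. It assumes the HTML is already in a string. It scans `script`, `img`, `link`, `iframe` and `source` tags with a regular expression, which is not a real HTML parser, and resolves relative URLs against the page URL. `listResources` is a hypothetical name. Resources added later by JavaScript or CSS are not found, so the list will likely be shorter than what the browser's network panel shows.

```javascript
// Tags whose src/href usually pull in extra resources when a browser renders the page.
// A '>' inside an attribute value will confuse this pattern; it is a rough filter only.
const RESOURCE_TAG_RE =
  /<(?:script|img|link|iframe|source)\b[^>]*?\s(?:src|href)\s*=\s*["']([^"']+)["']/gi;

// Hypothetical helper: returns unique absolute http(s) URLs referenced by those tags.
function listResources(html, pageUrl) {
  const found = new Set();
  for (const match of html.matchAll(RESOURCE_TAG_RE)) {
    try {
      const resolved = new URL(match[1], pageUrl); // resolve relative paths
      if (resolved.protocol === 'http:' || resolved.protocol === 'https:') {
        found.add(resolved.href);
      }
    } catch (_) {
      // skip attribute values that are not valid URLs
    }
  }
  return [...found];
}
```

For example, `listResources(fs.readFileSync('realestatedata.html', 'utf8'), link)` shows how much the saved file still depends on other URLs. Plain `<a href>` links are deliberately excluded because the browser does not fetch them to render the page.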
