Extract all hyperlinks (from an external website) using node.js and request

Date: 2022-10-29 13:38:21

Right now our app writes the source code of nodejs.org to the console. We'd like it to write all hyperlinks of nodejs.org instead. Maybe we need just one line of code to get the links from body.

app.js:

var http = require('http');

http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');

var request = require("request");

// Fetch the page and print its HTML source (or the error) to the console.
request("http://nodejs.org/", function (error, response, body) {
    if (!error)
        console.log(body);
    else
        console.log(error);
});

1 answer

#1


36  

You are probably looking for jsdom, jQuery, or cheerio. What you are doing is called screen scraping: extracting data from a site. jsdom and jQuery offer a complete set of tools, but cheerio is much faster.
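
To make "getting the links from body" concrete before bringing in a library, here is a rough sketch that uses only a regular expression. It is not how jsdom or cheerio work, and `extractHrefs` is a hypothetical helper name. It assumes quoted `href` values and will miss or misread unusual markup, which is the main reason to prefer a real parser.

```javascript
// Rough sketch: pull href values out of an HTML string with a regular expression.
// extractHrefs is a hypothetical helper, not part of request, jsdom, or cheerio.
function extractHrefs(html) {
  // Matches <a ... href="value"> or <a ... href='value'>; unquoted values are ignored.
  var pattern = /<a\b[^>]*?\shref\s*=\s*(["'])(.*?)\1/gi;
  var hrefs = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    hrefs.push(match[2]); // capture group 2 is the text between the quotes
  }
  return hrefs;
}

// A small made-up HTML fragment standing in for the downloaded body.
var sample = '<p><a href="/about">About</a> and ' +
  '<a class="x" href=\'https://nodejs.org/en/download\'>Download</a></p>';
console.log(extractHrefs(sample));
```

In the question's callback, the same function could be called as `extractHrefs(body)` in place of `console.log(body)`.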

Here is a cheerio example:

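var request = require('request');
var cheerio = require('cheerio');
var searchTerm = 'screen+scraping';
var url = 'http://www.bing.com/search?q=' + searchTerm;
request(url, function (err, resp, body) {
  if (err) {
    console.log(err); // network or DNS failure: nothing to parse
    return;
  }
  var $ = cheerio.load(body); // parse the HTML into a jQuery-like object
  var links = $('a'); // jQuery-style selector: get all hyperlinks
  links.each(function (i, link) {
    // print the link text, then its target on an indented line
    console.log($(link).text() + ':\n  ' + $(link).attr('href'));
  });
});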
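
The `href` values printed by the example above are often relative (for instance `/about`), and some anchors have no `href` at all. The following is a minimal sketch of one way to clean them up, assuming Node's built-in `URL` class is available. `toAbsoluteLinks` is a hypothetical helper name and the sample values are made up for illustration.

```javascript
// Sketch: turn scraped href values into a de-duplicated list of absolute URLs.
// toAbsoluteLinks is a hypothetical helper; URL is Node's built-in WHATWG URL class.
function toAbsoluteLinks(hrefs, baseUrl) {
  var seen = new Set();
  var result = [];
  hrefs.forEach(function (href) {
    if (!href) return; // skip anchors that have no href attribute
    var absolute;
    try {
      absolute = new URL(href, baseUrl).href; // resolve relative to the page URL
    } catch (e) {
      return; // skip values that cannot be parsed as a URL
    }
    if (!seen.has(absolute)) {
      seen.add(absolute);
      result.push(absolute);
    }
  });
  return result;
}

// Made-up values of the kind $(link).attr('href') might return.
var scraped = ['/about', 'https://nodejs.org/en/download', undefined, '/about'];
console.log(toAbsoluteLinks(scraped, 'http://nodejs.org/'));
```

With cheerio, the input array could be built inside the `each` callback by pushing `$(link).attr('href')` instead of logging it.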

Choose whichever works best for you.
