How do I fetch the contents of a web page into a shell variable?

Date: 2021-07-08 11:29:00

In Linux, how can I fetch a URL and get its contents into a variable in a shell script?

6 solutions

#1


138  

You can use the wget command to download the page and read it into a variable:

content=$(wget -q -O - google.com)
echo "$content"

We use the -O option of wget, which lets us specify the name of the file into which wget dumps the page contents. We specify - to send the dump to standard output and collect it into the variable content. The -q (quiet) option turns off wget's own output.

You can use the curl command for this as well:

content=$(curl -L google.com)
echo "$content"

We need the -L option because the page we are requesting might have moved, in which case we need to fetch it from its new location; the -L (or --location) option handles this for us.
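If the fetch can fail, it also helps to check curl's exit status. A minimal sketch (the URL is a placeholder), adding -f to fail on HTTP errors and -s to silence the progress meter:

url="https://www.google.com/"            # placeholder URL
if content=$(curl -fsSL "$url"); then
    echo "$content"
else
    echo "failed to fetch $url" >&2      # curl returned a non-zero status
fi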

#2


20  

There are many ways to fetch a page from the command line, but it also depends on whether you want the page source or the page as rendered:

If you need the page source:

with curl: curl $url

with wget: wget -O - $url

But if you want what you would see in a browser, lynx can be useful: lynx -dump $url. A quick side-by-side sketch follows.
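For example (the URL is a placeholder):

url="http://www.example.com/"      # placeholder URL
page_source=$(curl -s "$url")      # the raw HTML source
page_text=$(lynx -dump "$url")     # the text as a browser would render it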

You can find many solutions to this little problem; it may be worth reading the man pages for these commands. And don't forget to replace $url with your URL :)

Good luck :)


#3


9  

There is the wget command, and there is curl.

With wget, you then work with the file it downloaded; with curl, you can process the output as a stream.
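A minimal sketch of the two styles (the URL and file name are placeholders):

url="http://www.example.com/"            # placeholder URL
wget -q "$url" -O page.html              # wget: save the page to a file first
grep -i '<title>' page.html
curl -s "$url" | grep -i '<title>'       # curl: process the stream directly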



#4


2  

content=$(wget -O - "$url")
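A minimal usage sketch, assuming $url already holds the target address (placeholder below); quoting the expansions keeps whitespace intact:

url="http://www.example.com/"   # placeholder URL
content=$(wget -q -O - "$url")
printf '%s\n' "$content"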

#5


2  

You can use curl or wget to retrieve the raw data, or you can use w3m -dump to get a nice text rendering of a web page.

$ foo=$(w3m -dump http://www.example.com/); echo $foo
You have reached this web page by typing "example.com", "example.net","example.org" or "example.edu" into your web browser. These domain names are reserved for use in documentation and are not available for registration. See RFC 2606, Section 3.
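Note that the unquoted $foo collapses the page's line breaks into single spaces, which is why the output above runs on one line; quote the variable to preserve the original formatting:

$ echo "$foo"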

#6


2  

If you have LWP installed, it provides a binary simply named "GET".


$ GET http://example.com
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
  <META http-equiv="Content-Type" content="text/html; charset=utf-8">
  <TITLE>Example Web Page</TITLE>
</HEAD> 
<body>  
<p>You have reached this web page by typing &quot;example.com&quot;,
&quot;example.net&quot;,&quot;example.org&quot
  or &quot;example.edu&quot; into your web browser.</p>
<p>These domain names are reserved for use in documentation and are not available 
  for registration. See <a href="http://www.rfc-editor.org/rfc/rfc2606.txt">RFC 
  2606</a>, Section 3.</p>
</BODY>
</HTML>

wget -O -, curl, and lynx -source behave similarly.
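So, for example, any of these should capture the same page into a variable (a sketch reusing the example.com address from above):

content=$(GET http://example.com)
content=$(wget -q -O - http://example.com)
content=$(curl -sL http://example.com)
content=$(lynx -source http://example.com)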
