How to check the status of URLs from a text file using a bash shell script

Time: 2022-05-04 23:45:13

I have to check the status of 200 http URLs and find out which of these are broken links. The links are present in a simple text file (say URL.txt present in my ~ folder). I am using Ubuntu 14.04 and I am a Linux newbie. But I understand the bash shell is very powerful and could help me achieve what I want.

My exact requirement would be to read the text file which has the list of URLs and automatically check if the links are working and write the response to a new file with the URLs and their corresponding status (working/broken).

4 solutions

#1


12  

I created a file "checkurls.sh" and placed it in my home directory, where the urls.txt file is also located. I gave the file execute permission using:

$ chmod +x checkurls.sh

The contents of checkurls.sh are given below:

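#!/bin/bash
# Usage: ./checkurls.sh urls.txt
# Read one URL per line; IFS= and -r keep each line exactly as written.
while IFS= read -r url
do
    # Send a HEAD request and keep only the HTTP status code (000 if no response).
    urlstatus=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$url")
    echo "$url  $urlstatus" >> urlstatus.txt
done < "$1"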

Finally, I executed it from the command line as follows:

$ ./checkurls.sh urls.txt

Voila! It works.
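
The question asks for a working/broken label rather than a raw status code, so here is a minimal sketch of one way to add that. The helper names classify_status and check_urls are hypothetical, the 10-second --max-time limit is an arbitrary choice, and treating every 2xx and 3xx code as "working" is an assumption you may want to tighten.

```shell
#!/bin/bash
# classify_status (hypothetical helper): map an HTTP status code to a label.
# Assumption: 2xx and 3xx count as working; anything else, including the
# 000 that curl reports when it gets no response, counts as broken.
classify_status() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) echo "working" ;;
        *) echo "broken" ;;
    esac
}

# check_urls LIST OUT (hypothetical helper): write "URL label" lines for
# every URL in LIST to OUT.
check_urls() {
    local url code
    while IFS= read -r url; do
        [ -z "$url" ] && continue   # skip blank lines
        code=$(curl -o /dev/null --silent --head --max-time 10 \
                    --write-out '%{http_code}' "$url")
        echo "$url $(classify_status "$code")"
    done < "$1" > "$2"
}
```

With these functions defined in the script, calling check_urls urls.txt urlstatus.txt would produce the labelled list.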

#2


4  

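#!/bin/bash
# Read the URL list on file descriptor 4 so curl cannot consume it from stdin.
while read -ru 4 LINE; do
    # Keep only the first line curl prints: the HTTP status line or the error message.
    read -r REP < <(exec curl -IsS "$LINE" 2>&1)
    echo "$LINE: ${REP%$'\r'}"   # drop the carriage return that ends an HTTP status line
done 4< "$1"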

Usage:

bash script.sh urls-list.txt

Sample:

http://not-exist.com/abc.html
https://kernel.org/nothing.html
http://kernel.org/index.html
https://kernel.org/index.html

Output:

http://not-exist.com/abc.html: curl: (6) Couldn't resolve host 'not-exist.com'
https://kernel.org/nothing.html: HTTP/1.1 404 Not Found
http://kernel.org/index.html: HTTP/1.1 301 Moved Permanently
https://kernel.org/index.html: HTTP/1.1 200 OK

For everything, read the Bash Manual. See man curl, help, man bash as well.
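
If only the numeric code is wanted from lines like the ones in the output above, it can be pulled out of the status line with the read builtin. This is a rough sketch; status_code_of is a hypothetical helper name, and it assumes the status line has the usual "HTTP/version code reason" shape.

```shell
#!/bin/bash
# status_code_of (hypothetical helper): print the numeric code from an HTTP
# status line, or print nothing when the line is not a status line
# (for example a curl error message).
status_code_of() {
    local proto code rest
    read -r proto code rest <<< "$1"
    code=${code%$'\r'}              # tolerate a trailing carriage return
    case "$proto" in
        HTTP/*) echo "$code" ;;
    esac
}
```

For instance, status_code_of "HTTP/1.1 404 Not Found" prints 404, while a curl error line prints nothing.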

#3


1  

How about adding some parallelism to the accepted solution? Let's modify the script, chkurl.sh, to be a little easier to read and to handle just one request at a time:

#!/bin/bash
URL=${1?Pass URL as parameter!}
curl -o /dev/null --silent --head --write-out "$URL %{http_code} %{redirect_url}\n" "$URL"

And now you check your list using:

cat URL.txt | xargs -P 4 -L1 ./chkurl.sh

This could finish the job up to 4 times faster.
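
One side effect of running jobs in parallel is that results arrive in whatever order the requests finish. A small sketch of a wrapper that sorts the output afterwards is shown below; check_parallel is a hypothetical name, and it assumes an xargs that supports -P (GNU and BSD xargs do).

```shell
#!/bin/bash
# check_parallel LIST CMD [ARGS...] (hypothetical wrapper): run CMD once per
# line of LIST with up to 4 processes at a time, then sort the combined
# output, because parallel jobs finish in arbitrary order.
check_parallel() {
    local list=$1
    shift
    xargs -P 4 -L1 "$@" < "$list" | sort
}
```

With chkurl.sh from above, the call would be check_parallel URL.txt ./chkurl.sh.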

#4


0  

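If your input file contains one URL per line, you can use a script to read each line and ping the host named in the URL; if the ping succeeds, the host is reachable. Note that ping takes a hostname rather than a full URL, and a reply only shows that the host is up, not that the page itself exists.

#!/bin/bash
INPUT="Urls.txt"
OUTPUT="result.txt"
while IFS= read -r line
do
  # ping needs a hostname, so strip the scheme and anything after the host
  host=${line#*://}
  host=${host%%[/:]*}
  if ping -c 1 "$host" &> /dev/null
  then
      echo "$line valid" >> "$OUTPUT"
  else
      echo "$line not valid" >> "$OUTPUT"
  fi
done < "$INPUT"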

ping options:

-c count
      Stop after sending count ECHO_REQUEST packets. With deadline option, ping waits for count ECHO_REPLY packets, until the timeout expires.

You can also use this option to limit the waiting time:

 -W timeout
      Time to wait for a response, in seconds. The option affects only timeout in absence
      of any responses, otherwise ping waits for two RTTs.
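
The two options can be combined so that an unreachable host does not stall the loop. The following is only a sketch: host_is_up is a hypothetical helper, the 2-second wait is an arbitrary value, and it assumes the Linux (iputils) ping, where -W is given in seconds.

```shell
#!/bin/bash
# host_is_up HOST (hypothetical helper): send a single echo request and wait
# at most 2 seconds for the reply. Succeeds only if the host answers.
# Assumption: Linux iputils ping, where -W takes seconds.
host_is_up() {
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}
```

It can be used in place of the bare ping call, for example: if host_is_up kernel.org; then echo up; fi.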
