Get the href value from an anchor tag using bash

Date: 2022-01-14 09:53:46

This is the HTML I'm parsing:

<li id="dl_linux_32">
   <a href="link">Link</a>
</li>
<li id="dl_linux_64">
   <a href="another_link">Another Link</a>
</li>

With this command

curl URL 2>&1 | grep -oE 'href="([^"#]+)"' | sed "s/ /%20/g" | cut -f2 -d "="

I'm able to get all href values. However, I only want the href value of the anchor inside the li whose id is dl_linux_32.

Can someone help me finish the regex?

4 Answers

#1


1  

Perl One-Liner

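The regex must match across multiple lines. In this sort of situation, a Perl one-liner works well: -0777 makes Perl read the whole file as a single string, and \K drops everything matched before it, so only the href attribute is printed.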

perl -0777 -ne 'print "$&\n" if /<li id="dl_linux_32">\s*<a \Khref="[^"]+"/' yourfile
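
The one-liner above prints the whole attribute, href="link", including the attribute name and the quotes. If only the value is wanted, a minimal sketch using shell parameter expansion could strip the rest; the variable names match and value are chosen here for illustration, and match is assumed to hold the one-liner's output:

```shell
# Assumed to be the output of the Perl one-liner for the sample HTML.
match='href="link"'

# Remove the leading href=" prefix, then the trailing double quote.
value=${match#href=\"}
value=${value%\"}

echo "$value"   # prints: link
```

This avoids a second sed or cut call, at the cost of assuming the attribute is always double-quoted.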

#2


0  

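With GNU awk, using </li> as the record separator and the double quote as the field separator, the href value is the fourth field of the matching record: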

$ awk -F'"' -v RS="</li>" '/<li\s*id=\"dl_linux_32\">/{print $4}' file
link
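
The multi-character record separator and \s used above are extensions, which is why that command asks for GNU awk. A rough portable sketch follows, assuming each tag sits on its own line as in the question's sample. The shell variable html is a hypothetical stand-in for the real input. Instead of changing the record separator, it tracks whether the current line is inside the target li:

```shell
# Sample HTML from the question; in practice this would come from `curl -s URL`.
html='<li id="dl_linux_32">
   <a href="link">Link</a>
</li>
<li id="dl_linux_64">
   <a href="another_link">Another Link</a>
</li>'

# Line-by-line state machine:
#   1. entering the target <li> sets a flag,
#   2. the first <a href=...> seen while the flag is set is printed
#      (with " as the field separator, the href value is field 2 on that line),
#   3. a closing </li> clears the flag.
href=$(printf '%s\n' "$html" | awk -F'"' '
  /<li id="dl_linux_32">/ { inside = 1; next }
  inside && /<a href=/    { print $2; exit }
  /<\/li>/                { inside = 0 }
')

echo "$href"   # prints: link
```

This relies on the one-tag-per-line layout and would need adjusting if the li and a tags share a line.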

#3


0  

The regex I was looking for is dl_linux_32.+href="([^"#]+)". It matches any href value that is preceded by dl_linux_32 and one or more characters.
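
Because grep matches one line at a time, a pattern like this only finds something when the id and the href end up on the same line. Here is a minimal sketch of one way around that, assuming the question's sample HTML is held in a shell variable named html (a name chosen for illustration). It joins the lines with tr and extracts the capture group with sed. It is also a slight variation on the regex above: .+ is swapped for [^<]* so the match cannot run past the first tag after the id.

```shell
# Sample HTML from the question; in practice this would come from `curl -s URL`.
html='<li id="dl_linux_32">
   <a href="link">Link</a>
</li>
<li id="dl_linux_64">
   <a href="another_link">Another Link</a>
</li>'

# Join all lines into one so the pattern can span the <li> and <a> tags,
# then print only the captured href value.
# [^<]* between the id and the <a> tag keeps the match inside the target <li>.
href=$(printf '%s\n' "$html" | tr '\n' ' ' |
  sed -n 's/.*id="dl_linux_32">[^<]*<a href="\([^"#]*\)".*/\1/p')

echo "$href"   # prints: link
```

Since sed prints the capture group directly, the trailing cut step from the question's pipeline is not needed here.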

#4


0  

If the HTML is valid XML, you can use a tool that supports XPath queries, such as xmlstarlet:

echo '<html>
      <li id="dl_linux_32">
         <a href="link">Link</a>
      </li>
      <li id="dl_linux_64">
         <a href="another_link">Another Link</a>
      </li>
      </html>
' | xmlstarlet sel -t -v '//li[@id="dl_linux_32"]/a/@href'
link
