Match specific words in a URL with regex

Date: 2022-09-13 16:28:31

I must admit I've never gotten comfortable with regex, but I recently ran into a problem where the workaround would have been more painful than just using one. I need to match anything that follows this pattern at the beginning of a string: {any_url_safe_word} + ("/http://" || "/https://" || "www.") + {any word}. So the following should match:

  • cars/http://google.com#test
  • cars/https://google.com#test
  • cars/www.google.com#test

The following shouldn't match:

  • cars/httdp://google.com#test
  • cars/http:/google.com#test

What I've tried so far is ^[\w]{1,500}\/[(http\:\/\/)|(https:\/\/])|([www\.])]{0,50}, but that matches cars/http from cars/httpd://google.com.

3 Answers

#1


3  

This regex should do it:

^[\w\d]+\/(?:https?:\/\/)?(?:www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}

And if you want to capture everything that comes after the match, just add (.*) to the end.
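As a quick sanity check, here is a minimal sketch that runs the pattern above, with the optional (.*) group appended, against the strings from the question. It uses Python's `re` module, which is an assumption on my part since the question does not name a regex engine. The `check` helper and the last sample are hypothetical additions for illustration.

```python
import re

# Pattern from this answer, with the optional trailing (.*) group added.
PATTERN = re.compile(
    r"^[\w\d]+\/(?:https?:\/\/)?(?:www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(.*)"
)

SAMPLES = [
    "cars/http://google.com#test",
    "cars/https://google.com#test",
    "cars/www.google.com#test",
    "cars/httdp://google.com#test",
    "cars/http:/google.com#test",
    "cars/google.com#test",  # extra sample, not from the question
]


def check(text):
    """Return the captured remainder, or None when the pattern does not match."""
    match = PATTERN.match(text)
    return match.group(1) if match else None


for sample in SAMPLES:
    print(f"{sample!r:35} -> {check(sample)!r}")
```

Note that the scheme and the www. prefix are both optional here, so the last sample (a bare domain after the slash) matches as well. If one of the prefixes must be present, a single required group such as (?:https?:\/\/|www\.) might be a better fit.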


And since the more or less general list of URL-safe characters appears to be ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;= (Source), you may include those too, which gives (after simplification):

^[!#$&-.0-;=?-\[\]_a-z~]+\/(?:https?:\/\/)?(?:www\.)?[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}
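To see what the simplified character class actually covers, the sketch below tests it against each character in the quoted list. It is written in Python as an assumption about tooling, and the `uncovered` helper is a hypothetical name used only for this illustration.

```python
import re

# Character list quoted in this answer.
URL_SAFE = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    "0123456789-._~:/?#[]@!$&'()*+,;="
)

# The simplified class from the pattern above, taken on its own.
SIMPLIFIED = re.compile(r"[!#$&-.0-;=?-\[\]_a-z~]")


def uncovered(chars):
    """Return the characters in `chars` that the simplified class does not match."""
    return {ch for ch in chars if not SIMPLIFIED.fullmatch(ch)}


print(uncovered(URL_SAFE))  # prints {'/'}
```

The only character left out is the slash, presumably because the slash is what separates the leading word from the URL part. Uppercase letters are covered by the ?-\[ range, which spans ?, @, A-Z and [.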

#2


0  

Check out the demo.

[a-z0-9-_.~]+/(https?://|www\.)[a-z0-9]+\.[a-z]{2,6}([/?#a-z0-9-_.~])*

Edit: taken @CD001's comment into account. Be sure to use the i modifier if you want the match to be case-insensitive.
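The effect of the i modifier can be sketched as follows. This uses Python's `re.IGNORECASE` as a stand-in for the i modifier (an assumption, since the answer does not name an engine), and `matches` is a hypothetical helper.

```python
import re

# Pattern from this answer, unchanged.
PATTERN = r"[a-z0-9-_.~]+/(https?://|www\.)[a-z0-9]+\.[a-z]{2,6}([/?#a-z0-9-_.~])*"


def matches(text, flags=0):
    """Return True when PATTERN is found anywhere in `text`."""
    return re.search(PATTERN, text, flags) is not None


for sample in [
    "cars/http://google.com#test",
    "cars/www.google.com#test",
    "cars/httdp://google.com#test",
    "cars/http:/google.com#test",
]:
    print(sample, matches(sample))

# Mixed-case input only matches with the case-insensitive flag.
mixed = "Cars/HTTP://Google.COM#Test"
print(mixed, matches(mixed), matches(mixed, re.IGNORECASE))
```

Because this pattern has no leading ^, it can also match in the middle of a longer string. If the match has to start at the beginning of the string, as the question asks, prefixing the pattern with ^ is one way to enforce that.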

#3


0  

<?php
$words = array(
    'cars/http://google.com#test',
    'cars/https://google.com#test',
    'cars/www.google.com#test',
    'cars/httdp://google.com#test',
    'cars/http:/google.com#test',
    'c a r s/http:/google.com#test'
    );

foreach($words as $value)
{
    /*
      ^\S+          - at least one non-space symbol at the start of the string
      \/            - slash
      (?: ... )     - group, so the alternation applies only to the prefix
      https?:\/\/   - http with an optional s, then ://
      |             - or
      www\.         - www.
      .+            - at least one symbol
     */
    if (preg_match('/^\S+\/(?:https?:\/\/|www\.).+/', $value))
    {
        print $value. " good\n";
    }
    else
    {
        print $value. " bad\n";
    }
}

Prints:

cars/http://google.com#test good
cars/https://google.com#test good
cars/www.google.com#test good
cars/httdp://google.com#test bad
cars/http:/google.com#test bad
c a r s/http:/google.com#test bad
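One thing to watch with alternation in a pattern like this: | has the lowest precedence, so ^\S+\/(https?:\/\/)|(www\.).+ reads as "either ^\S+\/https?:\/\/ or www\. followed by anything, anywhere in the string". The sketch below compares that form with one where the two prefixes share a group. It uses Python's `re.search` as a rough equivalent of preg_match (an assumption for illustration), and the third sample is hypothetical.

```python
import re

# Alternation without a shared group: the second branch is not anchored.
UNGROUPED = re.compile(r"^\S+\/(https?:\/\/)|(www\.).+")

# Alternation limited to the prefix by a non-capturing group.
GROUPED = re.compile(r"^\S+\/(?:https?:\/\/|www\.).+")

SAMPLES = [
    "cars/http://google.com#test",
    "cars/www.google.com#test",
    "www.google.com#test",  # no leading word and slash
]

for sample in SAMPLES:
    ungrouped_hit = UNGROUPED.search(sample) is not None
    grouped_hit = GROUPED.search(sample) is not None
    print(f"{sample:30} ungrouped={ungrouped_hit} grouped={grouped_hit}")
```

Both forms agree on the strings from the question. They differ on the third sample, which the ungrouped form accepts even though it has no leading word and slash.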
