Using the colon symbol in regular expressions

Time: 2022-07-11 20:12:37

I am new to regex and am studying it at regularexperssion.com. I need to know what the colon (:) is used for in regular expressions.

For example:

$pattern = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/';

which matches:

$url1  = "http://www.somewebsite.com";
$url2  = "https://www.somewebsite.com";
$url3  = "https://somewebsite.com";
$url4  = "www.somewebsite.com";
$url5  = "somewebsite.com";

Any help would be greatly appreciated. :)

4 Answers

#1


29  

A colon : is simply a colon. It has no special meaning on its own; it only matters as part of certain constructs, for example clustering without capturing (also known as a non-capturing group):

(?:pattern)

It also appears in the names of POSIX character classes, which are written inside a bracket expression, for example:

[[:upper:]]

However, in your case the colon is just a colon.

Special characters used in your regex:

In the character class [-+_~.\d\w]:

  • - means -
  • + means +
  • _ means _
  • ~ means ~
  • . means .
  • \d means any digit
  • \w means any word character

These symbols have this literal meaning because they are used inside a character class []. Outside a character class, + and . have special meaning.

Other elements:

  • =? means an = that can occur 0 or 1 times; in other words, an optional =.
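To make the "literal inside a class, special outside" point concrete, here is a minimal sketch. It uses Python's `re` module rather than PHP's `preg_*` functions (an assumption made for illustration; these constructs behave the same way in both). The helper name `all_in_class` is hypothetical.

```python
import re

# The class from the question: -, +, _, ~ and . are literal here,
# while \d and \w keep their usual meaning.
char_class = re.compile(r"[-+_~.\d\w]")


def all_in_class(text: str) -> bool:
    """Return True if every character of text is matched by the class."""
    return all(char_class.fullmatch(ch) is not None for ch in text)


print(all_in_class("a-b+c_d~e.9"))  # True: every character is in the class
print(all_in_class("a:b"))          # False: ':' is not in the class

# Outside a class, '.' is a wildcard; inside one it is only a dot.
print(bool(re.fullmatch(r"a.c", "abc")))    # True
print(bool(re.fullmatch(r"a[.]c", "abc")))  # False
print(bool(re.fullmatch(r"a[.]c", "a.c")))  # True

# '=?' makes the '=' optional.
print(bool(re.fullmatch(r"k=?", "k")), bool(re.fullmatch(r"k=?", "k=")))  # True True
```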

#2


16  

I've decided to go one better and explain the entire regex:

^                 # anchor to start of line
(                 # start grouping
 (                # start grouping
  [\w]+           # at least one of 0-9a-zA-Z_
  :               # a literal colon
 )                # end grouping
 ?                # this grouping is optional
 \/\/             # two literal slashes
)                 # end capture
?                 # this grouping is optional
(
 (
  [\d\w]          # exactly one of 0-9a-zA-Z_
                  # having \d is redundant
  |               # alternation
  %               # literal % sign
  [a-fA-f\d]{2,2} # exactly 2 hexadecimal digits
                  # should probably be A-F
                  # using {2} would have sufficed
 )+               # at least one of these groups
 (                # start grouping
  :               # literal colon
  (
   [\d\w]
   |
   %
   [a-fA-f\d]{2,2}
  )+
 )?               # Same grouping, but it is optional
                  # and there can be only one
 @                # literal @ sign
)?                # this group is optional
(
 [\d\w]           # same as [\w], explained above
 [-\d\w]{0,253}   # includes a dash as a valid character
                  # between 0 and 253 of these characters
 [\d\w]           # end with \w.  They want at most 255
                  # total and - cannot be at the start
                  # or end
 \.               # literal period
)+                # at least one of these groups
[\w]{2,4}         # two to four \w characters
(
 :                # literal colon
 [\d]+            # at least one digit
)?
(
 \/               # literal slash
 (
  [-+_~.\d\w]    # one of these characters
  |              # *or*
  %              # % with two hex digit combo
  [a-fA-f\d]{2,2}
 )*              # zero or more of these groups
)*               # zero or more of these groups
(
 \?              # literal question mark
 (
  &?             # optional literal &
  (
   [-+_~.\d\w]
   |
   %
   [a-fA-f\d]{2,2}
  )
  =?             # optional literal =
 )*              # zero or more of this group
)?               # this group is optional
(
 #               # literal #
 (
  [-+_~.\d\w]
  |
  %
  [a-fA-f\d]{2,2}
 )*
)?
$                # anchor to end of line
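As a quick check of the breakdown, the sketch below compiles the question's pattern and runs it against the five sample URLs. It uses Python's `re` instead of PHP (an assumption for illustration), so the surrounding `/` delimiters are dropped. The names `URL_PATTERN`, `URLS` and `matches` are hypothetical.

```python
import re

# The question's pattern without the PHP '/' delimiters,
# split over several string literals for readability only.
URL_PATTERN = re.compile(
    r"^(([\w]+:)?\/\/)?"
    r"(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?"
    r"([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?"
    r"(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*"
    r"(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?"
    r"(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$"
)

URLS = [
    "http://www.somewebsite.com",
    "https://www.somewebsite.com",
    "https://somewebsite.com",
    "www.somewebsite.com",
    "somewebsite.com",
]


def matches(url: str) -> bool:
    """Return True if the whole URL is accepted by the pattern."""
    return URL_PATTERN.match(url) is not None


for url in URLS:
    print(url, matches(url))

# Both colons in this URL are matched literally: one after the scheme,
# one before the port.
m = URL_PATTERN.match("https://somewebsite.com:8080/path")
print(m.group(2))  # https:
print(m.group(8))  # :8080
```

Note that the commented layout above cannot be pasted into a pattern using the extended (x / `re.VERBOSE`) flag unchanged, because in that mode an unescaped # outside a character class starts a comment, so the literal # would have to be written as \#.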

It's important to understand what the metacharacters and metasequences are. Some of them lose their special meaning in certain contexts (especially inside a character class). I've cataloged them for you:

meta with no context

  • ^ -- zero-width start of line
  • () -- grouping/capture
  • ? -- zero or one of the preceding sequence
  • + -- one or more of the preceding sequence
  • * -- zero or more of the preceding sequence
  • [] -- character class
  • \w -- alphanumeric characters and _. Opposite of \W
  • | -- alternation
  • {} -- repetition count for the preceding sequence
  • $ -- zero-width end of line

Note that :, @, and % are not in this list: on their own they have no special (meta) meaning and simply match themselves.
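A short sketch of the point about :, @ and %, again using Python's `re` as a stand-in for PHP (an assumption). The sample strings are made up for illustration.

```python
import re

# ':' and '@' need no escaping: they match themselves.
m = re.fullmatch(r"(\w+):(\w+)@(\w+)", "user:secret@host")
print(m.groups())  # ('user', 'secret', 'host')

# '%' is literal too, here followed by exactly two hex digits.
print(bool(re.fullmatch(r"%[0-9a-fA-F]{2}", "%2F")))  # True

# By contrast, '+' is a metacharacter and must be escaped to match literally.
print(bool(re.fullmatch(r"a+b", "aaab")))  # True: '+' repeats the 'a'
print(bool(re.fullmatch(r"a+b", "a+b")))   # False
print(bool(re.fullmatch(r"a\+b", "a+b")))  # True
```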

meta inside character class

] ends the character class. - creates a range of characters unless it is at the start or the end of the character class. A ^ as the first character negates the class.
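The effect of the hyphen's position can be seen in a small sketch (Python's `re` used for illustration; the input string is made up):

```python
import re

text = "a-b-c-d"

# Between two characters, '-' forms a range: a, b or c.
print(re.findall(r"[a-c]", text))   # ['a', 'b', 'c']

# At the start or the end of the class, '-' is a literal hyphen.
print(re.findall(r"[-ac]", text))   # ['a', '-', '-', 'c', '-']
print(re.findall(r"[ac-]", text))   # ['a', '-', '-', 'c', '-']

# A leading '^' negates the class: everything except a, b or c.
print(re.findall(r"[^a-c]", text))  # ['-', '-', '-', 'd']
```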

grouping assertions

A (? combination starts a special group construct. For example, (?: means "group, but do not capture". This means that the regex /(?:a)/ will match the string "a", but the a is not captured for use in replacements or match groups, as it would be with /(a)/.
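The difference shows up as soon as a replacement refers to group 1. This is a minimal sketch in Python's `re` (an assumption for illustration; PHP's `preg_replace` uses a different backreference syntax). The helper `try_backreference` is hypothetical.

```python
import re

# Capturing group: \1 in the replacement refers to what (a) matched.
print(re.sub(r"(a)b", r"\1\1", "ab"))  # aa

# Both forms match the same text, but only (a) fills a group.
print(re.search(r"(a)", "a").groups())    # ('a',)
print(re.search(r"(?:a)", "a").groups())  # ()


def try_backreference(pattern: str) -> str:
    """Try to use \\1 in a replacement; report an error if group 1 is missing."""
    try:
        return re.sub(pattern, r"\1", "ab")
    except re.error as exc:
        return f"error: {exc}"


print(try_backreference(r"(a)b"))    # a
print(try_backreference(r"(?:a)b"))  # an error message: there is no group 1
```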

(? is also used for lookahead/lookbehind assertions: (?= and (?! for positive and negative lookahead, (?<= and (?<! for positive and negative lookbehind. A (? followed by other characters introduces some other construct (for example inline modifiers or named groups) or is a syntax error; it is never a literal ?.
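The four lookaround forms can be compared in a small sketch. It uses Python's `re` (an assumption for illustration), a made-up input string, and a hypothetical helper `starts` that returns where each match begins.

```python
import re

text = "foobar foobaz xbar"


def starts(pattern: str) -> list:
    """Return the start offsets of all matches of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]


print(starts(r"foo(?=bar)"))   # [0]   foo followed by bar
print(starts(r"foo(?!bar)"))   # [7]   foo NOT followed by bar
print(starts(r"(?<=foo)bar"))  # [3]   bar preceded by foo
print(starts(r"(?<!foo)bar"))  # [15]  bar NOT preceded by foo

# Lookarounds are zero-width: the asserted text is not part of the match.
print(re.search(r"foo(?=bar)", text).group(0))  # foo

# In Python's re, an unrecognized character after '(?' is rejected outright.
try:
    re.compile(r"(?@a)")
    unknown_ok = True
except re.error:
    unknown_ok = False
print(unknown_ok)  # False
```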

#3


5  

There is no special use for the colon in your case:

(([\w]+:)?\/\/)? will match http://, https://, ftp://...
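A minimal sketch of what that first group consumes, using Python's `re` (an assumption for illustration). The name `scheme_prefix` and the sample inputs are hypothetical.

```python
import re

# Only the leading, optional "scheme://" part of the question's pattern.
SCHEME = re.compile(r"^(([\w]+:)?\/\/)?")


def scheme_prefix(url: str) -> str:
    """Return the part of url consumed by the scheme group (may be empty)."""
    return SCHEME.match(url).group(0)


print(repr(scheme_prefix("http://example.com")))   # 'http://'
print(repr(scheme_prefix("https://example.com")))  # 'https://'
print(repr(scheme_prefix("ftp://example.com")))    # 'ftp://'
print(repr(scheme_prefix("//example.com")))        # '//'  (the scheme itself is optional)
print(repr(scheme_prefix("www.example.com")))      # ''    (the whole group is optional)
```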

There is one special use for the colon: a group that starts with (?: is non-capturing, so it won't appear in the results. For example, with "foobarbaz" as input:

  • /foo((bar)(baz))/ => { [1] => 'barbaz', [2] => 'bar', [3] => 'baz' }
  • /foo(?:(bar)(baz))/ => { [1] => 'bar', [2] => 'baz' }
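The same comparison can be reproduced with Python's `re` (an assumption for illustration; the results listed above look like PHP match arrays, where index 0 would hold the full match):

```python
import re

subject = "foobarbaz"

# Three capturing groups: the outer one plus the two inner ones.
capturing = re.search(r"foo((bar)(baz))", subject)
print(capturing.groups())      # ('barbaz', 'bar', 'baz')

# The outer group is non-capturing, so the numbering starts at (bar).
non_capturing = re.search(r"foo(?:(bar)(baz))", subject)
print(non_capturing.groups())  # ('bar', 'baz')

# The overall match is identical in both cases.
print(capturing.group(0), non_capturing.group(0))  # foobarbaz foobarbaz
```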

#4


0  

A colon has no special meaning in regular expressions; it just matches a literal colon.

[\w]+:

This just means any word character, one or more times, followed by a literal colon. The square brackets are actually not needed here. Square brackets are used to define a set of characters to match. So

[abcd]

means a single character that is one of a, b, c, or d.
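Both points can be checked in a short sketch, using Python's `re` for illustration (an assumption) with made-up sample strings:

```python
import re

sample = "http://a https://b"

# With or without the brackets, the result is the same.
print(re.findall(r"[\w]+:", sample))  # ['http:', 'https:']
print(re.findall(r"\w+:", sample))    # ['http:', 'https:']

# [abcd] matches exactly one character out of the set.
print(re.findall(r"[abcd]", "a1b2e3d"))       # ['a', 'b', 'd']
print(re.fullmatch(r"[abcd]", "ab") is None)  # True: two characters, not one
```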
