Space in a regex character class produces weird results

Date: 2023-02-12 20:14:00

I was working on a regex and came across some weird behavior.

I had a character class in the regex that included a bunch of alphanumeric characters and ended with a space, a dash, and a plus. The weird behavior is reproducible using the following regex:

^[ -+]*$

What happens is that a space is valid text input, and so is the plus. However, for some reason the dash is not valid text input. The regex can be fixed by rearranging the characters in the class like so:

^[ +-]*$

Now all the characters are valid input. This has been reproduced in Chrome using jsFiddle and also using Expresso.

My question is basically: am I doing something wrong, or is this just weird? :)

2 answers

#1


6  

The - character has special meaning inside character classes. When it appears between two characters, it creates a range, e.g. [0-9] matches any character between 0 and 9, inclusive. However, when placed at the start or the end of the character class (or when escaped) it represents a literal - character.

  • [ -+] will match any character between a space (char code 32) and a + (char code 43), inclusive. The - (char code 45) falls outside that range, so it is rejected.

  • [ +-] will match a space (char code 32), a + (char code 43), or a - (char code 45).
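
To make the range concrete, here is a minimal sketch using Python's `re` module (an assumption on my part; the question was tested in JavaScript and Expresso, but this particular class should behave the same way there). The names `range_class` and `in_range` are only illustrative.

```python
import re

# [ -+] is a range from space (code 32) to '+' (code 43), not three literals.
range_class = re.compile(r"^[ -+]*$")

# Build a string holding every character whose code lies in 32..43.
in_range = "".join(chr(code) for code in range(ord(" "), ord("+") + 1))

# All of those characters are accepted by the class...
print(bool(range_class.match(in_range)))

# ...but '-' has code 45, which is outside 32..43, so it is rejected.
print(ord("-"), bool(range_class.match("-")))
```

So the class accidentally accepts characters such as `!`, `#` and `(` while rejecting the dash it was meant to allow.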

#2


3  

Because in the first regex the - is treated as a "to" or range operator, as in a-z.

So it becomes "space to +", which is a range. Either escape the - by prepending a \, or put it first or last in the class.
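
A quick way to check the three fixes side by side, sketched here in Python's `re` module (my choice for illustration, not something from the question; the `patterns` dictionary and its keys are hypothetical names):

```python
import re

# Three ways to make '-' a literal inside the character class.
patterns = {
    "last": r"^[ +-]*$",      # dash at the end of the class
    "first": r"^[- +]*$",     # dash at the start of the class
    "escaped": r"^[ \-+]*$",  # dash escaped with a backslash
}

for name, pattern in patterns.items():
    # Space, plus and dash should all be accepted, alone or combined.
    accepts_all = all(re.match(pattern, text) for text in (" ", "+", "-", "+ -"))
    # '!' was accepted by the accidental range [ -+]; it should now be rejected.
    rejects_bang = re.match(pattern, "!") is None
    print(name, accepts_all, rejects_bang)
```

Escaping is probably the safest of the three if the class is likely to be edited later, since the dash keeps its literal meaning wherever it ends up.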
