How to use a regular expression to match a character or nothing?

Time: 2022-11-11 15:47:50

I am trying to take a block of numbers that may, or may not, have dividers and return them in a standard format. Using an SSN as an example:

ex1="An example 123-45-6789"
ex2="123.45.6789 some more things"
ex3="123456789 thank you Ruby may I have another"

should all go into a method that returns "123-45-6789". Basically, anything (INCLUDING nothing) other than a number or letter between the digit groups should still produce an SSN in XXX-XX-XXXX format. The part that is stumping me is how to get a regular expression to recognize that there may be nothing there at all.

What I have so far for IDENTIFYING my SSN:

def format_ssns(string)
  string.scan(/\d{3}[^0-9a-zA-Z]{1}\d{2}[^0-9a-zA-Z]{1}\d{4}/).to_a
end

It seems to work for everything I expect EXCEPT when there is nothing between the digit groups: "123456789" does not match. Can I use a regular expression in this case to identify the absence of a separator?
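
To make the failure concrete, here is a small sketch that runs the pattern from the question against the three example strings with `String#scan`. The constant name `ORIGINAL` is just an illustrative label.

```ruby
# The pattern from the question: exactly one non-alphanumeric character
# is required between each group of digits.
ORIGINAL = /\d{3}[^0-9a-zA-Z]{1}\d{2}[^0-9a-zA-Z]{1}\d{4}/

ex1 = "An example 123-45-6789"
ex2 = "123.45.6789 some more things"
ex3 = "123456789 thank you Ruby may I have another"

p ex1.scan(ORIGINAL)  # => ["123-45-6789"]
p ex2.scan(ORIGINAL)  # => ["123.45.6789"]
p ex3.scan(ORIGINAL)  # => []  (no separator, so {1} can never be satisfied)
```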

4 answers

#1


5  

Have you tried to match 0 or 1 characters between your numbers?

\d{3}[^0-9a-zA-Z]{0,1}\d{2}[^0-9a-zA-Z]{0,1}\d{4}
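
A quick Ruby check of this pattern against the question's three strings. The constant `OPTIONAL_SEPARATOR` and the `examples` array are illustrative names, not part of the answer.

```ruby
# Answer #1's pattern: each separator may appear zero or one time.
OPTIONAL_SEPARATOR = /\d{3}[^0-9a-zA-Z]{0,1}\d{2}[^0-9a-zA-Z]{0,1}\d{4}/

examples = [
  "An example 123-45-6789",
  "123.45.6789 some more things",
  "123456789 thank you Ruby may I have another"
]

examples.each { |s| p s.scan(OPTIONAL_SEPARATOR) }
# ["123-45-6789"]
# ["123.45.6789"]
# ["123456789"]
```

Note that `scan` returns each match exactly as it appears in the input, so `"123.45.6789"` and `"123456789"` still need a separate step to be rewritten as `XXX-XX-XXXX`.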

#2


31  

This has already been shared in a comment, but just to provide a complete-ish answer...

You have these tools at your disposal:

  • x matches x exactly once
  • x{a,b} matches x between a and b times
  • x{a,} matches x at least a times
  • x{,b} matches x up to (a maximum of) b times
  • x* matches x zero or more times (same as x{0,})
  • x+ matches x one or more times (same as x{1,})
  • x? matches x zero or one time (same as x{0,1})
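
To see these quantifiers in action, here is a small Ruby sketch that applies each one to the letter `x` using `String#[]`, which returns the leftmost match (or `nil` if there is none). Ruby's regex engine accepts the `{,b}` form; some other engines do not, so treat that line as Ruby-specific.

```ruby
s = "xxxx"
p s[/x{2,3}/]   # between 2 and 3 times (greedy)          => "xxx"
p s[/x{2,}/]    # at least 2 times                         => "xxxx"
p s[/x{,2}/]    # at most 2 times                          => "xx"
p "abc"[/x*/]   # zero or more: empty match at position 0  => ""
p "abc"[/x+/]   # one or more: there is no "x" at all      => nil
p "abc"[/x?/]   # zero or one: empty match at position 0   => ""
p "xy"[/x?/]    # zero or one: takes the "x" when present  => "x"
```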

So you want to use that last one, since it's exactly what you're looking for (zero or one time).

/\d{3}[^0-9a-zA-Z]?\d{2}[^0-9a-zA-Z]?\d{4}/
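
Since the question also asks for the result in the standard `XXX-XX-XXXX` format, here is a minimal sketch that wraps each digit run in a capture group and reassembles the pieces. It reuses the question's method name `format_ssns`; the constant `SSN_PATTERN` is an illustrative name and not part of the answer above.

```ruby
# Same pattern as above, with each digit run wrapped in a capture group
# so the pieces can be reassembled in XXX-XX-XXXX format.
SSN_PATTERN = /(\d{3})[^0-9a-zA-Z]?(\d{2})[^0-9a-zA-Z]?(\d{4})/

def format_ssns(string)
  # With capture groups, String#scan yields one [area, group, serial] array per match.
  string.scan(SSN_PATTERN).map { |area, group, serial| "#{area}-#{group}-#{serial}" }
end

ex1 = "An example 123-45-6789"
ex2 = "123.45.6789 some more things"
ex3 = "123456789 thank you Ruby may I have another"

[ex1, ex2, ex3].each { |s| p format_ssns(s) }  # each prints ["123-45-6789"]
```

This sketch does not check what surrounds the match, so a longer digit run such as `"1234567890"` would still yield `"123-45-6789"` from its first nine digits; anchors or lookarounds would be needed to rule that out.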

#3


2  

Your current regex will allow 123-45[6789, not to mention all kinds of Unicode characters and control characters. In the extreme case:

123
45師6789

is considered a match by your regex.

You can use a backreference to make sure both separators are the same.

/\d{3}([.-]?)\d{2}\1\d{4}/

[.-]? matches either ., -, or nothing (because of the optional ? quantifier). Whatever it matches is captured in group 1, and the backreference \1 then requires the second separator to be exactly the same.
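
A minimal Ruby sketch of the backreference pattern. The first part checks a few inputs with `match?`, including the problem strings mentioned above; the second part is a hypothetical `normalize_ssns` helper (not from the answer) that also captures the digits, which shifts the separator to group 2 and the backreference to `\2`.

```ruby
# Group 1 captures the first separator ("-", "." or nothing);
# \1 then requires the second separator to be identical.
SAME_SEPARATOR = /\d{3}([.-]?)\d{2}\1\d{4}/

samples = ["123-45-6789", "123.45.6789", "123456789",
           "123-45.6789", "123-45[6789", "123\n45師6789"]
samples.each { |s| puts "#{s.inspect} => #{s.match?(SAME_SEPARATOR)}" }
# true for the first three, false for the last three

# Hypothetical helper: with the digit runs captured as well, the separator
# becomes group 2, so the backreference becomes \2.
def normalize_ssns(string)
  string.scan(/(\d{3})([.-]?)(\d{2})\2(\d{4})/).map do |area, _sep, group, serial|
    "#{area}-#{group}-#{serial}"
  end
end

p normalize_ssns("123.45.6789 some more things")  # => ["123-45-6789"]
p normalize_ssns("123-45.6789")                   # => []
```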

#4


0  

Whelp... looks like I just found my own answer, but any clues for improvement would be helpful.

def format_ssns(string)
  # {0,1} on both separators: each may appear once or not at all
  string.scan(/\d{3}[^0-9a-zA-Z]{0,1}\d{2}[^0-9a-zA-Z]{0,1}\d{4}/)
end

Seems to do the trick.
