How to extract a valid email from a larger string in Scala

Time: 2022-09-13 11:28:55

My Scala version is 2.7.7.

I'm trying to extract an email address from a larger string. The string itself follows no format. The code I've got:

import scala.util.matching.Regex
import scala.util.matching._
val Reg = """\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r
"yo my name is joe : joe@gmail.com" match {
    case Reg(e) => println("match: " + e)
    case _ => println("fail")
}

The regex matches in RegExBuilder but does not match in Scala. If there is another way to do this without a regex, that would be fine too. Thanks!

3 Solutions

#1


6  

As Alan Moore pointed out, you need to add (?i) to the beginning of the pattern to make it case-insensitive. Also note that using the Regex directly in a pattern match tests it against the whole string. If you want to find an address within a larger string, you can call findFirstIn() or use one of the similar methods of Regex.

val reg = """(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r
reg findFirstIn "yo my name is joe : joe@gmail.com"  match {
    case Some(email) => println("match: " + email)
    case None => println("fail")
}
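
If the string may contain more than one address, the "similar methods" mentioned above include findAllIn, which iterates over every non-overlapping match. The following is a minimal sketch rather than a definitive implementation: the object name EmailExtract and the helper extractAll are made up for illustration, and the code assumes a reasonably recent Scala rather than 2.7.7.

```scala
import scala.util.matching.Regex

// Hypothetical wrapper object, named for this example only.
object EmailExtract {
  // Same pattern as in the answer above; (?i) makes it case-insensitive.
  val EmailPattern: Regex =
    """(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r

  // findAllIn yields every non-overlapping match, left to right.
  def extractAll(text: String): List[String] =
    EmailPattern.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    val text = "yo my name is joe : joe@gmail.com, or try Joe.Smith@Example.org"
    extractAll(text).foreach(e => println("match: " + e))
  }
}
```

An empty list plays the role of the None case in the findFirstIn version.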

#2


3  

It looks like you're trying to do a case-insensitive search, but you aren't specifying that anywhere. Try adding (?i) to the beginning of the regex:

"""(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r

#3


1  

Well, the ways to do it other than REs are probably a lot messier. The next step up would probably be a combinator parser. A pile of ad hoc string dissection code would be even more general and almost certainly a whole lot more painful. In part, the suitable tactic depends on how complete (and how strict or lenient) your recognizer needs to be. E.g., the common form: Rudolf Reindeer <rudy.caribou@north_pole.rth> is not accepted by your RE (even after the case-sensitivity is relaxed). Full-blown RFC 2822 address parsing is rather challenging for an RE-based approach.
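
To make the strict-versus-lenient trade-off concrete, here is a small sketch comparing the pattern from the other answers with a looser one on the example above. The Lenient pattern and the object name AddressForms are assumptions introduced only for this illustration; the looser pattern accepts an underscore in the domain and longer top-level domains, so it will also accept strings that are not valid addresses, and it comes nowhere near RFC 2822.

```scala
// Hypothetical object, named for this example only.
object AddressForms {
  // Pattern from the answers above (case-insensitive).
  val Strict = """(?i)\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b""".r

  // Assumed looser variant: allows '_' in the domain and TLDs of 2+ letters.
  val Lenient = """(?i)[A-Z0-9._%+-]+@[A-Z0-9._-]+\.[A-Z]{2,}""".r

  def main(args: Array[String]): Unit = {
    val input = "Rudolf Reindeer <rudy.caribou@north_pole.rth>"
    println(Strict.findFirstIn(input))  // None: '_' is not in the domain class
    println(Lenient.findFirstIn(input)) // Some(rudy.caribou@north_pole.rth)
  }
}
```

How far to loosen the pattern depends on which inputs you expect and how costly a false match is.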

