Regular Expression Matching in Scala

Date: 2024-03-07 16:10:34

I. String.matches(): filtering the log lines to process (dropping blank lines, stray whitespace, and lines with invalid characters)

Statements:
"!123".matches("[a-zA-Z0-9]{4}")  //false
"34Az".matches("[a-zA-Z0-9]{4}")  //true
// Application:

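// 1. Read a log file in Scala
  def readFromTxt(filePath: String): Array[String] = {
      import scala.io.Source
      val source = Source.fromFile(filePath, "UTF-8")
      // getLines() is lazy, so materialize the lines before the source is closed;
      // finally guarantees the file is closed even if reading fails.
      try source.getLines().toArray
      finally source.close()
  }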
// 2. Extract the needed fields from the log
// Inside triple quotes (""") the regex needs no backslash escaping.
val reg = """([A-Z]+) ([0-9]{4}-[0-9]{1,2}-[0-9]{1,2}) requestURI:(.*)""".r
// First drop the lines that do not match (blank lines, for instance), then map.
lines.filter(_.matches(reg.regex))
  .map { case reg(level, logdate, addr) => (level, logdate, addr) }
  .foreach(println(_))

---- Sample log ----
INFO 2000-10-01 requestURI:/c?app=0&p=1&did=180042334&industry=45Z

INFO 2012-11-11 requestURI:/c?app=2&p=3&did=140042334&industry=42Z
WARN 2012-11-11 requestURI:/c?app=2&p=3&did=140042334&industry=42Z
ERROR 2012-11-11 requestURI:/c?app=2&p=3&did=140042334&industry=42Z
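The filter-then-map step can also be written as a single pass with collect, which applies a partial function only where it is defined. A minimal sketch on two of the sample lines plus a blank one; the regex is the one from the post, while `logLine`, `sampleLines`, and `parsed` are illustrative names:

```scala
// The regex is taken from the post; the value names are illustrative.
val logLine = """([A-Z]+) ([0-9]{4}-[0-9]{1,2}-[0-9]{1,2}) requestURI:(.*)""".r

val sampleLines = Seq(
  "INFO 2000-10-01 requestURI:/c?app=0&p=1&did=180042334&industry=45Z",
  "",
  "WARN 2012-11-11 requestURI:/c?app=2&p=3&did=140042334&industry=42Z"
)

// Lines that do not match (the blank one here) are skipped,
// so no separate filter and no MatchError.
val parsed = sampleLines.collect {
  case logLine(level, logDate, uri) => (level, logDate, uri)
}

parsed.foreach(println)
```

This avoids running the regex twice per line, at the cost of silently ignoring malformed lines.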

II. Pattern matching with case (recommended; the most convenient approach)

Pattern matching / pattern guards / type matching: https://blog.csdn.net/lyq7269/article/details/107759026

Example 1

// Statement 1:
val pattern = "([a-zA-Z][0-9][a-zA-Z] [0-9][a-zA-Z][0-9])".r
"L3R 6M2" match {
    case pattern(x) => println("Valid zip-code: " + x )  // x is the first capture group; a pattern can bind several groups
    case x => println("Invalid zip-code: " + x )
}
// Statement 2:
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-05-23" match {
    case date(year, month, day) => println((year, month, day))  // prints the tuple (2014,05,23)
}
"2014-05-23" match {
    case date(year, _*) => println("The year of the date is " + year)
}
"2014-05-23" match {
    case date(_*) => println("It is a date")
}
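A case pattern built from a Regex matches only when the whole input fits, just like String.matches. When the pattern should be found anywhere inside a longer string, the standard library's `unanchored` view of the regex can be used instead. A minimal sketch; `isoDate` and the helper `describe` are hypothetical names introduced for illustration:

```scala
import scala.util.matching.Regex

val isoDate: Regex = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

// Hypothetical helper: reports the captured groups, or "no match".
def describe(regex: Regex, text: String): String = text match {
  case regex(year, month, day) => s"year=$year month=$month day=$day"
  case _                       => "no match"
}

// Anchored (the default): the surrounding text makes the match fail.
val anchored = describe(isoDate, "released on 2014-05-23")
// Unanchored: the pattern may match a substring.
val loose = describe(isoDate.unanchored, "released on 2014-05-23")

println(anchored)  // no match
println(loose)     // year=2014 month=05 day=23
```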

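Example 2

val reg = """.* set se[0-9]_([0-9]+)_([0-9]+)_([0-9]+)r (.*),.*""".r
rdd.foreach {
      case reg(zs, stu, ques, sa) => println((zs, stu, ques, sa))
      case _ => ()  // skip lines that do not match; without this case they throw a MatchError
    }

The log line to match is shown below; the fields to extract (highlighted in red in the original post) are the three numbers after se0_ and the value after "r ".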

2019-06-16 14:24:34 INFO com.noriental.praxissvr.answer.util.PraxisSsdbUtil:45 [SimpleAsyncTaskExecutor-1] [020765925160] req: set se0_34434412_8195023659593_8080r 1,resp: ok 14

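Note: pattern matching is convenient, but every pair of parentheses in reg is a capture group, nested ones included, and the case pattern must bind exactly as many variables as there are groups. For example, when matching an integer or a decimal, ([0-9](\.[0-9])?) contributes two groups rather than one, so a pattern that binds a single variable for it fails to match. Either make the inner group non-capturing, ([0-9](?:\.[0-9])?), or simply use (.*).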
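A minimal sketch of how nested parentheses interact with case patterns: each pair of parentheses counts as a group, so the number of bound variables has to equal the total group count, and a non-capturing group (?:...) keeps the count down. The names `nested`, `flat`, and `oneVar` are hypothetical, and the patterns use [0-9]+ rather than the single-digit form in the text:

```scala
import scala.util.matching.Regex

// Integer-or-decimal with a nested capturing group: two groups in total.
val nested: Regex = """([0-9]+(\.[0-9]+)?)""".r
// Same pattern with the inner group non-capturing: one group in total.
val flat: Regex = """([0-9]+(?:\.[0-9]+)?)""".r

// Hypothetical helper that binds exactly one variable.
def oneVar(regex: Regex, s: String): String = s match {
  case regex(n) => s"number=$n"
  case _        => "no match"
}

val withNested = oneVar(nested, "3.5")  // two groups, one variable: "no match"
val withFlat = oneVar(flat, "3.5")      // "number=3.5"

// Binding both groups also works; a group that did not take part is null.
val parts = "3" match {
  case nested(n, frac) => (n, Option(frac))
  case _               => ("", None)
}

println(withNested)
println(withFlat)
println(parts)
```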

III. The scala.util.matching.Regex API

1) findFirstMatchIn() returns the first match (Option[Match])

Statements:
import scala.util.matching.Regex
val numberPattern: Regex = "[0-9]".r
numberPattern.findFirstMatchIn("awesomepassword") match {
  case Some(_) => println("Password OK")  // a digit was found
  case None => println("Password must contain a number")   // no digit found
}
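The Match returned by findFirstMatchIn carries more than a yes/no answer: it exposes the matched text and its position, whereas findFirstIn returns only the text. A minimal sketch; the input string and the value names are illustrative:

```scala
import scala.util.matching.Regex

val digits: Regex = "[0-9]+".r
val password = "pass2024word99"

// Option[Match]: matched text plus start (inclusive) and end (exclusive) offsets.
val firstMatch: Option[Regex.Match] = digits.findFirstMatchIn(password)
// Option[String]: the matched text only.
val firstText: Option[String] = digits.findFirstIn(password)

val summary = firstMatch.map(m => s"${m.matched} at ${m.start}-${m.end}")

println(summary)    // Some(2024 at 4-8)
println(firstText)  // Some(2024)
```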

2) Working with groups
findAllMatchIn().toList => List[Regex.Match]

Example 1

Statements:
import scala.util.matching.Regex

val studentPattern:Regex="([0-9a-zA-Z-#() ]+):([0-9a-zA-Z-#() ]+)".r
val input="name:Jason,age:19,weight:100"

for(patternMatch<-studentPattern.findAllMatchIn(input)){
    println(s"key: ${patternMatch.group(1)} value: ${patternMatch.group(2)}")
}
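Since each match yields a key in group 1 and a value in group 2, the matches can be collected straight into a Map instead of being printed. A minimal sketch reusing the pattern and input from the example; `pairPattern`, `record`, and `fields` are illustrative names:

```scala
import scala.util.matching.Regex

val pairPattern: Regex = "([0-9a-zA-Z-#() ]+):([0-9a-zA-Z-#() ]+)".r
val record = "name:Jason,age:19,weight:100"

// findAllMatchIn returns an Iterator[Regex.Match]; each match becomes one key -> value pair.
val fields: Map[String, String] =
  pairPattern.findAllMatchIn(record).map(m => m.group(1) -> m.group(2)).toMap

println(fields("age"))  // 19
```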

Example 2

rdd.map(line=>{
      val reg = """.* set se[0-9]_([0-9]+)_([0-9]+)_([0-9]+)r ([0-9](\.[0-9])?),.*""".r
      reg.findAllMatchIn(line).map(x=>(x.group(1),x.group(2),x.group(3),x.group(4))
        .productIterator.mkString("\t")).mkString("")
    }).foreach(println(_))

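The log line to match is shown below; the fields to extract (highlighted in red in the original post) are the three numbers after se0_ and the value after "r ".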

2019-06-16 14:24:34 INFO com.noriental.praxissvr.answer.util.PraxisSsdbUtil:45 [SimpleAsyncTaskExecutor-1] [020765925160] req: set se0_34434412_8195023659593_8080r 1,resp: ok 14
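Example 2 can be tried without a cluster by substituting a plain Seq for rdd. A minimal sketch under that assumption, using the regex and the sample line from the post; `answerLine`, `lines`, and `rows` are illustrative names, and findFirstMatchIn stands in for findAllMatchIn because the pattern spans the whole line:

```scala
import scala.util.matching.Regex

val answerLine: Regex =
  """.* set se[0-9]_([0-9]+)_([0-9]+)_([0-9]+)r ([0-9](\.[0-9])?),.*""".r

// A plain Seq stands in for the RDD (assumption for local testing).
val lines = Seq(
  "2019-06-16 14:24:34 INFO com.noriental.praxissvr.answer.util.PraxisSsdbUtil:45 [SimpleAsyncTaskExecutor-1] [020765925160] req: set se0_34434412_8195023659593_8080r 1,resp: ok 14",
  "a line that does not match"
)

// Groups 1 to 4 are the wanted fields; group 5 (the nested decimal part) is ignored.
// Lines without a match produce None and are dropped by flatMap.
val rows = lines.flatMap { line =>
  answerLine.findFirstMatchIn(line).map { m =>
    (1 to 4).map(i => m.group(i)).mkString("\t")
  }
}

rows.foreach(println)
```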

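3) String handling

1. Replacing within a string
replaceFirstIn("input string", "replacement text")
replaceAllIn("input string", "replacement text")

Statement 1:
"[0-9]+".r.replaceFirstIn("234 Main Street Suite 2034", "567") // 234 -> 567, giving "567 Main Street Suite 2034"
"[0-9]+".r.replaceAllIn("234 Main Street Suite 2034", "567") // 234 and 2034 -> 567, giving "567 Main Street Suite 567"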

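2. Searching within a string

Searching: findAllIn().toList => List[String]

In a case pattern, _ discards a group you do not need, and _* at the end of the pattern discards all remaining groups.

Statement 1:
val nums = "[0-9]+".r.findAllIn("123 Main Street Suite 2012").toList  // List("123", "2012")
nums.foreach(println(_))
Statement 2: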
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-05-23" match {
    case date(year, month, day) => println((year, month, day))  // prints the tuple (2014,05,23)
}
"2014-05-23" match {
    case date(year, _*) => println("The year of the date is " + year)
}
"2014-05-23" match {
    case date(_*) => println("It is a date")
}