Scala for the Impatient - Chapter 9: Files and Regular Expressions

Key points:

1. Reading all lines of a file: call the getLines method on a scala.io.Source object:

import scala.io.Source

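val source = Source.fromFile("myfile.txt", "UTF-8")

// The first argument can be a String or a java.io.File

// If you know that the file uses the platform's default character encoding, you can omit the second (encoding) argument

val lineIterator = source.getLines()

// The result is an Iterator[String]

for (l <- lineIterator) ... // process l

// Alternatively, apply toArray or toBuffer to the iterator to put the lines into an array or an array buffer

val lines = source.getLines().toArray

// Or read the whole file into a single string

val contents = source.mkString

// A Source can be traversed only once, so use just one of these approaches per Source object

When you are done with the Source object, call its close() method.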
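The steps above can be combined into a small helper. This is a minimal sketch assuming Scala 2.12 or 2.13; the object name ReadLinesDemo, the method readLines and the default file name are illustrative, not part of scala.io. The try/finally makes sure close() runs even if reading throws, and toVector forces the lazy iterator before the Source is closed.

```scala
import scala.io.Source

object ReadLinesDemo {
  // Reads every line of a text file and always closes the Source afterwards.
  def readLines(path: String, encoding: String = "UTF-8"): Vector[String] = {
    val source = Source.fromFile(path, encoding)
    try source.getLines().toVector // toVector consumes the iterator before close()
    finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical file name; pass your own path as the first argument.
    val path = if (args.nonEmpty) args(0) else "myfile.txt"
    readLines(path).foreach(println)
  }
}
```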

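2. Reading characters

To read individual characters from a file, you can use the Source object directly as an iterator, because the Source class extends Iterator[Char]: for (c <- source) process c

If you want to look at a character without consuming it, call the buffered method on the source object. You can then use head to peek at the next character without treating it as processed.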

val source = Source.fromFile("myfile.txt","UTF-8")

val iter = source.buffered

while(iter.hasNext){

  if (iter.head is what you expect) // peek without consuming

            process iter.next()

  else

    ...

}

source.close()
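The buffered/head pattern can be made concrete. The sketch below (the names PeekDemo and digitRuns are hypothetical) uses Source.fromString instead of a file so it runs without any setup, and collects runs of digits. Note that the branch that does not want the character still calls next(); otherwise the while loop would keep peeking at the same character forever.

```scala
import scala.io.Source

object PeekDemo {
  // Collects runs of consecutive digits, skipping every other character.
  // head peeks at the next character; next() consumes it.
  def digitRuns(text: String): List[String] = {
    val iter = Source.fromString(text).buffered
    val runs = List.newBuilder[String]
    while (iter.hasNext) {
      if (iter.head.isDigit) {
        val sb = new StringBuilder
        while (iter.hasNext && iter.head.isDigit) sb += iter.next()
        runs += sb.toString
      } else iter.next() // consume the non-digit so the loop makes progress
    }
    runs.result()
  }

  def main(args: Array[String]): Unit =
    println(digitRuns("99 bottles, 98 bottles")) // List(99, 98)
}
```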

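3. Reading tokens and numbers

To read all whitespace-separated tokens from a source file: val tokens = source.mkString.split("\\s+")

To convert the tokens to numbers, for example when the file contains floating-point numbers, read them into an array:

val numbers = for (w <- tokens) yield w.toDouble

or

val numbers = tokens.map(_.toDouble)

You can also read numbers from the console with readInt(), readDouble(), or readLong() (since Scala 2.11 these live in scala.io.StdIn, e.g. scala.io.StdIn.readInt()). These methods assume that the next input line contains a single number; if the line cannot be parsed as a number, a NumberFormatException is thrown.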
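A small sketch of the token-to-number conversion, assuming the input is whitespace-separated decimal numbers; NumbersDemo and parseNumbers are illustrative names. Trimming first matters: split("\\s+") on text that starts with whitespace yields an empty first token, and "".toDouble throws a NumberFormatException.

```scala
import scala.io.Source

object NumbersDemo {
  // Splits text on whitespace and converts each token to a Double.
  // trim plus the nonEmpty filter avoid empty tokens at the edges.
  def parseNumbers(text: String): Array[Double] =
    text.trim.split("\\s+").filter(_.nonEmpty).map(_.toDouble)

  def main(args: Array[String]): Unit = {
    val source = Source.fromString("3.5 1.5\n  2\n")
    val numbers = try parseNumbers(source.mkString) finally source.close()
    println(numbers.sum) // 7.0
    // Console input (Scala 2.11+), one number per line:
    // val n = scala.io.StdIn.readInt()
  }
}
```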

4. Reading from URLs and other sources

  val source1 = Source.fromURL("http://horstmann.com", "UTF-8")

  val source2 = Source.fromString("Hello, World!") // reads from the given string; useful for debugging

  val source3 = Source.stdin // reads from standard input
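Because every variant above produces a scala.io.Source, code that consumes a Source can be written once and exercised with Source.fromString, which is the debugging use mentioned above. A minimal sketch; SourceKindsDemo and countNonEmptyLines are hypothetical names.

```scala
import scala.io.Source

object SourceKindsDemo {
  // Works with any Source: fromFile, fromURL, fromString or stdin.
  def countNonEmptyLines(source: Source): Int =
    try source.getLines().count(_.trim.nonEmpty)
    finally source.close()

  def main(args: Array[String]): Unit = {
    // fromString lets you try the function without touching the file system.
    println(countNonEmptyLines(Source.fromString("Hello\n\nWorld\n"))) // 2
  }
}
```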

5. Reading binary files. Scala has no method of its own for reading binary files, so use the Java library:

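  import java.io.{File, FileInputStream}

  val file = new File(filename)

  val in = new FileInputStream(file)

  val bytes = new Array[Byte](file.length.toInt)

  in.read(bytes) // note: a single read call may return fewer bytes than requested

  in.close()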
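For small files there is a shorter alternative on Java 7 and later: java.nio.file.Files.readAllBytes reads the whole file and closes it. This is a swapped-in Java NIO technique, not the book's approach; BinaryDemo and readBytes are illustrative names. Unlike a single InputStream.read call, it cannot leave the array partially filled.

```scala
import java.nio.file.{Files, Paths}

object BinaryDemo {
  // Reads the entire file into a byte array; the file is closed by readAllBytes.
  def readBytes(filename: String): Array[Byte] =
    Files.readAllBytes(Paths.get(filename))

  def main(args: Array[String]): Unit =
    if (args.nonEmpty) println(s"${readBytes(args(0)).length} bytes")
    else println("usage: BinaryDemo <file>")
}
```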

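6. Writing text files

Scala has no built-in support for writing files. To write to a file, use java.io.PrintWriter:

import java.io.PrintWriter

val out = new PrintWriter("numbers.txt")

for (i <- 1 to 100) out.println(i)

out.close()

// When you pass numbers to printf, the compiler complains that you need to convert them to AnyRef:

out.printf("%6d %10.2f", quantity.asInstanceOf[AnyRef], price.asInstanceOf[AnyRef])

// To avoid this hassle, use the format method of the String class instead:

out.print("%6d %10.2f".format(quantity, price))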
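A sketch of the PrintWriter approach with the String.format workaround, assuming rows of (quantity, price) pairs; WriteDemo, writeReport and report.txt are hypothetical. The try/finally makes sure close() runs, which also flushes the buffered output to the file. format uses the default locale, so the decimal separator may differ between machines.

```scala
import java.io.PrintWriter

object WriteDemo {
  // Writes one formatted line per (quantity, price) pair and always closes the writer.
  def writeReport(path: String, rows: Seq[(Int, Double)]): Unit = {
    val out = new PrintWriter(path)
    try rows.foreach { case (quantity, price) =>
      out.println("%6d %10.2f".format(quantity, price))
    } finally out.close() // close() also flushes the buffered output
  }

  def main(args: Array[String]): Unit =
    writeReport("report.txt", Seq((3, 19.95), (12, 4.5)))
}
```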

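7. Accessing directories. Scala currently has no official classes for visiting all files in a directory or for recursively traversing directories, so use java.io.File:

import java.io.File

// A function that yields all subdirectories of a directory, recursively

def subdirs(dir: File): Iterator[File] = {

  val children = dir.listFiles.filter(_.isDirectory())

  children.iterator ++ children.iterator.flatMap(subdirs _)

}

// Visit all subdirectories

for (d <- subdirs(dir)) process d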
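The subdirs function above can be adapted to list files rather than directories. A minimal sketch, assuming java.io.File is acceptable (java.nio.file.Files.walk is another option on Java 8+); WalkDemo and allFiles are illustrative names. Two details differ from the version above: listFiles returns null if the path is not a directory or an I/O error occurs, which is mapped to an empty array here, and the result contains regular files only.

```scala
import java.io.File

object WalkDemo {
  // listFiles returns null for non-directories or on I/O errors; treat that as empty.
  private def children(dir: File): Array[File] =
    Option(dir.listFiles()).getOrElse(Array.empty[File])

  // All files (not directories) below dir; subdirectories are listed
  // only when the iterator reaches them, because ++ takes its argument by name.
  def allFiles(dir: File): Iterator[File] = {
    val (dirs, files) = children(dir).partition(_.isDirectory)
    files.iterator ++ dirs.iterator.flatMap(allFiles)
  }

  def main(args: Array[String]): Unit =
    allFiles(new File(if (args.nonEmpty) args(0) else ".")).foreach(println)
}
```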

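8. Serialization. In Java, serialization is used to transmit objects to other virtual machines or for short-term storage. For long-term storage, serialization can be awkward. Here is how to declare a serializable class in Java and in Scala:

// Java

public class Person implements java.io.Serializable {

  private static final long serialVersionUID = 42L;

  ...

}

// Scala

@SerialVersionUID(42L) class Person extends Serializable

// The Serializable trait is defined in the scala package, so no import is needed. The @SerialVersionUID annotation can also be omitted, in which case a default ID is used.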

Serialize and deserialize objects in the usual way:

val fred = new Person(...)

import java.io._

val out = new ObjectOutputStream(new FileOutputStream("/tmp/test.obj"))

out.writeObject(fred)

out.close()

val in = new ObjectInputStream(new FileInputStream("/tmp/test.obj"))

val savedFred = in.readObject().asInstanceOf[Person]

in.close()

All Scala collections are serializable, so you can use them as members of serializable classes.
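Serialization does not need a file. The same ObjectOutputStream/ObjectInputStream pair works over an in-memory byte array, which is convenient for checking that a class really is serializable. A minimal sketch; the Person class here (with a List of friends' names) and the roundTrip helper are hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical class; Serializable comes from the scala package, no import needed.
@SerialVersionUID(42L)
class Person(val name: String, val friends: List[String]) extends Serializable

object SerializeDemo {
  // Writes obj to a byte array and reads it back with the same stream classes used for files.
  def roundTrip[T](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val copy = roundTrip(new Person("Fred", List("Wilma", "Barney")))
    println(copy.name + " " + copy.friends) // Fred List(Wilma, Barney)
  }
}
```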

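9. Regular expressions

Regular expressions are handled by the class scala.util.matching.Regex. To construct a Regex object, use the r method of the String class:

val numPattern = "[0-9]+".r

If the regular expression contains backslashes or quotation marks, it is best to use the "raw" string syntax """...""", for example:

val wsnumwsPattern = """\s+[0-9]+\s+""".r

The findAllIn method returns an iterator over all matches. You can use it in a for loop:

for (matchString <- numPattern.findAllIn("99 bottles, 98 bottles"))

  process matchString

Or turn the iterator into an array:

val matches = numPattern.findAllIn("99 bottles, 98 bottles").toArray

// Array("99", "98")

To find the first match in a string, use findFirstIn, which returns an Option[String]:

val m1 = wsnumwsPattern.findFirstIn("99 bottles, 98 bottles")

// Some(" 98 ")

To check whether the beginning of a string matches, use findPrefixOf. To replace the first match or all matches, use replaceFirstIn or replaceAllIn.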
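The remaining Regex methods mentioned above can be tried on the same sample string. A short sketch; RegexOpsDemo is an illustrative name, and the expected results are shown as comments.

```scala
object RegexOpsDemo {
  val numPattern = "[0-9]+".r
  val text = "99 bottles, 98 bottles"

  def main(args: Array[String]): Unit = {
    println(numPattern.findPrefixOf(text))         // Some(99): the string starts with digits
    println(numPattern.findPrefixOf("bottles 99")) // None: the start does not match
    println(numPattern.replaceFirstIn(text, "XX")) // XX bottles, 98 bottles
    println(numPattern.replaceAllIn(text, "XX"))   // XX bottles, XX bottles
  }
}
```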

10. Regular expression groups

val numitemPattern = "([0-9]+) ([a-z]+)".r

val numitemPattern(num, item) = "99 bottles" // sets num to "99" and item to "bottles"

To extract the groups from multiple matches:

for (numitemPattern(num, item) <- numitemPattern.findAllIn("99 bottles, 98 bottles"))

  process num and item
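As an alternative to the extractor in the for loop, findAllMatchIn returns Match objects whose groups can be read with group(i). A minimal sketch; RegexGroupsDemo and items are illustrative names.

```scala
object RegexGroupsDemo {
  val numitemPattern = "([0-9]+) ([a-z]+)".r

  // Each Match exposes its capture groups through group(i), counting from 1.
  def items(text: String): List[(String, String)] =
    numitemPattern.findAllMatchIn(text).map(m => (m.group(1), m.group(2))).toList

  def main(args: Array[String]): Unit = {
    println(items("99 bottles, 98 bottles")) // List((99,bottles), (98,bottles))
    // The extractor form: num and item are bound as Strings.
    val numitemPattern(num, item) = "99 bottles"
    println(num.toInt + 1) // 100
  }
}
```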

Exercises:

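1. Write a short piece of Scala code that reverses the order of the lines in a file (making the last line the first one, and so on).

import scala.io.Source

import java.io.PrintWriter

object chapterNine extends App {

  val path = "test.txt"

  val source = Source.fromFile(path)

  // toArray reads all lines before the file is overwritten below
  val sourceRev = source.getLines().toArray.reverse

  source.close()

  val pw = new PrintWriter(path)

  sourceRev.foreach(line => pw.write(line + "\n"))

  pw.close()

}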

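2. Write a Scala program that reads a file containing tab characters and replaces each tab with spaces so that the n tab-separated columns stay vertically aligned, then writes the result back to the same file.

import scala.io.Source

import java.io.PrintWriter

object chapterNine extends App {

  val path = "test.txt"

  val source = Source.fromFile(path)

  // Read everything first: new PrintWriter(path) truncates the file
  val rows = source.getLines().map(_.split("\t", -1)).toArray

  source.close()

  // Width of each column = length of its longest cell
  val widths = rows.flatMap(_.zipWithIndex).groupBy(_._2).map { case (i, cells) => i -> cells.map(_._1.length).max }

  val pw = new PrintWriter(path)

  rows.foreach { r =>

    // Pad every cell except the last to its column width plus one space
    val padded = r.init.zipWithIndex.map { case (cell, i) => cell.padTo(widths(i) + 1, ' ') }

    pw.write(padded.mkString + r.last + "\n")

  }

  pw.close()

}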

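3. Write a short piece of Scala code that reads a file and prints all words with more than 12 characters to the console. Extra credit if you can do it in a single line.

import scala.io.Source

object chapterNine extends App {

  Source.fromFile("test.txt").mkString.split("\\s+").filter(_.length > 12).foreach(println)

}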

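4. Write a Scala program that reads a text file containing floating-point numbers and prints the sum, average, maximum, and minimum of the numbers in the file.

import scala.io.Source

object chapterNine extends App {

  // Convert to Double first: max/min on the raw strings would compare them alphabetically
  val nums = Source.fromFile("test.txt").mkString.trim.split("\\s+").map(_.toDouble)

  val total = nums.sum

  println(total)

  println(total / nums.length)

  println(nums.max)

  println(nums.min)

}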

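5. Write a Scala program that writes the powers of 2 and their reciprocals to a file, with the exponent n ranging from 0 to 20. Line up the columns:

1 1

2 0.5

4 0.25

… …

import java.io.PrintWriter

object chapterNine extends App {

  val pw = new PrintWriter("test.txt")

  for (n <- 0 to 20) {

    val t = BigDecimal(2).pow(n)

    // Left-align the power in a 10-character column (2^20 = 1048576 has 7 digits)
    pw.write(t.toString.padTo(10, ' '))

    pw.write((BigDecimal(1) / t).toString)

    pw.write("\n")

  }

  pw.close() // without close() the buffered output may never reach the file

}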

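6. Write a regular expression that matches quoted strings in Java or C++ code, such as "like this, maybe with \" or\\". Write a Scala program that prints all such strings in a source file.

import scala.io.Source

object chapterNine extends App {

  val source = Source.fromFile("test.txt").mkString

  // A quote, then any mix of characters other than quote/backslash or backslash escapes, then a closing quote.
  // In the triple-quoted literal the extra " at each end belongs to the pattern: "(?:[^"\\]|\\.)*"
  val pattern = """"(?:[^"\\]|\\.)*"""".r

  pattern.findAllIn(source).foreach(println)

}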

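7. Write a Scala program that reads a text file and prints all tokens that are not floating-point numbers. Use a regular expression.

import scala.io.Source

object chapterNine extends App {

  val source = Source.fromFile("test.txt").mkString

  // An optionally signed integer or decimal number with an optional exponent
  val floatPattern = """[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?""".r

  // Print every whitespace-separated token that is not entirely a floating-point number
  source.split("\\s+").filter(_.nonEmpty).filterNot(w => floatPattern.pattern.matcher(w).matches()).foreach(println)

}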

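8. Write a Scala program that prints the src attribute of all img tags in a web page. Use regular expressions and groups.

import scala.io.Source

object chapterNine extends App {

  // A saved copy of the page; Source.fromURL(url, "UTF-8") could read it directly
  val source = Source.fromFile("test.txt").mkString

  // Group 1 captures the double-quoted value of the src attribute
  val pattern = """<img[^>]+src\s*=\s*"([^"]+)"[^>]*>""".r

  for (m <- pattern.findAllMatchIn(source)) println(m.group(1))

}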

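9. Write a Scala program that counts how many files with the .class extension are in a given directory and its subdirectories.

import java.io.File

object chapterNine extends App {

  val path = "."

  val dir = new File(path)

  // All .class files in dir and below; listFiles returns null for unreadable paths
  def classFiles(dir: File): Iterator[File] = {

    val children = Option(dir.listFiles()).getOrElse(Array.empty[File])

    children.iterator.filter(f => f.isFile && f.getName.endsWith(".class")) ++ children.iterator.filter(_.isDirectory).flatMap(classFiles)

  }

  val n = classFiles(dir).size

  println(n)

}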

10. Extend the serializable Person class so that it can store a collection of a person's friends. Construct a few Person objects, make some of them friends, and save an Array[Person] to a file. Read the array back in and verify that the friend relations are intact.

import collection.mutable.ArrayBuffer

import java.io.{ObjectInputStream, FileOutputStream, FileInputStream, ObjectOutputStream}

// Note: run this from a main method (here via App); serialization does not work when the code is run as a script.

class Person(var name: String) extends Serializable {

  val friends = new ArrayBuffer[Person]()

  def addFriend(friend: Person): Unit = {

    friends += friend

  }

  override def toString = "My name is " + name + " and my friends are " + friends.map(_.name).mkString(", ")

}

object chapterNine extends App {

  val p1 = new Person("Ivan")

  val p2 = new Person("F2")

  val p3 = new Person("F3")

  p1.addFriend(p2)

  p1.addFriend(p3)

  p2.addFriend(p1)

  val people = Array(p1, p2, p3)

  people.foreach(println)

  val out = new ObjectOutputStream(new FileOutputStream("test.obj"))

  out.writeObject(people)

  out.close()

  val in = new ObjectInputStream(new FileInputStream("test.obj"))

  val saved = in.readObject().asInstanceOf[Array[Person]]

  in.close()

  saved.foreach(println)

  // Shared references are restored: Ivan's first friend is the same object as saved(1)
  println(saved(0).friends.head eq saved(1))

}
