Spark: Cache and Checkpoint

Date: 2023-03-09 08:33:01

1. The Cache Operation

Calling cache on an RDD marks it for in-memory storage. The data is materialized the first time an action runs; later actions then read the cached partitions instead of re-reading the files from HDFS. Compare rdd1 below, which re-reads HDFS on every count, with the cached rdd2:

scala> val rdd1 = sc.textFile("hdfs://192.168.146.111:9000/logs")
rdd1: org.apache.spark.rdd.RDD[String] = hdfs://192.168.146.111:9000/logs MapPartitionsRDD[38] at textFile at <console>:24

scala> rdd1.count
res13: Long = 40155

scala> rdd1.count
res14: Long = 40155

scala> val rdd2 = sc.textFile("hdfs://192.168.146.111:9000/logs")
rdd2: org.apache.spark.rdd.RDD[String] = hdfs://192.168.146.111:9000/logs MapPartitionsRDD[40] at textFile at <console>:24

scala> val rdd2Cache = rdd2.cache
rdd2Cache: rdd2.type = hdfs://192.168.146.111:9000/logs MapPartitionsRDD[40] at textFile at <console>:24

scala> rdd2Cache.count
res15: Long = 40155

scala> rdd2Cache.count
res16: Long = 40155

scala> rdd2Cache.count
res17: Long = 40155
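
A note on cache versus persist: cache is shorthand for persist(StorageLevel.MEMORY_ONLY), so cached partitions that do not fit in memory are simply dropped and recomputed. When the dataset may exceed available memory, an explicit storage level lets Spark spill to local disk instead. A minimal sketch, assuming the same spark-shell session and HDFS path as above (rddPersisted is a name introduced here for illustration):

import org.apache.spark.storage.StorageLevel

// Same effect as rdd2.cache, except partitions that do not fit
// in memory are spilled to local disk instead of being dropped.
val rddPersisted = sc.textFile("hdfs://192.168.146.111:9000/logs")
  .persist(StorageLevel.MEMORY_AND_DISK)

rddPersisted.count     // first action materializes the cached blocks
rddPersisted.count     // subsequent actions skip the HDFS read

// Release the cached blocks once they are no longer needed.
rddPersisted.unpersist()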


2. The Checkpoint Mechanism

Checkpointing writes an RDD's data to reliable storage (here HDFS) and truncates its lineage, so recovery no longer has to recompute from the source. A checkpoint directory must be set with sc.setCheckpointDir first; the actual write happens when the next action runs:

scala> sc.setCheckpointDir("hdfs://192.168.146.111:9000/chechdir")

scala> val rddc = rdd1.filter(_.contains("bigdata"))
rddc: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[41] at filter at <console>:26

scala> rddc.checkpoint

scala> rddc.count
res21: Long = 7155
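
Note that Spark runs a separate job to recompute the RDD before saving it to HDFS, so the lineage is evaluated twice: once for the count and once for the checkpoint write. A common idiom is therefore to cache the RDD before checkpointing, so the checkpoint job reads the cached partitions instead. A minimal sketch, assuming the same session and paths as above (rddc2 is a name introduced here for illustration):

// Cache before checkpointing: the checkpoint job then reads the
// cached partitions instead of recomputing the filter from HDFS.
val rddc2 = rdd1.filter(_.contains("bigdata"))
rddc2.cache()
rddc2.checkpoint()

rddc2.count                       // materializes the cache and writes the checkpoint

// After the write, the RDD reports its checkpoint state and file.
println(rddc2.isCheckpointed)     // true
println(rddc2.getCheckpointFile)  // Some(hdfs://192.168.146.111:9000/chechdir/...)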
