spark streaming - 在一个流中创建tmp视图,在另一个流中使用

时间:2020-12-19 19:13:59

I'm trying to run 2 Dstreams, generate Dataframe in the first one register the df as tmp view and then use it in another Dstream as follows:

我正在尝试运行2个Dstream,在第一个中生成Dataframe,将df注册为tmp视图,然后在另一个Dstream中使用它,如下所示:

dstream1.foreachRDD { rdd =>
  import org.apache.spark.sql._
  val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate
  import spark.implicits._
  import spark.sql

  val records = rdd.toDF("record")
  records.createOrReplaceTempView("records")
}
dstream2.foreachRDD { rdd =>
  import org.apache.spark.sql._
  val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate
  import spark.implicits._
  import spark.sql

  val records2 = rdd.toDF("record2")
  val oldRecord = spark.table("records")
  records2.join(oldRecod).write.json(...)
}
streamingContext.remember(Seconds(60))
    streamingContext.start()
    streamingContext.awaitTermination()

I'm keep getting an org.apache.spark.sql.catalyst.analysis.NoSuchTableException so obviously I'm not doing something right.

我一直在得到一个org.apache.spark.sql.catalyst.analysis.NoSuchTableException,所以显然我没有做正确的事情。

is there a way to make this happen?

有没有办法实现这一目标?

Thanks!

1 个解决方案

#1


0  

this actually worked, the problem was that when tested locally you need to leave extra core for calculation other then bringing the data from the stream.

这实际上有效,问题是当在本地测试时,你需要留下额外的核心进行计算,然后从流中提取数据。

I used master = local[2] thus each core is used to process each stream and non left to do anything else. once I changed it to master = local[4] it worked fine

我使用master = local [2]因此每个核心用于处理每个流而非左边用于执行任何其他操作。一旦我把它改成master = local [4]它就运行良好

#1


0  

this actually worked, the problem was that when tested locally you need to leave extra core for calculation other then bringing the data from the stream.

这实际上有效,问题是当在本地测试时,你需要留下额外的核心进行计算,然后从流中提取数据。

I used master = local[2] thus each core is used to process each stream and non left to do anything else. once I changed it to master = local[4] it worked fine

我使用master = local [2]因此每个核心用于处理每个流而非左边用于执行任何其他操作。一旦我把它改成master = local [4]它就运行良好