Python中Apache Spark的输出Dstream。

时间:2023-02-09 20:49:49

I 'm trying the technologies that i will be using to build a real-time data pipeline, and i have run into some issues exporting my contents to a file.

我正在尝试我将要使用的技术来构建一个实时的数据管道,并且我遇到了一些将我的内容导出到文件中的问题。

I have setup a local kafka cluster, and a node.js producer that sends a simple text message just to test functionality and get a rough estimate of complexity of implementation.

我设置了一个本地kafka集群和一个节点。js生成器只发送一个简单的文本消息以测试功能,并对实现的复杂性进行粗略估计。

This is the spark streaming job that is reading from kafka and i am trying to get it to write to a file.

这就是从kafka中读取的spark流媒体作业,我试图让它写入一个文件。

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# Create a local StreamingContext with two working thread and batch interval of 1 second
sc = SparkContext("local[2]", "KafkaStreamingConsumer")
ssc = StreamingContext(sc, 10)

kafkaStream = KafkaUtils.createStream(ssc, "localhost:2181", "consumer-group", {"test": 1})

kafkaStream.saveAsTextFile('out.txt')

print 'Event recieved in window: ', kafkaStream.pprint()

ssc.start()
ssc.awaitTermination()

The error i am seeing when submitting the spark job is:

提交spark作业时,我看到的错误是:

kafkaStream.saveAsTextFile('out.txt')
AttributeError: 'TransformedDStream' object has no attribute 'saveAsTextFile'

No computations or transformations are performed on the data, i just want to build the flow. What do I need to change/add to be able to export the data in a file?

没有对数据执行计算或转换,我只想构建流。我需要更改/添加什么才能将数据导出到文件中?

1 个解决方案

#1


9  

http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html

http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html

saveAsTextFiles (note the plural)

saveAsTextFiles(注意复数)

saveAsTextFile (singular) is a method on an RDD, not a DStream.

saveAsTextFile(单数)是一个RDD方法,而不是DStream。

#1


9  

http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html

http://spark.apache.org/docs/latest/api/python/pyspark.streaming.html

saveAsTextFiles (note the plural)

saveAsTextFiles(注意复数)

saveAsTextFile (singular) is a method on an RDD, not a DStream.

saveAsTextFile(单数)是一个RDD方法,而不是DStream。