Can't run Hadoop streaming job: Missing required options: input, output

Time: 2022-05-08 15:38:18

I'm trying to run a streaming job on a cluster of DSE 3.1 analytics servers, using Cassandra column families (CFs) for input. It complains that the input and output parameters are missing, even though I set them (and I only set them because of the complaint):

dse hadoop jar $HADOOP_HOME/lib/hadoop-streaming-1.0.4.8.jar \
-D cassandra.input.keyspace="tmp_ks" \
-D cassandra.input.partitioner.class="MurMur3Partitioner" \
-D cassandra.input.columnfamily="tmp_cf" \
-D cassandra.consistencylevel.read="ONE" \
-D cassandra.input.widerows=true \
-D cassandra.input.thrift.address=10.0.0.1
-inputformat org.apache.cassandra.hadoop.ColumnFamilyInputFormat \
-outputformat org.apache.hadoop.mapred.lib.NullOutputFormat \
-input /tmp_ks/tmp_cf \
-output /dev/null \
-mapper mymapper.py \
-reducer myreducer.py

I got "ERROR streaming.StreamJob: Missing required options: input, output". I tried different inputs and outputs and different output formats, but got the same error.

What have I done wrong?

3 answers

#1


2  

I notice that this part of your command doesn't have a trailing backslash:

...
-D cassandra.input.thrift.address=10.0.0.1
...

Maybe that's screwing up the lines that follow?
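To illustrate why the missing backslash matters, here is a minimal sketch. `count_args` is a hypothetical stand-in for `dse hadoop jar ...` (it is not part of DSE or Hadoop); it only reports how many arguments the shell actually passed to it. With a backslash at the end of every continued line, all the options reach the same command:

```shell
# Hypothetical stand-in for the real command: prints how many arguments it received.
count_args() { echo "$# args"; }

# Every continued line ends with a backslash, so the shell joins them
# into one command and passes all six words to count_args.
count_args -D cassandra.input.thrift.address=10.0.0.1 \
  -input /tmp_ks/tmp_cf \
  -output /dev/null
# prints: 6 args
```

Without the backslash after `10.0.0.1`, the shell ends the command at that line, so the real command never sees `-input` or `-output`, and the following lines are treated as a separate command. That would be consistent with the "Missing required options: input, output" error.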

#2


1  

The input should be an existing path on HDFS, while the output should be a path that does not yet exist on HDFS.

#3


0  

I also noticed this problem in your command:

...    
-D cassandra.input.partitioner.class="MurMur3Partitioner" \
...

The class name should be "Murmur3Partitioner" (lowercase "m" in "mur").

