Why can't I show the prediction column of Spark's MultilayerPerceptronClassifier?

Time: 2022-06-22 02:32:03

I am using Spark's MultilayerPerceptronClassifier. This generates a column 'prediction' in 'predictions'. When I try to show it, I get the error:

SparkException: Failed to execute user defined function($anonfun$1: (vector) => double) ...
Caused by: java.lang.IllegalArgumentException: requirement failed: A & B Dimension mismatch!

Other columns, for example vector, display fine. Part of the predictions schema:

|-- vector: vector (nullable = true)
|-- prediction: double (nullable = true)

My code is:

//racist is boolean, needs to be string:
val train2 = train.withColumn("racist", 'racist.cast("String"))
val test2 = test.withColumn("racist", 'racist.cast("String"))

val indexer = new StringIndexer().setInputCol("racist").setOutputCol("indexracist")

val word2Vec = new Word2Vec().setInputCol("lemma").setOutputCol("vector") //.setVectorSize(3).setMinCount(0)

val layers = Array[Int](4,5, 2)

val mpc = new MultilayerPerceptronClassifier().setLayers(layers).setBlockSize(128).setSeed(1234L).setMaxIter(100).setFeaturesCol("vector").setLabelCol("indexracist")

val pipeline = new Pipeline().setStages(Array(indexer, word2Vec, mpc))

val model = pipeline.fit(train2)

val predictions = model.transform(test2)

predictions.select("prediction").show()

EDIT: the problem in the proposed similar question was

val layers = Array[Int](0, 0, 0, 0) 

which is not the case here, nor is it the same error.

EDIT AGAIN: part0 of train and test are saved in PARQUET format here.

1 Solution

#1


3  

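Adding .setVectorSize(3).setMinCount(0) and changing the layers to val layers = Array[Int](3,5, 2) made it work. The first entry of layers has to equal the length of the feature vectors; Word2Vec produces vectors of size 100 by default, which did not match the input layer of size 4: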

val word2Vec = new Word2Vec().setInputCol("lemma").setOutputCol("vector").setVectorSize(3).setMinCount(0)

// specify layers for the neural network:
// input layer of size 3 (must equal the Word2Vec vector size),
// one hidden layer of size 5, and output of size 2 (classes)
val layers = Array[Int](3,5, 2)
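
One way to keep the two numbers from drifting apart again is to read the input layer size back from the Word2Vec stage instead of typing it twice. The following is only a sketch: the names vectorSize and numClasses are introduced here for illustration, and it assumes that both values of "racist" occur in the training data, so that StringIndexer yields exactly two labels.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.feature.{StringIndexer, Word2Vec}

// Assumption: values chosen for illustration; any positive vector size works
// as long as the network's input layer uses the same number.
val vectorSize = 3
val numClasses = 2 // assumes both values of "racist" appear in the training data

val indexer = new StringIndexer()
  .setInputCol("racist")
  .setOutputCol("indexracist")

val word2Vec = new Word2Vec()
  .setInputCol("lemma")
  .setOutputCol("vector")
  .setVectorSize(vectorSize)
  .setMinCount(0)

// The input layer size is read back from word2Vec, so it always matches
// the length of the vectors in the "vector" column.
val layers = Array[Int](word2Vec.getVectorSize, 5, numClasses)

val mpc = new MultilayerPerceptronClassifier()
  .setLayers(layers)
  .setFeaturesCol("vector")
  .setLabelCol("indexracist")

val pipeline = new Pipeline().setStages(Array(indexer, word2Vec, mpc))
```

From here, pipeline.fit(train2) and model.transform(test2) are used exactly as in the question.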
