Why are Spark RDD tasks stuck in RUNNING status with no data processing?

时间:2022-12-30 02:16:13

Picture: some of the Spark RDD tasks stay in RUNNING status the whole time without processing any data

The Spark stage reads data from MongoDB through the newAPIHadoopRDD interface. Most of the tasks finished, but two tasks remain in RUNNING status indefinitely, and the executor's CPU and memory usage stays low. I don't think this is related to MongoDB, because another job reading a Kafka stream shows similar behavior. What's the problem?

1 solution

#1


0  

I think I found the reason. I added a static member object with an init function to a Serializable class A, and an instance of another Serializable class B calls the static member function F1 of class A. When I changed F1 to belong to a non-serializable class, the problem disappeared. So I suspect it is a timing issue: when F1 is called on the executor, class A has not yet completed its serialization/initialization process.
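One mechanism consistent with the answer above is that Java serialization never carries static fields: a deserialized task on an executor sees whatever static state that executor's JVM happens to have, which may not be initialized yet. Below is a minimal Java sketch illustrating this; the class names A, B, and F1 are stand-ins taken from the answer, and nulling the static field simulates a fresh executor JVM where A's static initialization has not run.

```java
import java.io.*;

// Stand-in for the answer's class A: Serializable, with static state set by an init step.
class A implements Serializable {
    static String config = "initialized on the driver"; // static fields are NOT serialized
    static String f1() { return config; }               // stand-in for F1
}

// Stand-in for class B: a Serializable task-like object that calls A's static member.
class B implements Serializable {
    String snapshot() { return A.f1(); }
}

public class Main {
    public static void main(String[] args) throws Exception {
        B b = new B();

        // Serialize b, as Spark would when shipping a task to an executor.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(b);

        // Simulate a fresh executor JVM where A's static init has not happened.
        A.config = null;

        // Deserialize b: A's static state does not travel with it.
        B restored = (B) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        System.out.println(restored.snapshot()); // prints: null
    }
}
```

Because static state is per-JVM and outside the serialized graph, code on the executor must not assume a static member was initialized on the driver; initializing it lazily on first use inside the executor (or avoiding the serialization dependency entirely, as the answer did) removes the hazard.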
