Is the SortValues transform, a Java SDK extension in Beam, only able to run in a Hadoop environment?

Date: 2021-11-21 15:33:25

I tried the example code for the SortValues transform using the DirectRunner on my local machine (Windows):

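import org.apache.beam.sdk.extensions.sorter.BufferedExternalSorter;
import org.apache.beam.sdk.extensions.sorter.SortValues;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

PCollection<KV<String, KV<String, Integer>>> input = ...

// Group by the primary key; each value is a (secondary key, value) pair.
PCollection<KV<String, Iterable<KV<String, Integer>>>> grouped =
    input.apply(GroupByKey.<String, KV<String, Integer>>create());

// Sort the values of each group by their secondary key.
PCollection<KV<String, Iterable<KV<String, Integer>>>> groupedAndSorted =
    grouped.apply(SortValues.<String, String, Integer>create(BufferedExternalSorter.options()));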

but I got the error PipelineExecutionException: java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable. Does this mean the transform only works in a Hadoop environment?

1 Answer

#1



As of today, if you use a Beam release below 2.0.0, you have to add two Hadoop dependencies to your Maven pom file for the SortValues module to work:

  1. Add hadoop-common, version 2.7.3 or later.
  2. Add hadoop-mapreduce-client-core, version 2.7.3 or later.
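In a pom file, the two dependencies listed above might look roughly like the fragment below. This is a sketch, not a tested configuration: the artifact names and the 2.7.3 version come from the list above, org.apache.hadoop is the group ID these artifacts are published under, and the fragment is assumed to sit inside the existing dependencies section of your pom.

```xml
<!-- Sketch only: goes inside the <dependencies> element of pom.xml. -->
<!-- 2.7.3 is the minimum version named above; a later one may suit your setup better. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.3</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>2.7.3</version>
</dependency>
```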

Otherwise, you just need to use a Beam release >= 2.0.0.
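Whichever route you take, it can help to confirm that the class named in the error is actually visible at runtime. The following is a minimal sketch using only the Java standard library; the class name HadoopClasspathCheck and the helper isClassAvailable are hypothetical names made up for this illustration, and the check is only meaningful if it runs with the same classpath as the pipeline.

```java
public class HadoopClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by this class's class loader, without initializing it.
    public static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class is missing or one of its own dependencies is.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the NoClassDefFoundError in the question.
        String writable = "org.apache.hadoop.io.Writable";
        System.out.println(writable + " on classpath: " + isClassAvailable(writable));
    }
}
```

If this reports false when run alongside the pipeline, the Hadoop jars are not being picked up and the dependency setup is the place to look.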
