Apache中的聚合器使用数据流运行器

时间:2021-10-16 15:11:08

I am trying to create aggregators to count values that satisfy a condition across all input data . I looked into documentation and found the below for creation .

我正在尝试创建聚合器来计算满足所有输入数据条件的值。我查看了文档,发现了下面的创建。

https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Aggregator ..

https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/Aggregator ..

I am using : google-cloud-dataflow-java-sdk-all - 2.4.0 (apache beam based)

我正在使用:google-cloud-dataflow-java-sdk-all - 2.4.0(基于apache beam)

However I am not able to find the corresponding class in the new beam api.. I looked into org.apache.beam.sdk.transforms package .

但是我无法在新的光束api中找到相应的类。我查看了org.apache.beam.sdk.transforms包。

Can you please let me know how can I use aggregators with dataflow runner in new api . ?

能告诉我如何在新api中使用带有dataflow runner的聚合器。 ?

2 个解决方案

#1


1  

The link you have is for the old SDK (1.x).

您拥有的链接是旧SDK(1.x)。

In SDK 2.x, you should refer to apache-beam SDK. For the Aggregators you mentioned, if I understand correctly, it's for adding counters during processing. I guess the corresponding package should be org.apache.beam.sdk.metrics.

在SDK 2.x中,您应该参考apache-beam SDK。对于您提到的聚合器,如果我理解正确的话,那就是在处理过程中添加计数器。我猜对应的包应该是org.apache.beam.sdk.metrics。

Package org.apache.beam.sdk.metrics Metrics allow exporting information about the execution of a pipeline.

包org.apache.beam.sdk.metrics指标允许导出有关管道执行的信息。

and org.apache.beam.sdk.metrics.Counter interface:

和org.apache.beam.sdk.metrics.Counter接口:

A metric that reports a single long value and can be incremented or decremented.

报告单个长值并可递增或递减的度量标准。

#2


0  

As of now, there seem to be no replacement for the Aggregator class in Apache Beam SDK 2.X. An alternate solution to count values respecting a condition would be Transforms. By using the GroupBy transform to collect data meeting a condition and then the Combine transform, you can get a count of the input data respecting the condition.

截至目前,似乎没有替代Apache Beam SDK 2.X中的Aggregator类。计算关于条件的值的替代解决方案是变换。通过使用GroupBy变换来收集满足条件的数据,然后使用Combine变换,您可以获得关于条件的输入数据的计数。

#1


1  

The link you have is for the old SDK (1.x).

您拥有的链接是旧SDK(1.x)。

In SDK 2.x, you should refer to apache-beam SDK. For the Aggregators you mentioned, if I understand correctly, it's for adding counters during processing. I guess the corresponding package should be org.apache.beam.sdk.metrics.

在SDK 2.x中,您应该参考apache-beam SDK。对于您提到的聚合器,如果我理解正确的话,那就是在处理过程中添加计数器。我猜对应的包应该是org.apache.beam.sdk.metrics。

Package org.apache.beam.sdk.metrics Metrics allow exporting information about the execution of a pipeline.

包org.apache.beam.sdk.metrics指标允许导出有关管道执行的信息。

and org.apache.beam.sdk.metrics.Counter interface:

和org.apache.beam.sdk.metrics.Counter接口:

A metric that reports a single long value and can be incremented or decremented.

报告单个长值并可递增或递减的度量标准。

#2


0  

As of now, there seem to be no replacement for the Aggregator class in Apache Beam SDK 2.X. An alternate solution to count values respecting a condition would be Transforms. By using the GroupBy transform to collect data meeting a condition and then the Combine transform, you can get a count of the input data respecting the condition.

截至目前,似乎没有替代Apache Beam SDK 2.X中的Aggregator类。计算关于条件的值的替代解决方案是变换。通过使用GroupBy变换来收集满足条件的数据,然后使用Combine变换,您可以获得关于条件的输入数据的计数。