ParDo vs. FlatMap in Apache Beam?

Time: 2021-11-09 14:29:40

Is there a difference between ParDo and FlatMap in Dataflow / Apache Beam?

I think both apply a function to each element of the incoming PCollection and return an iterable, but I imagine there must be some difference?

1 answer

#1


FlatMap is a simpler operation built, as you might expect, on top of ParDo. If it fits your needs, it is a good choice.

ParDo is a lower-level building block for element-wise computation. It has additional capabilities such as side inputs, multiple output collections, access to the current window, some really low-level callbacks for starting and finishing a bundle of elements, and more.

In practice, many uses of FlatMap and ParDo end up with a similar amount of code, but in my opinion it is most readable to use the simplest (highest-level) transform available.
