Google Dataflow - Saving data into multiple BigQuery tables

Date: 2021-11-09 14:29:10

I'm using Google Dataflow 1.9 to save data into BigQuery tables. I'm looking for a way to control which table a (PCollection) element is written to, based on some value in that element. In our case, each element contains a user-id, and we want to write each element to its own per-user table, chosen dynamically.


1 solution

#1



With 1.9.0 the only options are to either (1) partition the elements into multiple output collections and then write each output collection to a specific table, or (2) window the elements and select the destination based on the window. Option 1 only works if there is a relatively small set of destination tables, and option 2 only works if the decision is based on the window, which won't fit your use case of per-user destinations very well.
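A minimal plain-Java sketch of option 1, written as a simulation rather than Dataflow code so it runs on its own. It assumes the set of user IDs is small and known in advance (the `KNOWN_USERS` list, the table names, and the catch-all partition are all hypothetical). In a 1.x pipeline the `partitionFor` logic would go into a `Partition.PartitionFn` passed to `Partition.of(n, fn)`, and each resulting output collection would get its own `BigQueryIO.Write.to(...)` step; elements are shown as bare user-ID strings instead of `TableRow`s for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of option 1 (not Dataflow API code): split elements into a
// fixed number of outputs, one per destination table, before writing.
class FixedTablePartitioner {

    // Hypothetical, known-in-advance user ids; option 1 needs this set up front.
    static final List<String> KNOWN_USERS = List.of("alice", "bob", "carol");

    // One partition per known user plus a final catch-all partition.
    static final int NUM_PARTITIONS = KNOWN_USERS.size() + 1;

    // Same shape as Partition.PartitionFn#partitionFor(elem, numPartitions):
    // returns an index in [0, numPartitions).
    static int partitionFor(String userId, int numPartitions) {
        int idx = KNOWN_USERS.indexOf(userId);
        return idx >= 0 ? idx : numPartitions - 1;
    }

    // Hypothetical table spec for each partition index.
    static String tableFor(int partition) {
        return partition < KNOWN_USERS.size()
            ? "my-project:my_dataset.user_" + KNOWN_USERS.get(partition)
            : "my-project:my_dataset.unknown_users";
    }

    // Simulates what Partition.of(NUM_PARTITIONS, fn) produces: one list per partition.
    static List<List<String>> partition(List<String> userIds) {
        List<List<String>> outputs = new ArrayList<>();
        for (int i = 0; i < NUM_PARTITIONS; i++) {
            outputs.add(new ArrayList<>());
        }
        for (String id : userIds) {
            outputs.get(partitionFor(id, NUM_PARTITIONS)).add(id);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<List<String>> outputs = partition(List.of("alice", "bob", "zed", "alice"));
        for (int i = 0; i < NUM_PARTITIONS; i++) {
            // In a real pipeline each output would get its own write to tableFor(i).
            System.out.println(tableFor(i) + " <- " + outputs.get(i));
        }
    }
}
```

Because the number of partitions is fixed when the pipeline is built, any user not in the list has to fall into a shared table, which is exactly why this approach does not scale to one table per arbitrary user.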


If you upgrade to 2.0.0, the destination can be specified by a function that receives the window and the data element, using either DynamicDestinations or a SerializableFunction. This lets you inspect each element and choose its destination table based on the user ID.
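A minimal plain-Java sketch of the per-element decision that such a function would make. It is a stand-in, not Beam code: the row is modeled as a `Map` instead of a `TableRow`, and the project, dataset, `user_id` field name, and `user_` table prefix are hypothetical. In Beam 2.x, the body of `TABLE_FOR_ROW` would sit inside a `SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination>` passed to `BigQueryIO.writeTableRows().to(...)`, or in the `getDestination`/`getTable` methods of a `DynamicDestinations` subclass (which also supplies `getSchema` so missing tables can be created).

```java
import java.util.Map;
import java.util.function.Function;

// Plain-Java sketch (not Beam API code) of choosing a table per element.
class PerUserDestination {
    // Hypothetical project and dataset; replace with your own.
    static final String PROJECT = "my-project";
    static final String DATASET = "my_dataset";

    // Keep only letters, digits and underscores so the id is safe to use in a
    // table name (a conservative subset of what BigQuery accepts).
    static String sanitize(String userId) {
        return userId.replaceAll("[^A-Za-z0-9_]", "_");
    }

    // The per-element decision: row -> "project:dataset.table" spec.
    // Assumes each row carries its user id under the hypothetical key "user_id".
    static final Function<Map<String, Object>, String> TABLE_FOR_ROW =
        row -> PROJECT + ":" + DATASET + ".user_" + sanitize(String.valueOf(row.get("user_id")));

    public static void main(String[] args) {
        // prints my-project:my_dataset.user_alice_42
        System.out.println(TABLE_FOR_ROW.apply(Map.of("user_id", "alice-42", "score", 7)));
    }
}
```

Unlike the partitioning approach, nothing here needs to know the set of users ahead of time: every new user ID simply yields a new table spec at write time.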

