What is the best and most efficient way to write an inner join in Apache Beam?

Time: 2021-12-01 14:19:03

Suppose my query is: "select b.* from sourav_test.test1 a inner join sourav_test.test2 b on a.id=b.id". I need the best and most efficient way to write this in Apache Beam.

1 solution

#1


0  

In Apache Beam SDK 2.5, a good approach is to use the join library extension (org.apache.beam.sdk.extensions.joinlibrary.Join), which performs SQL-like joins. For an inner join, the method signature is as follows:

innerJoin(PCollection<KV<K, V1>> leftCollection, PCollection<KV<K, V2>> rightCollection)

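In your case, the left and right collections are the two collections to be inner joined, i.e. the rows of sourav_test.test1 and sourav_test.test2, each keyed by id. K is the type of the key shared by both collections, and V1 and V2 are the value types of the left and right collections respectively. The result is a PCollection<KV<K, KV<V1, V2>>> that pairs each key with the matching left and right values, so selecting b.* amounts to keeping only the right-hand value.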
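
To make the shape of the joined output concrete, here is a minimal plain-Java sketch that mimics an inner join on keyed pairs in memory. It is not Beam code and does not show how the join library is implemented: Map.Entry stands in for KV, List stands in for PCollection, and the names InnerJoinSketch, kv and selectRight are made up for illustration. In an actual pipeline you would call Join.innerJoin(left, right) and then map each element to getValue().getValue() to keep only the test2 side.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration only (hypothetical names, no Beam dependency).
class InnerJoinSketch {

    // Stand-in for Beam's KV.of(key, value).
    static <K, V> Map.Entry<K, V> kv(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // Emits one (key, (leftValue, rightValue)) pair for every left/right
    // combination that shares a key; unmatched keys are dropped.
    static <K, V1, V2> List<Map.Entry<K, Map.Entry<V1, V2>>> innerJoin(
            List<Map.Entry<K, V1>> left, List<Map.Entry<K, V2>> right) {
        // Index the right side by key so each left row can look up its matches.
        Map<K, List<V2>> rightByKey = new HashMap<>();
        for (Map.Entry<K, V2> r : right) {
            rightByKey.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        List<Map.Entry<K, Map.Entry<V1, V2>>> joined = new ArrayList<>();
        for (Map.Entry<K, V1> l : left) {
            List<V2> matches = rightByKey.getOrDefault(l.getKey(), Collections.emptyList());
            for (V2 rightValue : matches) {
                Map.Entry<V1, V2> pair = kv(l.getValue(), rightValue);
                joined.add(kv(l.getKey(), pair));
            }
        }
        return joined;
    }

    // Equivalent of "select b.*": keep only the right-hand value of each match.
    static <K, V1, V2> List<V2> selectRight(List<Map.Entry<K, Map.Entry<V1, V2>>> joined) {
        List<V2> out = new ArrayList<>();
        for (Map.Entry<K, Map.Entry<V1, V2>> row : joined) {
            out.add(row.getValue().getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Rows of test1 (a) and test2 (b), keyed by id.
        List<Map.Entry<Integer, String>> a = Arrays.asList(kv(1, "a1"), kv(2, "a2"), kv(3, "a3"));
        List<Map.Entry<Integer, String>> b = Arrays.asList(kv(2, "b2"), kv(3, "b3"), kv(4, "b4"));

        // Only ids 2 and 3 appear on both sides.
        System.out.println(selectRight(innerJoin(a, b))); // prints [b2, b3]
    }
}
```

Note that if an id occurs several times on either side, every matching combination is emitted, which is the same behaviour as the SQL inner join in the question.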
