Cloud Dataflow: How to use the Google-provided PubSub to BigQuery template

Time: 2021-12-27 14:08:04

I am using PubSub to capture real-time data, then using GCP Dataflow to stream that data into BigQuery. My Dataflow code is written in Java.

I want to try out the templates provided with Dataflow. The flow is: PubSub --> Dataflow --> BigQuery

Currently I am sending messages to PubSub in string format (using Python here), but the Dataflow template only accepts JSON messages, and the Python library does not seem to let me publish a JSON message. Can anyone suggest a way to publish a JSON message to PubSub so that I can use the Dataflow template to do the job?

1 solution

#1


2  

The Google-provided pipeline that moves data from PubSub to BigQuery assumes JSON-formatted messages and a matching schema on the BigQuery side.

Publishing JSON to PubSub is no different from publishing strings. You can try the following snippet to convert a Python dict to a JSON string:

import json

# Build the message as a regular Python dict, then serialize it to a JSON string.
py_dict = {"name": "Peter", "locale": "en-US"}
json_string = json.dumps(py_dict)
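
To get from a JSON string to an actual PubSub message, the string has to be encoded to bytes before it is published. The following is a minimal sketch, assuming the google-cloud-pubsub client library is installed and credentials are configured; the helper names, project ID, and topic ID are hypothetical placeholders and not part of the original answer.

```python
import json


def to_pubsub_payload(record: dict) -> bytes:
    """Serialize a dict to UTF-8 encoded JSON bytes for use as message data."""
    return json.dumps(record).encode("utf-8")


def publish_record(project_id: str, topic_id: str, record: dict) -> str:
    """Publish one dict as a JSON message and return the message ID.

    project_id and topic_id are placeholders; substitute your own values.
    """
    # Imported here so the serialization helper above works even when the
    # client library (an assumed dependency) is not installed.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    # publish() takes the message data as bytes and returns a future.
    future = publisher.publish(topic_path, data=to_pubsub_payload(record))
    return future.result()
```

A call such as publish_record("my-project", "my-topic", {"name": "Peter", "locale": "en-US"}) would then send one JSON message. For the template to load it, the keys of the dict would need to line up with the column names of the target BigQuery table.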

If you'd like to customize the pipeline heavily, you can also take the source code at the following location and build your own.

https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/templates/PubSubToBigQuery.java
