Debugging the Dataflow GCS-to-BigQuery template

Date: 2022-10-15 15:25:12

I am getting some strange errors that are difficult to debug. I am running a simple JavaScript UDF mapper that maps JSON data and imports it into BigQuery. I've run other UDFs before and never encountered errors like this.

Is there any way to debug UDF errors in Dataflow templates, either with an actual debugger or at least with console.log or something similar?
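
One low-tech way to get console.log-style visibility is to call the UDF directly in Node.js on the same lines the job will read, where every log line shows up immediately. The sketch below assumes the usual template UDF shape (a function that receives one input line as a string and returns a JSON string); `transform`, its field mapping, and `runUdfLocally` are hypothetical names for illustration. The UDF body sticks to ES5-style syntax to stay conservative about what the template's JavaScript engine accepts.

```javascript
// Hypothetical UDF in the shape assumed above: one input line in,
// one JSON string out.
function transform(line) {
  var values = JSON.parse(line);
  var obj = {};
  obj.name = values.name;   // hypothetical field mapping
  obj.count = values.count; // hypothetical field mapping
  return JSON.stringify(obj);
}

// Local harness: feeds each line through the UDF, logs the result or the
// error, and returns a per-line report so failures can be pinpointed.
function runUdfLocally(udf, lines) {
  var results = [];
  lines.forEach(function (line, index) {
    try {
      var output = udf(line);
      console.log("line " + (index + 1) + " ->", output);
      results.push({ line: index + 1, ok: true, output: output });
    } catch (err) {
      console.log("line " + (index + 1) + " FAILED:", err.message);
      results.push({ line: index + 1, ok: false, error: err.message });
    }
  });
  return results;
}

// Usage with inline sample data; in practice, read the lines from the
// same file that the job reads from GCS.
var sample = ['{"name": "a", "count": 1}', "not json"];
var report = runUdfLocally(transform, sample);
```

Returning a report instead of stopping at the first exception makes it easy to see every bad line in one pass.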

The error in question:

exception: "java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: java.lang.RuntimeException: org.json.JSONException: A JSONObject text must begin with '{' at 1 [character 2 line 1]
    at com.google.cloud.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:183)
    at com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:101)
    at com.google.cloud.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:54)
    at com.google.cloud.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:37)
    at com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:114)
    ...

It's very hard to tell what this error is about: is the input data malformed, or is the JSON output from the UDF the problem?
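
The message itself only says that some text handed to org.json did not start with '{' (after leading whitespace). A quick way to narrow down which text is at fault is to run the same check locally on each candidate: an input line, a UDF output, or any JSON file the job reads. The helper below is a hypothetical sketch that, as far as I can tell, mirrors that precondition; note that `JSON.parse` is stricter than org.json, so the "not strict JSON" result is only a hint. Under this rule, a bare array or an invisible byte-order mark at the start of a file would both be rejected.

```javascript
// Hypothetical helper: reports whether a text would pass the
// "must begin with '{'" check, and if not, which character it starts with.
function diagnoseJsonObjectText(text) {
  var i = 0;
  // Skip leading characters with code <= 32 (space and control characters),
  // mirroring the whitespace skipping org.json does before the '{' check.
  while (i < text.length && text.charCodeAt(i) <= 32) {
    i++;
  }
  if (i === text.length) {
    return "empty";
  }
  if (text.charAt(i) !== "{") {
    // Show the offending character as a code point so invisible ones
    // (such as a byte-order mark) are visible.
    var hex = text.charCodeAt(i).toString(16).toUpperCase();
    return "rejected: first character is U+" + ("0000" + hex).slice(-4);
  }
  try {
    JSON.parse(text);
  } catch (err) {
    return "starts with '{' but is not strict JSON";
  }
  return "ok";
}
```

For example, a schema file that starts with `[` is reported as `rejected: first character is U+005B`.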

So far I've tried the following:

  • Unit-tested the UDF locally with sample data
  • Ran the integration tests with the exact same file I'm trying to process in the real environment
  • Used an empty JSON object ({}) as the input
  • Used a UDF that returns an empty JSON object

Any tips on debugging Dataflow JavaScript UDFs would be much appreciated.

Is the source code of these Java classes available anywhere online?

1 answer

#1

In this case, the culprit turned out to be the BigQuery schema, which needs to be wrapped in a JSON object under the "BigQuery Schema" key:

{
  "BigQuery Schema": [
    ... schema goes here
  ]
}
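
Before launching the job, the schema file can be checked locally for this wrapped shape. The sketch below uses hypothetical helper names (`checkSchemaFile`, `wrapSchema`); the per-field test (a string `name` and `type` on each entry) is an assumed minimal check, not the template's full validation.

```javascript
// Hypothetical helper: checks that a schema file's text has the shape
// {"BigQuery Schema": [ ...field definitions... ]} and returns a list of
// problems (an empty list means the shape looks right).
function checkSchemaFile(text) {
  var parsed;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    return ["not valid JSON: " + err.message];
  }
  if (Array.isArray(parsed)) {
    return ['top level is a bare array; wrap it as {"BigQuery Schema": [...]}'];
  }
  if (parsed === null || typeof parsed !== "object") {
    return ["top level must be a JSON object"];
  }
  var fields = parsed["BigQuery Schema"];
  if (!Array.isArray(fields)) {
    return ['missing array under the "BigQuery Schema" key'];
  }
  var problems = [];
  // Assumed minimal per-field check: each entry has string "name" and "type".
  fields.forEach(function (field, index) {
    if (!field || typeof field.name !== "string" || typeof field.type !== "string") {
      problems.push("field " + index + ' needs string "name" and "type"');
    }
  });
  return problems;
}

// Hypothetical helper: wraps a bare array of field definitions in the
// expected top-level object and pretty-prints it.
function wrapSchema(fields) {
  return JSON.stringify({ "BigQuery Schema": fields }, null, 2);
}
```

Running `checkSchemaFile` on the exact file the job points at, and `wrapSchema` on a bare array if that is what it finds, turns the opaque worker exception into a readable local message.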

The template's source code, TextIOToBigQuery.java, can help with debugging.

See the repo: https://github.com/GoogleCloudPlatform/DataflowTemplates
