Is it possible to use setSchemaUpdateOptions(ALLOW_FIELD_ADDITION) from the BigQuery load config in Dataflow with the built-in BigQueryIO.Write?

Date: 2022-06-30 14:51:59

I would like to use the experimental option that allows me to update a BigQuery schema when performing a load job.

I'm using Dataflow and the built-in BigQueryIO.write from the SDK.

I saw that this is possible through the BigQuery API with JobConfigurationLoad.setSchemaUpdateOptions(ALLOW_FIELD_ADDITION), but I can't find the equivalent in BigQueryIO.

Does it exist somewhere, or can I override some part of BigQueryIO to do that?

Thank you very much,

1 Answer

#1


AFAIK, that experimental option is not yet exposed via the Dataflow/Beam APIs in BigQueryIO, and overriding something in that class would not be a trivial task. I wouldn't recommend going down that route.

One workaround I can think of would be to redirect your sink to GCS instead of BigQuery, and then run one or more normal BigQuery load jobs at the end of your pipeline. That way you can use the option.
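To make the second half of that workaround concrete, here is a rough sketch of the load job request you would submit once the pipeline has written its files to GCS. It only builds the JSON body for the BigQuery REST `jobs.insert` call using plain Java; actually sending it (authentication, HTTP client) is left out. The class name `LoadJobRequest`, the project, dataset, table and bucket names are all made up for illustration, and I'm assuming newline-delimited JSON output with schema autodetection, which may not match your pipeline.

```java
// Sketch only: builds the request body for a BigQuery load job that
// appends files from GCS and allows new fields to be added to the table.
// LoadJobRequest and all argument values are hypothetical.
final class LoadJobRequest {

    // Returns the JSON body for jobs.insert. Inputs are not escaped,
    // so this assumes they contain no quotes or backslashes.
    static String body(String projectId, String datasetId, String tableId, String gcsUri) {
        return String.format(
            "{\n"
            + "  \"configuration\": {\n"
            + "    \"load\": {\n"
            + "      \"sourceUris\": [\"%s\"],\n"
            // Assumption: the pipeline wrote newline-delimited JSON files.
            + "      \"sourceFormat\": \"NEWLINE_DELIMITED_JSON\",\n"
            // Assumption: let BigQuery infer the new fields from the data.
            + "      \"autodetect\": true,\n"
            // Appending to the existing table, so the schema update option applies.
            + "      \"writeDisposition\": \"WRITE_APPEND\",\n"
            + "      \"schemaUpdateOptions\": [\"ALLOW_FIELD_ADDITION\"],\n"
            + "      \"destinationTable\": {\n"
            + "        \"projectId\": \"%s\",\n"
            + "        \"datasetId\": \"%s\",\n"
            + "        \"tableId\": \"%s\"\n"
            + "      }\n"
            + "    }\n"
            + "  }\n"
            + "}",
            gcsUri, projectId, datasetId, tableId);
    }

    public static void main(String[] args) {
        // Placeholder names; replace with your own project, dataset, table and bucket.
        System.out.println(body("my-project", "my_dataset", "my_table",
                                "gs://my-bucket/output/*.json"));
    }
}
```

If you use the Java API client instead of raw REST, the same setting is the JobConfigurationLoad.setSchemaUpdateOptions call mentioned in the question.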
