DataflowRunner - Refreshing due to a 401

Time: 2022-06-30 14:52:23

Running a pipeline on DataflowRunner (Google Cloud Dataflow SDK for Python 0.5.5).

The pipeline:

import apache_beam as beam

# p (the Pipeline), known_args, schema_string, get_osm_way and convert_to_dict
# are defined elsewhere in the full script.
(p
    | 'Read trip from BigQuery' >> beam.io.Read(beam.io.BigQuerySource(query=known_args.input))
    | 'Convert' >> beam.Map(lambda row: (row['HardwareId'], row))  # key each row by device
    | 'Group devices' >> beam.GroupByKey()
    | 'Pull way info from mapserver' >> beam.FlatMap(get_osm_way)  # external HTTP calls
    | 'Map way info to dictionary' >> beam.FlatMap(convert_to_dict)
    | 'Save to BQ' >> beam.io.Write(beam.io.BigQuerySink(
        known_args.output,
        schema=schema_string,
        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))
)

It is set to autoscale, and the runner spun up 15 workers.

More detailed code is in my other question.

After around 2 hours of running, it reported:

19:41:19.908
Attempting refresh to obtain initial access_token
 {
 insertId: "jf9yr4g1sv0qku"   
 jsonPayload: {
  message: "Attempting refresh to obtain initial access_token"    
  worker: "beamapp-root-0216221014-5-02161410-29cb-harness-xqx2"    
  logger: "oauth2client.client:client.py:new_request"    
  thread: "110:140052132222720"    
  job: "2017-02-16_14_10_18-17481182243152998182"    
 }
 resource: {…}   
 timestamp: "2017-02-17T00:41:19.908143997Z"   
 severity: "INFO"   
 labels: {…}   
 logName: "projects/fiona-zhao/logs/dataflow.googleapis.com%2Fworker"   
}

Then it started continuously reporting "Refreshing due to a 401". One of these entries:

21:45:12.886
Refreshing due to a 401 (attempt 1/2)
 {
 insertId: "zsorfgg1urhvty"   
 jsonPayload: {
  worker: "beamapp-root-0216221014-5-02161410-29cb-harness-xqx2"    
  logger: "oauth2client.client:client.py:new_request"    
  thread: "110:140052273633024"    
  job: "2017-02-16_14_10_18-17481182243152998182"    
  message: "Refreshing due to a 401 (attempt 1/2)"    
 }
 resource: {…}  
 timestamp: "2017-02-17T02:45:12.886137962Z"   
 severity: "INFO"   
 labels: {
  compute.googleapis.com/resource_name: "dataflow-beamapp-root-0216221014-5-02161410-29cb-harness-xqx2"    
  dataflow.googleapis.com/job_id: "2017-02-16_14_10_18-17481182243152998182"    
  dataflow.googleapis.com/job_name: "beamapp-root-0216221014-530646"    
  dataflow.googleapis.com/region: "global"    
  compute.googleapis.com/resource_type: "instance"    
  compute.googleapis.com/resource_id: "2301951363070532306"    
 }
 logName: "projects/fiona-zhao/logs/dataflow.googleapis.com%2Fworker"   
}
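
Both messages above come from oauth2client (the logger field is oauth2client.client). As a rough illustration only, here is a simplified sketch of the refresh-and-retry pattern those messages describe; it is not oauth2client's actual code, and TokenSource and request_with_refresh are hypothetical names. Access tokens are short-lived, so a request made with an expired token is rejected with HTTP 401; the client then fetches a new token and retries, up to a fixed number of attempts (the "attempt 1/2" in the log suggests two).

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("refresh_sketch")

HTTP_UNAUTHORIZED = 401


class TokenSource(object):
    """HYPOTHETICAL token issuer that hands out numbered tokens."""

    def __init__(self):
        self.issued = 0

    def new_token(self):
        self.issued += 1
        return "token-%d" % self.issued


def request_with_refresh(send, tokens, token=None, max_refresh_attempts=2):
    """Call send(token) -> HTTP status; on 401, refresh the token and retry.

    Simplified illustration, not oauth2client's implementation.
    Returns (final_status, token_used_for_final_request).
    """
    if token is None:
        # No token cached yet: fetch one before the first request.
        logger.info("Attempting refresh to obtain initial access_token")
        token = tokens.new_token()
    status = send(token)
    for attempt in range(1, max_refresh_attempts + 1):
        if status != HTTP_UNAUTHORIZED:
            break
        logger.info("Refreshing due to a 401 (attempt %d/%d)",
                    attempt, max_refresh_attempts)
        token = tokens.new_token()
        status = send(token)
    return status, token
```

For example, if send returns 401 for the token "expired" and 200 for a freshly issued one, request_with_refresh(send, TokenSource(), token="expired") logs a single "Refreshing due to a 401 (attempt 1/2)" line and returns (200, "token-1"). In this pattern a 401 followed by a successful retry is routine; only a status that is still 401 after the last attempt means the credentials are actually being rejected.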

What can I do?

1 Solution

#1

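These log messages are a normal part of execution and do not by themselves indicate errors: they are INFO-level messages from the oauth2client library, which refreshes the access token and retries when a request is rejected with HTTP 401. My suggestion is to add your own logging around external API calls (for example, the map-server requests made in get_osm_way) and other execution steps to find out which of them is hanging.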
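
A minimal sketch of that suggestion, assuming the hang is in the map-server call made by get_osm_way: wrap each external call in a helper that logs when it starts, how long it took, and any exception it raised. logged_call and get_osm_way_sketch are hypothetical names, and request_fn stands in for whatever HTTP request the real get_osm_way makes (its code is not shown here). The sketch uses only the standard library and avoids f-strings, so it also runs on Python 2.7.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline.external_calls")


def logged_call(name, fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and log its start, its duration, and any
    exception it raises (the exception is re-raised unchanged)."""
    logger.info("%s: starting", name)
    start = time.time()
    try:
        result = fn(*args, **kwargs)
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception("%s: failed after %.2f s", name, time.time() - start)
        raise
    logger.info("%s: finished in %.2f s", name, time.time() - start)
    return result


def get_osm_way_sketch(element, request_fn):
    """HYPOTHETICAL stand-in for get_osm_way, showing where logging goes.

    After GroupByKey, each element is (HardwareId, rows). request_fn is an
    assumed callable that queries the map server for one device.
    """
    hardware_id, rows = element
    # Log the group size: unusually large groups can explain slow steps.
    logger.info("device %s: %d rows", hardware_id, sum(1 for _ in rows))
    response = logged_call("map server request for %s" % hardware_id,
                           request_fn, hardware_id)
    # FlatMap expects an iterable, so wrap the single response in a list.
    return [response]
```

Messages written with Python's standard logging module from inside a transform appear in the worker logs next to the oauth2client entries shown in the question, so a "starting" line with no matching "finished" line points at the call that hangs. Passing a timeout to the HTTP client (for example the timeout argument of urlopen) makes a hung request raise instead of blocking indefinitely.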

Though we cannot comment on specific execution details of particular jobs on this open forum, the Cloud Dataflow team can provide more support on the dataflow-feedback@google.com mailing list.
