Google DataFlow Python pipeline fails on write

Time: 2022-02-17 15:18:32

I'm running a simple DataFlow pipeline with the Python SDK to count keywords. The job runs fine while pre-processing the input data, but the grouping/output steps fail with the error shown below.

I take the logs to mean that the worker is having trouble accessing the temp folder, but the storage bucket in our project exists and has the proper permissions. What could be causing this?

 "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcsio.py", line 606, in write
   raise self.upload_thread.last_error # pylint: disable=raising-bad-type
 HttpError: HttpError accessing <https://www.googleapis.com/resumable/upload/storage/v1/b/[PROJECT-NAME-REDACTED]-temp-2016-08-07_04-42-52/o?uploadType=resumable&alt=json&name=0015bf8d-fa87-4c9a-82d6-8ffcd742d770>:
 response: <{'status': '404', 'alternate-protocol': '443:quic', 'content-length': '165', 'vary': 'Origin, X-Origin', 'server': 'UploadServer', 'x-guploader-uploadid': 'AEnB2UoYRPUwhz-OXlJ437k0J8Uxd1lJvTsFbfVJF_YMP2GQEvmdDpo7e-3DVhuqNd9b1A_RFPbfIcK6hCsFcar-hdI94rqJZUvATcDmGRRIvHecAt5CTrg', 'date': 'Sun, 07 Aug 2016 04:43:23 GMT', 'alt-svc': 'quic=":443"; ma=2592000; v="36,35,34,33,32,31,30"', 'content-type': 'application/json; charset=UTF-8'}>,
 content <{ "error": { "errors": [ { "domain": "global", "reason": "notFound", "message": "Not Found" } ], "code": 404, "message": "Not Found" } } >

1 Answer

#1


0  

This is a known issue, https://issues.apache.org/jira/browse/BEAM-539: TextFileSink does not accept a bucket root as an output location. As a workaround, use a subdirectory path (e.g. gs://foo/bar) as the output location.
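
To make the workaround concrete, here is a minimal sketch of a check that could run before the pipeline is built. The helper names (`split_gcs_path`, `is_bucket_root`, `safe_output_prefix`, `temp_location_for`) and the default subdirectory `output` are hypothetical and not part of the Beam or Dataflow SDK. `temp_location_for` is only an illustration based on the error URL above, where the bucket name ends in `-temp-2016-08-07_04-42-52`; it assumes that suffix is appended directly to the output path, which would turn a bucket-root output into the name of a bucket that does not exist.

```python
def split_gcs_path(path):
    """Split 'gs://bucket/object/path' into (bucket, object_path)."""
    prefix = "gs://"
    if not path.startswith(prefix):
        raise ValueError("not a gs:// path: %r" % (path,))
    bucket, _, object_path = path[len(prefix):].partition("/")
    if not bucket:
        raise ValueError("missing bucket name: %r" % (path,))
    return bucket, object_path


def is_bucket_root(path):
    """True for 'gs://foo' or 'gs://foo/', i.e. no object path after the bucket."""
    _, object_path = split_gcs_path(path)
    return object_path.strip("/") == ""


def safe_output_prefix(path, subdir="output"):
    """Return path unchanged unless it is a bucket root; then append a subdirectory.

    The default subdirectory name is an arbitrary placeholder.
    """
    if is_bucket_root(path):
        return path.rstrip("/") + "/" + subdir
    return path


def temp_location_for(output_prefix, stamp):
    """Illustration only (assumption): append the '-temp-<timestamp>' suffix
    seen in the error URL directly to the output prefix."""
    return output_prefix + "-temp-" + stamp


if __name__ == "__main__":
    stamp = "2016-08-07_04-42-52"
    for out in ("gs://foo", "gs://foo/bar"):
        bucket, _ = split_gcs_path(temp_location_for(out, stamp))
        print(out, "-> temp files would target bucket:", bucket)
```

The value returned by `safe_output_prefix` would then be used as the output location passed to the sink, so that any derived temporary path stays inside the existing bucket.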
