How to write compressed files to Google Cloud Storage with Google Cloud Dataflow?

Time: 2023-01-11 23:13:34

I am trying to write gzipped files to Google Cloud Storage buckets from a Google Cloud Dataflow program. The FAQ says:

Does the TextIO source and sink support compressed files, such as GZip?
Yes. Cloud Dataflow can read files compressed with gzip and bzip2.

Does this mean that they don't support writing gzip files?

2 Answers

#1 (3 votes)

Correct, we currently don't have built-in support for writing gzip files. However, the user-defined data format API, in particular FileBasedSink, should make it straightforward to write such a sink yourself.
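The core of such a custom sink is wrapping the destination WritableByteChannel so that bytes written through it come out gzip-compressed. Below is a minimal, SDK-independent sketch of that wrapping using plain java.util.zip; the class and method names are illustrative, not part of the Dataflow API, and the FileBasedSink plumbing (sharding, temp files, finalization) is omitted:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipChannelSketch {

    // Wrap a destination channel so that everything written through the
    // returned channel is gzip-compressed before reaching it. This is the
    // essence of what a gzip-writing sink has to do per output file.
    static WritableByteChannel gzipChannel(WritableByteChannel dest) throws IOException {
        OutputStream raw = Channels.newOutputStream(dest);
        return Channels.newChannel(new GZIPOutputStream(raw));
    }

    // Push some text through the wrapped channel and return the gzipped bytes.
    static byte[] compressLines(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (WritableByteChannel gz = gzipChannel(Channels.newChannel(buffer))) {
            gz.write(ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8)));
        } // closing the channel flushes and writes the gzip trailer
        return buffer.toByteArray();
    }

    // Round-trip check: a plain GZIPInputStream can read the output back.
    static String decompress(byte[] gzipped) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = compressLines("line one\nline two\n");
        System.out.print(decompress(gz));
    }
}
```

A real sink would hand the wrapped channel to the framework for each shard; the compression logic itself is no more than the gzipChannel method above.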

#2 (1 vote)

In the new Beam libraries, you can now do this much more easily:

PDone d = c2.apply(TextIO.write()
  .to("gs://path")
  .withWritableByteChannelFactory(FileBasedSink.CompressionType.GZIP));
