Merging files in Google Cloud Storage with Google Cloud Dataflow

Time: 2022-09-26 15:15:38

Nathan Marz, in his book "Big Data", describes how to maintain data files in HDFS and how to keep file sizes as close to the native HDFS block size as possible using his Pail library, which runs on top of MapReduce.

  1. Is it possible to achieve the same result in Google Cloud Storage?
  2. Can I use Google Cloud Dataflow instead of MapReduce for this purpose?

1 solution

#1


Google Cloud Storage supports composite objects, which let you store an object in multiple parts and combine them later, with a limit of 32 parts per compose operation and 1024 constituent parts in total. This functionality is available in the API.

Composite Objects and Parallel Uploads - Google Cloud Platform Developer's Guide

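As a rough illustration of what a compose call looks like, here is a minimal sketch using the google-cloud-storage Python client. The bucket name, object names, and shard count are hypothetical; the only real API used is Blob.compose(), which performs the merge server-side and accepts at most 32 source objects per call.

```python
# Minimal sketch: merge up to 32 objects into one composite object
# with a single server-side compose call.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # hypothetical bucket name

# The small objects to merge (hypothetical names). compose() is
# limited to 32 source objects per call, so larger merges must be
# done in rounds of intermediate composites.
sources = [bucket.blob(f"shards/part-{i:05d}") for i in range(32)]

merged = bucket.blob("merged/output")
merged.compose(sources)  # server-side; no data is downloaded
```

Because each call is capped at 32 sources, merging more parts means composing intermediate objects in rounds until a single object remains, staying within the 1024-constituent-part limit mentioned above.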