How do I composite large images stored on S3 with ImageMagick on EC2 instances?

Date: 2021-05-07 08:56:49

I have an ongoing list of image processing tasks to do, using ImageMagick to composite large individual graphic files (20MB each). These images are currently stored on S3 (approximately 2.5GB in total).

I was thinking of using multiple EC2 instances to process the tasks: each would composite the images and upload the output file to S3.

The problem with this setup is that ImageMagick needs the file library to be local (on the machine). Currently images are on S3, which means each instance would need to download a copy of the images from S3, slowing down the whole process.

What's the best way to share this image library with all nodes?

1 Answer

#1


Consider also the following points:

  1. You can process images with ImageMagick completely in memory by "saving" any input image in the special format MPR: (Magick Persistent Registry). For details see this answer: "ImageMagick multiple operations in single invocation"

  2. ImageMagick can access remote images via http://.

  3. You can put many ImageMagick operations into a single command line, which can also produce multiple output files, and you can segment that command line into sub- or side-processes with the parentheses syntax: ... \( IM side process \) ...

How you can streamline your overall process depends a lot on what exactly you want to do. However,

  • the MPR: / MPC: technique can be very useful here, and may avoid or minimize the need for multiple EC2 instances;

  • you cannot avoid shipping the input pixels to whichever ImageMagick instance is to process them (so "downloading a copy" will always have to occur);

  • you can minimize the number of downloads by storing the inputs in memory under a series of labels (MPR:xy1, MPR:xy2, etc.) and then accessing them quickly, as often as needed, from one long, well-constructed ImageMagick command line that performs any number of compositions.
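
The points above suggest a per-instance workflow: fetch the library from S3 once, process it locally, then upload the results. A minimal sketch of that, assuming the AWS CLI is installed; the bucket name, the `images` and `output` prefixes, the working directory, and the `DRY_RUN` switch are all hypothetical and only there for illustration. By default the script just prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: BUCKET, PREFIX, WORKDIR and the output/ prefix are hypothetical.
BUCKET="${BUCKET:-my-bucket}"
PREFIX="${PREFIX:-images}"
WORKDIR="${WORKDIR:-/tmp/im-input}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands instead of running them

# Run a command, or print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Fetch the whole library once per instance; a later sync only
# transfers files that are new or have changed.
fetch_inputs() {
  run mkdir -p "$WORKDIR"
  run aws s3 sync "s3://$BUCKET/$PREFIX/" "$WORKDIR/"
}

# Upload one finished output file.
store_output() {
  run aws s3 cp "$1" "s3://$BUCKET/output/$(basename "$1")"
}

fetch_inputs
store_output /tmp/result.pdf
```

Set `DRY_RUN=0` to execute the commands for real. The ImageMagick step itself would run between the two calls, against the files in `$WORKDIR`.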


Example

As an example, suppose you have 10 TIFFs and want to create 3 different PDF files from them, each PDF containing a different set of pages drawn from the 10 TIFFs. Normally you would run 3 commands:

convert 1.tif 3.tif 4.tif 8.tif 9.tif 10.tif -compress jpeg -quality 70 1out1.pdf
convert 2.tif 3.tif 4.tif 7.tif 8.tif  9.tif -compress jpeg -quality 70 1out2.pdf
convert 3.tif 4.tif 5.tif 7.tif 8.tif 10.tif -compress jpeg -quality 70 1out3.pdf

Each of these 3 commands has to load 6 TIFF files (some TIFFs, like 3.tif, are used in all 3 commands). That is 18 I/O events.
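
The read count can be checked with a few lines of shell. This sketch copies the page lists from the three commands above and counts both the total reads and the distinct inputs; the variable names are only illustrative.

```shell
#!/bin/sh
# Page lists copied from the three commands above (TIFF numbers per PDF).
out1="1 3 4 8 9 10"
out2="2 3 4 7 8 9"
out3="3 4 5 7 8 10"

# Total reads when each PDF is built by its own convert call.
total=$(printf '%s\n' $out1 $out2 $out3 | wc -l | tr -d ' ')

# Distinct TIFFs that at least one PDF actually needs.
unique=$(printf '%s\n' $out1 $out2 $out3 | sort -un | wc -l | tr -d ' ')

echo "separate commands: $total reads, distinct inputs: $unique"
```

This prints 18 reads and 9 distinct inputs. The distinct count is 9 rather than 10 because none of the three page lists uses 6.tif.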

Now consider this alternative command, which I believe will run faster:

convert                         \
  1.tif +write mpr:t1  +delete  \
  2.tif +write mpr:t2  +delete  \
  3.tif +write mpr:t3  +delete  \
  4.tif +write mpr:t4  +delete  \
  5.tif +write mpr:t5  +delete  \
  6.tif +write mpr:t6  +delete  \
  7.tif +write mpr:t7  +delete  \
  8.tif +write mpr:t8  +delete  \
  9.tif +write mpr:t9  +delete  \
 10.tif +write mpr:t10 +delete  \
  \( mpr:t1 mpr:t3 mpr:t4 mpr:t8 mpr:t9 mpr:t10                \
                -compress jpeg -quality 70 +write 2out1.pdf \) \
  \( mpr:t2 mpr:t3 mpr:t4 mpr:t7 mpr:t8 mpr:t9                 \
                -compress jpeg -quality 70 +write 2out2.pdf \) \
  \( mpr:t3 mpr:t4 mpr:t5 mpr:t7 mpr:t8 mpr:t10                \
                -compress jpeg -quality 70 +write 2out3.pdf \) \
  null:

This command loads each of the 10 TIFFs only once (10 I/O events in total). It writes each TIFF into an MPR: register under its own label and then deletes the original TIFF from the image sequence.

After this initial preparation, ImageMagick runs 3 different parenthesized side-processing pipelines in sequence. Each one loads the required output pages as MPR: images and creates a PDF from them.
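
For longer page lists, the command line can be generated rather than typed by hand. The following is a sketch under stated assumptions: the helper name `mpr_args`, the comma-separated page-list format, and the output names `out1.pdf`, `out2.pdf`, ... are hypothetical, and the inputs are assumed to be named 1.tif to N.tif as in the example. It only prints the arguments, one per line, so it can be inspected without ImageMagick installed.

```shell
#!/bin/sh
# Sketch: emit the arguments of the MPR command, one per line.
# Usage: mpr_args N LIST...   where each LIST is e.g. 1,3,4 (pages of one PDF)
mpr_args() {
  n=$1; shift
  # Load every TIFF once, store it in a named register, drop it from the list.
  i=1
  while [ "$i" -le "$n" ]; do
    printf '%s\n' "$i.tif" +write "mpr:t$i" +delete
    i=$((i + 1))
  done
  # One parenthesized side pipeline per requested PDF.
  k=1
  for pages in "$@"; do
    printf '%s\n' '('
    for p in $(printf '%s' "$pages" | tr ',' ' '); do
      printf '%s\n' "mpr:t$p"
    done
    printf '%s\n' -compress jpeg -quality 70 +write "out$k.pdf" ')'
    k=$((k + 1))
  done
  # Discard whatever is left in the image sequence.
  printf '%s\n' null:
}

# Dry run: show the arguments convert would receive for the example above.
mpr_args 10 1,3,4,8,9,10 2,3,4,7,8,9 3,4,5,7,8,10
```

To execute the generated command, the output could be piped to `xargs convert`, assuming ImageMagick's `convert` is on the PATH and the file names contain no whitespace or quotes.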

The above example is probably too limited to demonstrate a measurable advantage from using MPR:, because the same result can also be achieved with this command:

convert  \
  1.tif  \
  2.tif  \
  3.tif  \
  4.tif  \
  5.tif  \
  6.tif  \
  7.tif  \
  8.tif  \
  9.tif  \
 10.tif  \
  \( -clone 0,2-3,7-9   -compress jpeg -quality 70 +write 3out1.pdf \) \
  \( -clone   1-3,6-8   -compress jpeg -quality 70 +write 3out2.pdf \) \
  \( -clone   2-4,6-7,9 -compress jpeg -quality 70 +write 3out3.pdf \) \
  null:

However, there is one more place where some performance may be gained: -compress jpeg -quality 70 is applied 3 times, each time to 6 (cloned, original) images.

Some CPU cycles may be saved if we apply this operation to the TIFFs before they are written into the MPR registers. That way the operation is applied only to the 10 TIFFs, and it no longer needs to be applied when the PDFs are written out:

convert                         \
  -respect-parentheses          \
  1.tif  -compress jpeg -quality 70 +write mpr:t1  +delete  \
  2.tif  -compress jpeg -quality 70 +write mpr:t2  +delete  \
  3.tif  -compress jpeg -quality 70 +write mpr:t3  +delete  \
  4.tif  -compress jpeg -quality 70 +write mpr:t4  +delete  \
  5.tif  -compress jpeg -quality 70 +write mpr:t5  +delete  \
  6.tif  -compress jpeg -quality 70 +write mpr:t6  +delete  \
  7.tif  -compress jpeg -quality 70 +write mpr:t7  +delete  \
  8.tif  -compress jpeg -quality 70 +write mpr:t8  +delete  \
  9.tif  -compress jpeg -quality 70 +write mpr:t9  +delete  \
 10.tif  -compress jpeg -quality 70 +write mpr:t10 +delete  \
  \( mpr:t1 mpr:t3 mpr:t4 mpr:t8 mpr:t9 mpr:t10  +write 4out1.pdf \) \
  \( mpr:t2 mpr:t3 mpr:t4 mpr:t7 mpr:t8 mpr:t9   +write 4out2.pdf \) \
  \( mpr:t3 mpr:t4 mpr:t5 mpr:t7 mpr:t8 mpr:t10  +write 4out3.pdf \) \
  null:

Update

Mark Setchell's comment was spot on; I had overlooked this until he mentioned it. It is probably faster (and certainly much less to type) to run the command like this:

convert                          \
  -respect-parentheses           \
  -compress jpeg -quality 70     \
  1.tif  +write mpr:t1  +delete  \
  2.tif  +write mpr:t2  +delete  \
  3.tif  +write mpr:t3  +delete  \
  4.tif  +write mpr:t4  +delete  \
  5.tif  +write mpr:t5  +delete  \
  6.tif  +write mpr:t6  +delete  \
  7.tif  +write mpr:t7  +delete  \
  8.tif  +write mpr:t8  +delete  \
  9.tif  +write mpr:t9  +delete  \
 10.tif  +write mpr:t10 +delete  \
  \( mpr:t1 mpr:t3 mpr:t4 mpr:t8 mpr:t9 mpr:t10  +write 5out1.pdf \) \
  \( mpr:t2 mpr:t3 mpr:t4 mpr:t7 mpr:t8 mpr:t9   +write 5out2.pdf \) \
  \( mpr:t3 mpr:t4 mpr:t5 mpr:t7 mpr:t8 mpr:t10  +write 5out3.pdf \) \
  null:

You will have to run your own benchmarks, with your own images, in your own environment, to decide which of the proposed commands to prefer.
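
A very small timing harness for such a comparison might look like the sketch below. The helper name `bench` and the script names in the usage note are hypothetical. It relies on `date +%s`, which is widely available but reports whole seconds only, so it is only meaningful for runs that take several seconds in total.

```shell
#!/bin/sh
# Sketch: run a command several times and report the elapsed wall-clock time.
# Usage: bench LABEL RUNS COMMAND [ARGS...]
bench() {
  label=$1; runs=$2; shift 2
  start=$(date +%s)
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Stop at the first failure so a broken command is not timed as "fast".
    "$@" >/dev/null 2>&1 || { echo "$label: command failed" >&2; return 1; }
    i=$((i + 1))
  done
  end=$(date +%s)
  echo "$label: $((end - start))s for $runs runs"
}

# Self-check with a command that does nothing.
bench "noop" 3 true
```

If each variant is saved as its own script, a comparison could then be run as `bench separate 5 sh variant1.sh` followed by `bench mpr 5 sh variant2.sh`, ideally on the same instance type and with the same input files.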
