Find the coordinates of an image in a PDF to replace it with another

Time: 2022-01-09 17:06:32

I have a PDF which I would like to use as a template to create a new PDF. The goal is to place an image inside a particular placeholder rectangle in the original PDF. The creation of the original PDF is under my control, but the placeholder rectangle/bounds might be anywhere in the PDF. I am thinking of using a dummy image (of the same dimensions) as the placeholder rectangle in the original PDF.

The Prawn gem supports placing an image at a given absolute/relative position within a page.

The trouble is that, since the rectangle or dummy image could be anywhere in the original PDF, I don't know what values to use for x and y in the following Prawn call:

pdf.image "/path/to/image", :at => [x, y]

Is there a way to get the coordinates of an image in the original PDF? My primitive understanding tells me that one would have to render the entire PDF to know this. Is that right? If so, what would be a good way to render a PDF in memory (headless) and get the coordinates of the various PDF objects (bounding rectangles, images, etc.)?

I am not limited by language/runtime here as long as I can trigger it programmatically.

What other approaches could there be to this problem?

1 Solution

#1

Not an answer (I don't know the Ruby language), but in lieu of any others, and because I can't post a comment yet, here's what I think.

If the conditions stated above hold (the placeholder and replacement images are exactly the same size and use the same color model, e.g. 24-bit RGB) and you control template creation (so you can store the placeholder uncompressed inside the PDF), it can be as quick and dirty as a raw replacement in the file treated as a byte string. For example, fill the placeholder with pure red, then search for the byte pattern 0xFF0000 repeated W*H times and replace it with the raw image data. You can get that data any way you like, of course, e.g. with ImageMagick:

convert my_image.jpg RGB:- | ...
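
A minimal Ruby sketch of that raw replacement, assuming the placeholder image stream is stored uncompressed, uses 8 bits per RGB sample and is filled entirely with pure red, and that the replacement pixels are raw RGB bytes of exactly the same size (for instance the output of the convert command above redirected to a file, after resizing to W x H). The function name and the command-line arguments are hypothetical. Because the splice keeps the file length unchanged, the byte offsets recorded in the xref table stay valid.

```ruby
# Sketch of the raw byte-replacement trick. Assumes (not guaranteed by any
# PDF producer): the placeholder image stream is stored uncompressed, uses
# 8 bits per RGB sample, and every pixel is pure red (FF 00 00).

RED_PIXEL = [0xFF, 0x00, 0x00].pack("C*") # binary (ASCII-8BIT) string

# Returns a copy of pdf_bytes with the W*H red placeholder pixels replaced by
# new_rgb, which must be raw RGB data of exactly the same length.
def replace_placeholder(pdf_bytes, width, height, new_rgb)
  data = pdf_bytes.b # binary copy, so string indexes are byte offsets
  pattern = RED_PIXEL * (width * height)
  new_rgb = new_rgb.b
  if new_rgb.bytesize != pattern.bytesize
    raise ArgumentError, "expected #{pattern.bytesize} bytes, got #{new_rgb.bytesize}"
  end

  offset = data.index(pattern) or raise "placeholder pattern not found"
  # A second hit means the pattern is ambiguous; refuse rather than guess.
  raise "placeholder pattern found more than once" if data.index(pattern, offset + 1)

  # Same-length splice: every later byte offset (and the xref table) stays valid.
  data[offset, pattern.bytesize] = new_rgb
  data
end

# Hypothetical CLI: template.pdf image.rgb out.pdf WIDTH HEIGHT
if __FILE__ == $PROGRAM_NAME && ARGV.length == 5
  template, rgb_file, output, w, h = ARGV
  result = replace_placeholder(File.binread(template), Integer(w), Integer(h),
                               File.binread(rgb_file))
  File.binwrite(output, result)
end
```

Refusing to act on a missing or repeated pattern is a deliberate choice: with a byte-level hack like this, silently patching the wrong bytes would corrupt the PDF without any warning.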

If this solution is too dirty, or the conditions don't hold exactly, then parse the page content stream for a construct like:

width 0 0 height x y cm
/name Do

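It isn't the cleanest approach either, but for the vast majority of simple page descriptions, x and y are the coordinates you are looking for (the lower-left corner of the image in PDF user space), and width and height are its size.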
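
A rough Ruby sketch of that parsing step, assuming the content stream has already been decompressed, the cm matrix has no rotation or skew (the two zeros), and no enclosing cm is in effect. The regular expression, the Placement struct and the helper names are my own illustration, not a real PDF parser. It also converts the result for Prawn, whose :at option takes the top-left corner of the image, whereas cm places the image's lower-left corner.

```ruby
# Sketch: locate "w 0 0 h x y cm /Name Do" in a decoded page content stream.
# Assumes no rotation/skew and no enclosing cm, i.e. the simple case only.

# A PDF number token: optional sign, digits with optional fraction, or ".5".
NUM = '[-+]?(?:\d+(?:\.\d*)?|\.\d+)'

PLACEMENT = /(?<![\d.])(#{NUM})\s+0\s+0\s+(#{NUM})\s+(#{NUM})\s+(#{NUM})\s+cm\s+\/([^\s\/\[\]()<>{}%]+)\s+Do\b/

Placement = Struct.new(:name, :x, :y, :width, :height)

# Returns one Placement per "cm ... Do" pair found, in stream order.
def find_placements(content)
  content.scan(PLACEMENT).map do |w, h, x, y, name|
    Placement.new(name, x.to_f, y.to_f, w.to_f, h.to_f)
  end
end

# cm gives the image's lower-left corner; Prawn's :at expects the top-left
# corner, so shift y up by the image height.
def prawn_at(placement)
  [placement.x, placement.y + placement.height]
end
```

With a placeholder found, drawing the replacement inside Prawn's canvas block (which, as far as I know, maps to absolute page coordinates regardless of margins) would look like pdf.canvas { pdf.image "/path/to/image", :at => prawn_at(pl), :width => pl.width, :height => pl.height }. Compressed content streams have to be decoded first, e.g. with qpdf --qdf.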

Further, if you control template creation, why not store additional information inside the PDF, e.g. as custom keys in the Info dictionary, and read it back when using the template?
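
A minimal sketch of that idea using only the Ruby standard library: it encodes the placeholder rectangle as a space-separated string under a custom key and parses it back. The key name PlaceholderRect and both helper names are hypothetical. As far as I know, Prawn's Document.new accepts an :info hash with custom keys, and the pdf-reader gem returns the Info dictionary as a hash with symbol keys from PDF::Reader#info; please check both against their documentation before relying on this.

```ruby
# Hypothetical custom key; any name not defined by the PDF spec should do.
PLACEHOLDER_KEY = :PlaceholderRect

# Build an Info-dictionary entry describing the placeholder rectangle
# (PDF user space: lower-left x, y plus width, height, in points).
def placeholder_info(x, y, width, height)
  { PLACEHOLDER_KEY => [x, y, width, height].join(" ") }
end

# Parse the rectangle back out of an Info hash; accepts symbol or string keys.
# Returns [x, y, width, height] as floats, or nil if the key is absent.
def read_placeholder(info)
  raw = info[PLACEHOLDER_KEY] || info[PLACEHOLDER_KEY.to_s]
  return nil if raw.nil?

  values = raw.to_s.split.map { |v| Float(v) }
  raise ArgumentError, "expected 4 numbers, got #{raw.inspect}" unless values.size == 4

  values
end
```

Under the assumptions above, the template could be generated with Prawn::Document.generate("template.pdf", :info => placeholder_info(72, 500, 200, 150)) { ... }, and read_placeholder(PDF::Reader.new("template.pdf").info) would then give back the rectangle without any content-stream parsing.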
