I am working on an embedded deep learning inference C++ project using TensorRT. For my model it is necessary to subtract the mean image.
The API that I'm using allows me to define a mean image with the following data structure for RGB images:
uint8_t *data[DW_MAX_IMAGE_PLANES]; // raw image data
size_t pitch; // pitch of the image in bytes
uint32_t height; // height of the image in px
uint32_t width; // image width in px
uint32_t planeCount; // plane count of the image
So far I have found the LodePNG library, which I think is quite useful for this task. It can load PNGs with just a few lines:
// Load file and decode image.
std::vector<unsigned char> image;
unsigned width, height;
unsigned error = lodepng::decode(image, width, height, filename);
The question now is: how do I convert std::vector<unsigned char> to uint8_t *[DW_MAX_IMAGE_PLANES] and calculate the pitch and planeCount values?
As I'm using RGB images, DW_MAX_IMAGE_PLANES equals 3.
1 Solution
#1
The values for pitch and planeCount are simple. LodePNG's decode defaults to bitdepth = 8, so each sample occupies 1 byte. Pitch normally means the number of bytes in one image row, so for a tightly packed 8-bit plane it is width * 1 = width bytes (if your API instead expects the size of a single sample, the value is 1). And because the image is RGB, the value of planeCount is 3, one plane for each color.
Since you are not using the alpha channel, you should probably have LodePNG simply decode into RGB format directly:
unsigned error = lodepng::decode(image, width, height, filename, LCT_RGB);
But once the image is decoded into the std::vector<unsigned char>, you will not be able to use it directly. The decoded data from LodePNG is in the following format:
image -> R0, G0, B0, R1, G1, B1, R2, G2, B2, ...
But you need it in the following format:
data[0] -> R0, R1, R2, ...
data[1] -> G0, G1, G2, ...
data[2] -> B0, B1, B2, ...
If you are memory constrained, you'll have to rearrange the values in the image vector in place (R0, R1, ... Rn, G0, G1, ... Gn, B0, B1, ... Bn) and calculate the appropriate pointers to initialize the data array.
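One possible way to do the in-place rearrangement is a cycle-following transposition, sketched below. This is my own choice of technique, not something the answer prescribes. The function names deinterleaveInPlace and planeStart are hypothetical, and the sketch assumes the vector holds exactly 3 bytes per pixel (decoded with LCT_RGB at 8 bits). It needs no second buffer, but it trades memory for time because each cycle is walked once to find its leader.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Rearranges interleaved RGB (R0,G0,B0,R1,...) into planar order
// (R0..Rn,G0..Gn,B0..Bn) inside the same buffer.
// Hypothetical helper, not part of LodePNG or the image API.
void deinterleaveInPlace(std::vector<unsigned char>& image, std::size_t pixelCount)
{
    const std::uint64_t total = static_cast<std::uint64_t>(pixelCount) * 3;
    if (total < 2 || image.size() < total) return;
    const std::uint64_t last = total - 1; // the final element never moves

    // The element at interleaved index i = 3*p + c belongs at c*pixelCount + p,
    // which equals (i * pixelCount) mod (total - 1) for every i < total - 1.
    auto dest = [&](std::uint64_t i) { return (i * pixelCount) % last; };

    for (std::uint64_t start = 1; start < last; ++start) {
        // Handle each permutation cycle once, from its smallest index.
        std::uint64_t j = dest(start);
        while (j > start) j = dest(j);
        if (j < start) continue; // cycle was already rotated earlier

        // Rotate the cycle, carrying one byte forward at a time.
        unsigned char carried = image[start];
        j = dest(start);
        while (j != start) {
            std::swap(carried, image[j]);
            j = dest(j);
        }
        image[start] = carried;
    }
}

// After deinterleaving, the plane for a channel starts at channel * pixelCount.
std::uint8_t* planeStart(std::vector<unsigned char>& image,
                         std::size_t pixelCount, std::size_t channel)
{
    return image.data() + channel * pixelCount;
}
```

With this in place, data[c] could be set to planeStart(image, width * height, c) for c = 0, 1, 2. The image vector must stay alive and must not be resized for as long as those pointers are in use.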
If you have available memory, you can create separate vectors for each of the three color channels. Then copy the data from the decoded image and initialize the data array with pointers to the first element of the vectors.
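A minimal sketch of the separate-vectors approach follows. PlanarImage is a hypothetical stand-in that mirrors the fields quoted in the question (the real API struct may differ), and PlaneStorage and splitChannels are names I made up for illustration. Setting pitch to width is an assumption: it treats pitch as the number of bytes per row of one tightly packed 8-bit plane.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t DW_MAX_IMAGE_PLANES = 3; // value stated in the question

// Hypothetical stand-in for the API's image struct (fields from the question).
struct PlanarImage {
    std::uint8_t* data[DW_MAX_IMAGE_PLANES]; // raw image data
    std::size_t pitch;                       // pitch of the image in bytes
    std::uint32_t height;                    // height of the image in px
    std::uint32_t width;                     // image width in px
    std::uint32_t planeCount;                // plane count of the image
};

// Owns the per-channel buffers; it must outlive any PlanarImage pointing into it.
struct PlaneStorage {
    std::vector<std::uint8_t> planes[DW_MAX_IMAGE_PLANES];
};

// Copies interleaved RGB bytes (R0,G0,B0,R1,...) into one vector per channel
// and returns a struct whose data pointers refer to those vectors.
PlanarImage splitChannels(const std::vector<unsigned char>& rgb,
                          unsigned width, unsigned height,
                          PlaneStorage& storage)
{
    const std::size_t pixelCount = static_cast<std::size_t>(width) * height;
    for (auto& plane : storage.planes) plane.resize(pixelCount);

    for (std::size_t p = 0; p < pixelCount; ++p)
        for (std::size_t c = 0; c < DW_MAX_IMAGE_PLANES; ++c)
            storage.planes[c][p] = rgb[3 * p + c];

    PlanarImage img{};
    for (std::size_t c = 0; c < DW_MAX_IMAGE_PLANES; ++c)
        img.data[c] = storage.planes[c].data();
    img.pitch = width; // assumption: bytes per row of one 8-bit plane
    img.height = height;
    img.width = width;
    img.planeCount = 3; // one plane each for R, G and B
    return img;
}
```

The caller would pass the vector filled by lodepng::decode(image, width, height, filename, LCT_RGB) together with the decoded width and height. This variant temporarily holds the image twice in memory, which is the cost of avoiding the in-place shuffle.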