TensorFlow 3-channel order of color inputs

Time: 2023-02-04 18:52:45

I'm using TensorFlow to process color images with a convolutional neural network. A code snippet is below.

My code runs, so I think I got the number of channels right. My question is: how do I correctly order the RGB data? Should it be in the form rgbrgbrgb or rrrgggbbb? Presently I am using the latter. Thanks; any help would be appreciated.

    c_output = 2
    c_input = 784 * 3

    def weight_variable(shape):
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)

    def bias_variable(shape):
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)

    def conv2d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                              strides=[1, 2, 2, 1], padding='SAME')

    self.c_x = tf.placeholder(tf.float32, shape=[None, c_input])
    self.c_y_ = tf.placeholder(tf.float32, shape=[None, c_output])

    self.W_conv1 = weight_variable([5, 5, 3, 32])
    self.b_conv1 = bias_variable([32])
    self.x_image = tf.reshape(self.c_x, [-1, 28, 28, 3])
    self.h_conv1 = tf.nn.relu(conv2d(self.x_image, self.W_conv1) + self.b_conv1)
    self.h_pool1 = max_pool_2x2(self.h_conv1)

    self.W_conv2 = weight_variable([5, 5, 32, 64])
    self.b_conv2 = bias_variable([64])

    self.h_conv2 = tf.nn.relu(conv2d(self.h_pool1, self.W_conv2) + self.b_conv2)
    self.h_pool2 = max_pool_2x2(self.h_conv2)

    self.W_fc1 = weight_variable([7 * 7 * 64, 1024])
    self.b_fc1 = bias_variable([1024])

    self.h_pool2_flat = tf.reshape(self.h_pool2, [-1, 7 * 7 * 64])
    self.h_fc1 = tf.nn.relu(tf.matmul(self.h_pool2_flat, self.W_fc1) + self.b_fc1)

    self.keep_prob = tf.placeholder(tf.float32)
    self.h_fc1_drop = tf.nn.dropout(self.h_fc1, self.keep_prob)

    self.W_fc2 = weight_variable([1024, c_output])
    self.b_fc2 = bias_variable([c_output])

    self.y_conv = tf.matmul(self.h_fc1_drop, self.W_fc2) + self.b_fc2

    self.c_cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self.y_conv, labels=self.c_y_))
    self.c_train_step = tf.train.AdamOptimizer(1e-4).minimize(self.c_cross_entropy)
    self.c_correct_prediction = tf.equal(tf.argmax(self.y_conv, 1), tf.argmax(self.c_y_, 1))
    self.c_accuracy = tf.reduce_mean(tf.cast(self.c_correct_prediction, tf.float32))

1 Answer

#1


TL;DR: With your current program, the in-memory layout of the data should be R-G-B-R-G-B-R-G-B-R-G-B...

I assume from this line that you are passing in RGB images with 28x28 pixels:

    self.x_image = tf.reshape(self.c_x, [-1, 28, 28, 3])

We can call the dimensions of self.x_image "batch", "height", "width", and "channel". This matches the default data format for tf.nn.conv2d() and tf.nn.max_pool().

In TensorFlow, the in-memory representation of a tensor is row-major order (or "C" ordering, because that is the representation of arrays in the C programming language). Essentially this means that the rightmost dimension is the fastest changing, and the elements of the tensor are packed together in memory in the following order (where ? stands for the unknown batch size, minus 1):

    [0,  0,  0,  0]
    [0,  0,  0,  1]
    [0,  0,  0,  2]
    [0,  0,  1,  0]
    ...
    [?, 27, 27,  1]
    [?, 27, 27,  2]
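
The ordering above can be checked with a small sketch in plain Python. The helper `flat_offset` is a hypothetical name introduced here for illustration (it is not a TensorFlow function); it assumes the [batch, 28, 28, 3] shape from the question and computes where element [b, h, w, c] sits in a row-major buffer.

```python
# Hypothetical helper (not part of TensorFlow): flat offset of element
# [b, h, w, c] in a row-major tensor of shape [batch, height, width, channels].
def flat_offset(b, h, w, c, height=28, width=28, channels=3):
    return ((b * height + h) * width + w) * channels + c

# The first offsets of image 0 step through the channels of pixel (0, 0)
# before moving on to pixel (0, 1), i.e. R, G, B, then the next pixel's R.
first_indices = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 2), (0, 0, 1, 0)]
offsets = [flat_offset(*idx) for idx in first_indices]
print(offsets)  # [0, 1, 2, 3]

# The last element of image 0 is the blue value of pixel (27, 27).
print(flat_offset(0, 27, 27, 2))  # 2351, which is 784 * 3 - 1
```

Because the channel index has the smallest stride, consecutive values in each 784 * 3 input row are read as the R, G, and B values of one pixel.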

Because you are currently supplying the data as rrrgggbbb (all red values, then all green, then all blue), your program probably isn't interpreting the image data correctly. There are at least two options:

  1. Reshape your data to match its true order ("batch", "channels", "height", "width"):

    self.x_image = tf.reshape(self.c_x, [-1, 3, 28, 28])
    

    In fact, this format is sometimes more efficient for convolutions. You can instruct tf.nn.conv2d() and tf.nn.max_pool() to use it without transposing by passing the optional argument data_format="NCHW", but you will also need to change the shape of your bias variables to match, because the channel dimension is then dimension 1 rather than the last dimension.

  2. Transpose your image data to match the layout your program expects, using tf.transpose():

    self.x_image = tf.transpose(tf.reshape(self.c_x, [-1, 3, 28, 28]), [0, 2, 3, 1])
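
The reordering that option 2 performs can be sketched in plain Python, so it runs without TensorFlow. This is only an illustration under stated assumptions: `planar_to_interleaved` is a hypothetical helper name, and it handles a single flattened image rather than a batch. It should have the same effect as reshaping to [channels, height, width], transposing to [height, width, channels], and flattening again.

```python
def planar_to_interleaved(flat, height, width, channels=3):
    """Reorder one flattened image from rrr...ggg...bbb to rgbrgb... order."""
    plane = height * width  # number of values in one color plane
    out = []
    for pixel in range(plane):      # pixels in row-major (row, column) order
        for c in range(channels):   # channel varies fastest in the output
            out.append(flat[c * plane + pixel])
    return out

# Tiny 2x2 image with labelled values so the ordering is visible.
planar = ["r0", "r1", "r2", "r3",
          "g0", "g1", "g2", "g3",
          "b0", "b1", "b2", "b3"]
interleaved = planar_to_interleaved(planar, height=2, width=2)
print(interleaved)
# ['r0', 'g0', 'b0', 'r1', 'g1', 'b1', 'r2', 'g2', 'b2', 'r3', 'g3', 'b3']
```

Doing the transpose inside the graph, as in option 2, avoids a preprocessing pass like this one; reordering the data once before feeding it is an alternative if the input pipeline is under your control.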
    
