
Time: 2022-12-23 13:50:56

I want to know how to train a model in TensorFlow when the cost cannot be evaluated for each input individually. For example, my objective function might test whether some condition is met half of the time, penalising any deviation from that.


Previously I would write code similar to the following to define my cost function and backpropagation learner:


# Backward propagation
loss = tensorflow.losses.mean_squared_error(labels=y, predictions=yhat)
cost = tensorflow.reduce_mean(loss, name='cost')
updates = tensorflow.train.GradientDescentOptimizer(0.01).minimize(cost)

Here yhat is a tensor producing an estimate of the output y, and cost is just the mean of the squared differences between the true and predicted values.
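
For comparison with what follows, here is a minimal plain-Python sketch of that per-example cost (not TensorFlow; the helper name mean_squared_error and the sample values are only illustrative). The point is that every term depends on a single (y, yhat) pair, which is exactly what the batch-level objective below does not allow.

```python
def mean_squared_error(labels, predictions):
    """Average of the squared differences between labels and predictions."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    # Each term depends on exactly one (label, prediction) pair.
    return sum((y - yhat) ** 2 for y, yhat in zip(labels, predictions)) / len(labels)


y = [1.0, 2.0, 3.0]      # illustrative labels
yhat = [1.5, 2.0, 2.0]   # illustrative predictions
cost = mean_squared_error(y, yhat)
print(cost)
```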


However, what if my objective function could only be calculated once we have all inputs (or some batch of data), and the derivative wasn't known?


An example of this might be training a neural network to produce Cartesian coordinates that fall inside some region (e.g. inside the circle x^2 + y^2 = r^2 for various r) 50% of the time. The space of correct and incorrect answers is not finite, and although the derivative of the cost with respect to the output cannot be calculated (making backpropagation impossible), the loss function itself is relatively simple to calculate.


def loss(yhat_all, inputs):
  correct = 0  # number of predictions that satisfy the condition
  for prediction, x in zip(yhat_all, inputs):
    correct += is_inside(prediction, x)

  return -abs(correct / len(inputs) - 0.5)

Obviously loss is not a valid tensor in this case; I just wrote it in native Python to illustrate the problem. Given the above example, how would I define my updates tensor? I can't use gradient descent, so I'll need a different optimiser, but I'm also at a loss as to how to even calculate the loss, given that I can no longer use the normal losses tensors, which run over each individual output in isolation.
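
To make the objective concrete, here is a runnable plain-Python sketch of it. The circle test in is_inside and the sample points are assumptions made up for illustration, and this version returns the positive deviation (lower is better), which flips the sign used in the snippet above.

```python
def is_inside(prediction, r):
    """Hypothetical condition: is the point (x, y) strictly inside the circle of radius r?"""
    x, y = prediction
    return x * x + y * y < r * r


def batch_loss(predictions, radii):
    """Deviation of the inside fraction from 0.5; only defined for a whole batch."""
    correct = sum(is_inside(p, r) for p, r in zip(predictions, radii))
    return abs(correct / len(radii) - 0.5)


radii = [1.0] * 4
half_inside = [(0.1, 0.1), (0.2, -0.3), (2.0, 0.0), (0.0, -3.0)]  # 2 of 4 inside
all_inside = [(0.0, 0.0)] * 4                                     # 4 of 4 inside

print(batch_loss(half_inside, radii))
print(batch_loss(all_inside, radii))
```

No single prediction has a loss of its own here; only the batch as a whole can be scored.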


1 solution



First of all, you can define your own cost function over a whole batch instead of over single inputs. Sticking with your circle example, you can do:


inside_bool = ( tf.square( X_pred ) + tf.square( Y_pred ) ) < tf.square( r )
inside_float = tf.cast( inside_bool, tf.float32 )
proportion_inside = tf.reduce_mean( inside_float )
loss = tf.abs( proportion_inside - 0.5 ) # positive, so minimizing pulls the proportion toward 0.5
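
A quick way to see why this hard version gives an optimizer nothing to work with is to estimate its gradient numerically. The sketch below is plain Python rather than TensorFlow, uses made-up points, and takes the positive form abs(proportion - 0.5) as the loss; all of that is assumption for illustration only.

```python
def hard_loss(points, r):
    """abs(fraction of points strictly inside the circle - 0.5)."""
    inside = [1.0 if x * x + y * y < r * r else 0.0 for x, y in points]
    return abs(sum(inside) / len(inside) - 0.5)


def finite_difference(points, r, index, eps=1e-4):
    """Central difference of hard_loss w.r.t. the x coordinate of one point."""
    def shifted(delta):
        moved = list(points)
        x, y = moved[index]
        moved[index] = (x + delta, y)
        return hard_loss(moved, r)
    return (shifted(eps) - shifted(-eps)) / (2 * eps)


# Illustrative points, none of them within eps of the perimeter.
points = [(0.2, 0.1), (0.5, 0.5), (1.5, 0.0), (0.0, 2.0), (0.3, -0.2)]
grads = [finite_difference(points, 1.0, i) for i in range(len(points))]
print(hard_loss(points, 1.0), grads)
```

Nudging a point changes the loss only if the point crosses the perimeter, so the estimated gradient is zero for every point here.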

Another question is what the input to such a network would be. I'd suggest you just start with a random tensor. (Basically, build a generative network.)


If your loss function is not differentiable, it will be hard to train, so I'd suggest replacing the non-differentiable parts with differentiable approximations. Most importantly, the inside/outside boolean could instead be a large root of the distance from the perimeter (keeping its sign). Taking a larger root pushes the magnitude toward one. (Raising to the power 0 would basically give the sign.) You can also add a regularizer that favours values around one and negative one. (This would distort the distribution of your coordinates, however, if that matters to you.)
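
The effect of the root can be checked with a few numbers. This is a small plain-Python sketch; the function name signed_root and the sample distances are just for illustration.

```python
def signed_root(d, power):
    """sign(d) * |d| ** power, a smooth stand-in for the inside/outside indicator."""
    if d == 0.0:
        return 0.0
    sign = 1.0 if d > 0 else -1.0
    return sign * abs(d) ** power


# As the power shrinks toward 0, the values move toward -1 and +1.
for power in (1.0, 0.5, 0.2, 0.01):
    values = [signed_root(d, power) for d in (-0.5, -0.01, 0.01, 0.5)]
    print(power, [round(v, 3) for v in values])
```

With power 1.0 the values are the raw signed distances; with a power close to 0 they are already close to plus or minus one on either side of the perimeter.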


tf.abs() is not such a big problem; that's basically L1 regularization. So with all that, an idea could be (untested code):


dist_from_perimeter = ( tf.square( X_pred ) + tf.square( Y_pred ) ) - tf.square( r )
dist_loss = tf.sign( dist_from_perimeter ) * tf.pow( tf.abs( dist_from_perimeter ), 0.2 ) # 0.2 for 5th root
inside = tf.reduce_mean( dist_loss ) # centred on 0 now, not 0.5
loss = tf.abs( inside ) # positive, so minimizing balances inside against outside

This would force all the points onto the perimeter, but the gradient grows really large around the perimeter, so they are not likely to stay there. They will oscillate between inside and outside, but once the proportion settles down, they won't move much. (Or so I think... :) )
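
The growing gradient can be seen from the derivative of the signed root: for d != 0, the derivative of sign(d) * |d|^p with respect to d is p * |d|^(p - 1), which blows up as d approaches 0 when p < 1. A small plain-Python check (the function name is only illustrative):

```python
def signed_root_grad(d, power):
    """Derivative of sign(d) * |d| ** power with respect to d (unbounded at d == 0)."""
    if d == 0.0:
        raise ValueError("the derivative is unbounded at d == 0")
    return power * abs(d) ** (power - 1.0)


# The closer a point is to the perimeter (d -> 0), the larger the gradient.
for d in (1.0, 0.1, 0.01, 0.001):
    print(d, signed_root_grad(d, 0.2))
```

This is why a point that lands close to the perimeter tends to get kicked away from it again by the next update.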


If you have shapes other than a circle, then you have to come up with a reasonably easy-to-compute distance metric that puts close to equal pressure on both the X and Y coordinates.
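
As one possible illustration (not something I have tested in training), an axis-aligned ellipse could use a normalized measure like the one below. The semi-axes a and b are hypothetical parameters, and this is a signed measure rather than a true geometric distance; whether it balances the pressure on X and Y well enough will depend on the shape.

```python
def ellipse_signed_measure(x, y, a, b):
    """Negative inside, zero on, and positive outside the ellipse (x/a)^2 + (y/b)^2 = 1."""
    return (x / a) ** 2 + (y / b) ** 2 - 1.0


# Hypothetical ellipse with semi-axes a = 2 (along X) and b = 1 (along Y).
print(ellipse_signed_measure(0.0, 0.0, 2.0, 1.0))  # centre
print(ellipse_signed_measure(2.0, 0.0, 2.0, 1.0))  # on the perimeter
print(ellipse_signed_measure(0.0, 2.0, 2.0, 1.0))  # outside
```

For a = b = r this is the circle expression x^2 + y^2 - r^2 divided by r^2, so it can be dropped into the same signed-root loss.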


Hope all this helped!


I wrote working code for this, although I didn't investigate the internals of the generated results:


import tensorflow as tf  # TensorFlow 1.x API

r = 1.0

# Random input: 100 samples with 50 features each, redrawn on every run call
rnd = tf.random_uniform( shape = ( 100, 50 ), dtype = tf.float32, minval = 0.0, maxval = 1.0 )

l1 = tf.layers.dense( rnd, 50, activation = tf.nn.relu, kernel_regularizer = tf.nn.l2_loss )
l2 = tf.layers.dense( l1, 50, activation = tf.nn.relu, kernel_regularizer = tf.nn.l2_loss )
l3 = tf.layers.dense( l2, 50, activation = None, kernel_regularizer = tf.nn.l2_loss )
X_pred = tf.layers.dense( l3, 1, activation = None, kernel_regularizer = tf.nn.l2_loss )
Y_pred = tf.layers.dense( l3, 1, activation = None, kernel_regularizer = tf.nn.l2_loss )

dist_from_perimeter = ( tf.square( X_pred ) + tf.square( Y_pred ) ) - tf.square( r )
dist_loss = tf.sign( dist_from_perimeter ) * tf.pow( tf.abs( dist_from_perimeter ), 0.5 ) # 0.5 for square root
inside = tf.reduce_mean( dist_loss ) # centred on 0 now, not 0.5
loss = tf.abs( inside )

# Note: this is 1 for points on or outside the circle and 0 for points inside,
# so the value printed below is really the fraction outside (the target is 0.5 either way)
inside_binary = tf.sign(tf.sign( dist_from_perimeter ) + 1 )
prop = tf.reduce_mean( inside_binary )

global_step = tf.Variable(0, name='global_step', trainable=False)
updates = tf.train.GradientDescentOptimizer( 0.0001 ).minimize( loss )

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run( init ) # actually initialize the variables before training
    for step in range( 100000 ): # range instead of the Python 2-only xrange
        _, loss_value, prop_val = sess.run( [ updates, loss, prop ] )
        if 0 == step % 2000:
            print( "Step {}, loss {:.6f}, proportion inside: {:.4f}".format( step, loss_value, prop_val ) )



Step 0, loss 0.963431, proportion inside: 0.0000
Step 2000, loss 0.012302, proportion inside: 0.4900
Step 4000, loss 0.044224, proportion inside: 0.5300
Step 6000, loss 0.055603, proportion inside: 0.5400
Step 8000, loss 0.001739, proportion inside: 0.4100
Step 10000, loss 0.136604, proportion inside: 0.5900
Step 12000, loss 0.028738, proportion inside: 0.4600
Step 14000, loss 0.089664, proportion inside: 0.4100
Step 16000, loss 0.035139, proportion inside: 0.4900
Step 18000, loss 0.021432, proportion inside: 0.5100
Step 20000, loss 0.008821, proportion inside: 0.4600
Step 22000, loss 0.079573, proportion inside: 0.5500
Step 24000, loss 0.145942, proportion inside: 0.3700
Step 26000, loss 0.009984, proportion inside: 0.4700
Step 28000, loss 0.010401, proportion inside: 0.4700
Step 30000, loss 0.077145, proportion inside: 0.4000
Step 32000, loss 0.029588, proportion inside: 0.5300
Step 34000, loss 0.032815, proportion inside: 0.5100
Step 36000, loss 0.081417, proportion inside: 0.4000
Step 38000, loss 0.079384, proportion inside: 0.3900
Step 40000, loss 0.040977, proportion inside: 0.5500
Step 42000, loss 0.095768, proportion inside: 0.5900
Step 44000, loss 0.012109, proportion inside: 0.5300
Step 46000, loss 0.064089, proportion inside: 0.4200
Step 48000, loss 0.001401, proportion inside: 0.4700
Step 50000, loss 0.024378, proportion inside: 0.5400
Step 52000, loss 0.037057, proportion inside: 0.4900
Step 54000, loss 0.004553, proportion inside: 0.4800
Step 56000, loss 0.097677, proportion inside: 0.4000
Step 58000, loss 0.060175, proportion inside: 0.5300
Step 60000, loss 0.008686, proportion inside: 0.4800
Step 62000, loss 0.077828, proportion inside: 0.3600
Step 64000, loss 0.000750, proportion inside: 0.4600
Step 66000, loss 0.071392, proportion inside: 0.5700
Step 68000, loss 0.066447, proportion inside: 0.5600
Step 70000, loss 0.057511, proportion inside: 0.5600
Step 72000, loss 0.008800, proportion inside: 0.5400
Step 74000, loss 0.000322, proportion inside: 0.5200
Step 76000, loss 0.002286, proportion inside: 0.4700
Step 78000, loss 0.008778, proportion inside: 0.4900
Step 80000, loss 0.044092, proportion inside: 0.4500
Step 82000, loss 0.018876, proportion inside: 0.4600
Step 84000, loss 0.108120, proportion inside: 0.3500
Step 86000, loss 0.054647, proportion inside: 0.5600
Step 88000, loss 0.024990, proportion inside: 0.4600
Step 90000, loss 0.030924, proportion inside: 0.4700
Step 92000, loss 0.021789, proportion inside: 0.5100
Step 94000, loss 0.066370, proportion inside: 0.5600
Step 96000, loss 0.057060, proportion inside: 0.4100
Step 98000, loss 0.030641, proportion inside: 0.5200




