Tensorflow: Using weights trained in one model inside another, different model

Time: 2023-02-03 10:11:59

I'm trying to train an LSTM in Tensorflow using minibatches, but after training is complete I would like to use the model by submitting one example at a time to it. I can set up the graph within Tensorflow to train my LSTM network, but I can't use the trained result afterward in the way I want.

The setup code looks something like this:

# Build the LSTM model. rnn_cell and seq2seq here are the TF 0.x RNN
# modules (tf.nn.rnn_cell and tf.nn.seq2seq in later 0.x releases).
cellRaw = rnn_cell.BasicLSTMCell(LAYER_SIZE)
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS)

# Dropout is only wanted during training.
cell = rnn_cell.DropoutWrapper(cellRaw, output_keep_prob=0.25)

input_data  = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None, 3])
target_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None])
initial_state = cell.zero_state(batch_size=BATCH_SIZE, dtype=tf.float32)

# rnn_decoder expects a Python list of per-timestep tensors.
input_list = tf.unpack(input_data)

with tf.variable_scope('rnnlm'):
    output_w = tf.get_variable("output_w", [LAYER_SIZE, 6])
    output_b = tf.get_variable("output_b", [6])

outputs, final_state = seq2seq.rnn_decoder(input_list, initial_state, cell, loop_function=None, scope='rnnlm')
output = tf.reshape(tf.concat(1, outputs), [-1, LAYER_SIZE])
output = tf.nn.xw_plus_b(output, output_w, output_b)

...Note the two placeholders, input_data and target_data. I haven't bothered including the optimizer setup. After training is complete and the training session closed, I would like to set up a new session that uses the trained LSTM network whose input is provided by a completely different placeholder, something like:

with tf.Session() as sess:
    with tf.variable_scope("simulation", reuse=None):
        cellSim = cellRaw
        input_data_sim  = tf.placeholder(dtype=tf.float32, shape=[1, 1, 3])
        initial_state_sim = cell.zero_state(batch_size=1, dtype=tf.float32)
        input_list_sim = tf.unpack(input_data_sim)

        outputsSim, final_state_sim = seq2seq.rnn_decoder(input_list_sim, initial_state_sim, cellSim, loop_function=None, scope='rnnlm')
        outputSim = tf.reshape(tf.concat(1, outputsSim), [-1, LAYER_SIZE])

        with tf.variable_scope('rnnlm'):
            output_w = tf.get_variable("output_w", [LAYER_SIZE, nOut])  # nOut corresponds to the 6 output units above
            output_b = tf.get_variable("output_b", [nOut])

        outputSim = tf.nn.xw_plus_b(outputSim, output_w, output_b)

This second part returns the following error:

tensorflow.python.framework.errors.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
 [[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

...Presumably because the graph I'm using still has the old training placeholders attached to the trained LSTM nodes. What's the right way to 'extract' the trained LSTM and put it into a new, different graph that has a different style of inputs? The variable scoping features that Tensorflow has seem to address something like this, but the examples in the documentation all talk about using variable scope as a way of managing variable names so that the same piece of code will generate similar subgraphs within the same graph. The 'reuse' feature seems to be close to what I want, but I don't find the Tensorflow documentation linked above to be clear at all on what it does. The cells themselves cannot be given a name (in other words,

cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS, name="multicell")

is not valid), and while I can give a name to a seq2seq.rnn_decoder(), I presumably wouldn't be able to remove the rnn_cell.DropoutWrapper() if I used that node unchanged.

Questions:

What is the proper way to move trained LSTM weights from one graph to another?

Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?

It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.

In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?

If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?

Thanks!

1 solution

#1

What is the proper way to move trained LSTM weights from one graph to another?

You can create your decoding graph first (with a Saver object to save the parameters) and create a GraphDef object that you can import into your bigger training graph:

basegraph = tf.Graph()
with basegraph.as_default():
    ***your graph***

traingraph = tf.Graph()
with traingraph.as_default():
    tf.import_graph_def(basegraph.as_graph_def())
    ***your training graph***

Make sure you load your variables when you start a session for the new graph.

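For that step, a minimal sketch using tf.train.Saver; the checkpoint path and the placement of the training loop are illustrative, not from the question:

# In the training graph: create the Saver after all variables exist.
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())   # TF 0.x initializer
    # ... run the training loop here ...
    saver.save(sess, "rnnlm.ckpt")             # write the trained weights to disk

# In the inference graph: rebuild the variables under the same names
# (same variable scopes), create a Saver there, and restore the checkpoint.
infer_saver = tf.train.Saver()
with tf.Session() as sess:
    infer_saver.restore(sess, "rnnlm.ckpt")    # matches variables by name
    # ... feed single examples through the new placeholder here ...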

I don't have experience with this functionality, so you may have to look into it a bit more.

Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?

Yep, the graph object still holds it.

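A small sketch of that distinction (session resources vs. the graph structure):

a = tf.constant(1.0)
g = tf.get_default_graph()

with tf.Session() as sess1:
    sess1.run(a)              # the first session runs ops from the graph

# Closing sess1 releases its resources (variable values, queues, ...),
# but the graph structure itself is still in memory:
assert a.graph is g
with tf.Session(graph=g) as sess2:
    sess2.run(a)              # a new session can run the same graph object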

It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.

No, reuse determines the behaviour when you call get_variable on an existing name: when it is True it returns the existing variable, otherwise it creates a new one. Normally TensorFlow should not throw an error. Are you sure you're using tf.get_variable and not just tf.Variable?

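For example, a minimal sketch of how reuse behaves with tf.get_variable (reusing LAYER_SIZE from the question):

with tf.variable_scope("rnnlm"):
    w = tf.get_variable("output_w", [LAYER_SIZE, 6])        # creates rnnlm/output_w

with tf.variable_scope("rnnlm", reuse=True):
    w_again = tf.get_variable("output_w", [LAYER_SIZE, 6])  # returns the SAME variable
assert w is w_again

# Identical inner names in different scopes are two distinct variables
# and raise no error:
with tf.variable_scope("conv1"):
    s1 = tf.get_variable("sum1", [1])    # conv1/sum1
with tf.variable_scope("conv2"):
    s2 = tf.get_variable("sum1", [1])    # conv2/sum1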

In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?

I don't really see what you mean. Placeholders do not always have to be fed. If a placeholder is not required for running an operation, you don't have to feed it.

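In other words, only placeholders that the fetched operation actually depends on need to be fed; a small illustrative sketch:

x = tf.placeholder(tf.float32, shape=[None, 3])
y = x * 2.0                     # depends on the placeholder
d = tf.constant(5.0) + 1.0      # does not depend on it

with tf.Session() as sess:
    sess.run(d)                                     # fine: x is not in d's dependency path
    sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]})   # y requires x to be fed
    # sess.run(y) without feeding x would raise the
    # "You must feed a value for placeholder tensor ..." error from the question.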

If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?

I think your understanding or usage of scopes is flawed; see above.
