Loading a trained Keras model and continuing training

Time: 2022-02-03 13:07:43

I was wondering if it was possible to save a partly trained Keras model and continue the training after loading the model again.

The reason for this is that I will have more training data in the future and I do not want to retrain the whole model again.

The functions which I am using are:

# Partly train model
model.fit(first_training, first_classes, batch_size=32, epochs=20)

# Save partly trained model
model.save('partly_trained.h5')

# Load partly trained model
from keras.models import load_model
model = load_model('partly_trained.h5')

# Continue training
model.fit(second_training, second_classes, batch_size=32, epochs=20)

Edit 1: added fully working example

With the first dataset, after 10 epochs the loss of the last epoch will be 0.0748 and the accuracy 0.9863.

After saving, deleting and reloading the model, the loss and accuracy of the model trained on the second dataset will be 0.1711 and 0.9504 respectively.

Is this caused by the new training data or by a completely re-trained model?

"""
Model by: http://machinelearningmastery.com/
"""
# load (downloaded if needed) the MNIST dataset
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
from keras.models import load_model
numpy.random.seed(7)

def baseline_model():
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
    model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

if __name__ == '__main__':
    # load data
    (X_train, y_train), (X_test, y_test) = mnist.load_data()

    # flatten 28*28 images to a 784 vector for each image
    num_pixels = X_train.shape[1] * X_train.shape[2]
    X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
    X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')
    # normalize inputs from 0-255 to 0-1
    X_train = X_train / 255
    X_test = X_test / 255
    # one hot encode outputs
    y_train = to_categorical(y_train)
    y_test = to_categorical(y_test)
    num_classes = y_test.shape[1]

    # build the model
    model = baseline_model()

    #Partly train model
    dataset1_x = X_train[:3000]
    dataset1_y = y_train[:3000]
    model.fit(dataset1_x, dataset1_y, epochs=10, batch_size=200, verbose=2)

    # Final evaluation of the model
    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Baseline Error: %.2f%%" % (100-scores[1]*100))

    #Save partly trained model
    model.save('partly_trained.h5')
    del model

    #Reload model
    model = load_model('partly_trained.h5')

    #Continue training
    dataset2_x = X_train[3000:]
    dataset2_y = y_train[3000:]
    model.fit(dataset2_x, dataset2_y, epochs=10, batch_size=200, verbose=2)
    scores = model.evaluate(X_test, y_test, verbose=0)
    print("Baseline Error: %.2f%%" % (100-scores[1]*100))

4 Answers

#1


11  

Actually, model.save saves all the information needed for restarting training in your case. The only thing that could be spoiled by reloading the model is your optimizer state. To check that, try saving and reloading the model and training it on the training data.
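A minimal sketch of that check, assuming a compiled model and training arrays x_train/y_train already exist (the filename is only illustrative): record the training loss right before saving, reload, and confirm that training resumes from roughly the same value.

from keras.models import load_model

# Training-set loss/metrics just before saving, for later comparison
before = model.evaluate(x_train, y_train, verbose=0)

model.save('checkpoint.h5')          # stores architecture, weights and optimizer state
del model
model = load_model('checkpoint.h5')  # restores all of the above

# Should closely match `before`; further fitting then continues
# from where training left off rather than from scratch
after = model.evaluate(x_train, y_train, verbose=0)
print(before, after)
model.fit(x_train, y_train, epochs=1, batch_size=32)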

#2


2  

Notice that Keras sometimes has issues with loaded models, as in here. This might explain cases in which you don't start from the same trained accuracy.

#3


0  

The problem might be that you are using a different optimizer, or different arguments to your optimizer. I just had the same issue with a custom pretrained model, using

reduce_lr = ReduceLROnPlateau(monitor='loss', factor=lr_reduction_factor,
                              patience=patience, min_lr=min_lr, verbose=1)

for the pretrained model, whereby the original learning rate starts at 0.0003 and during pre-training is reduced to the minimum learning rate (min_lr), which is 0.000003.

I just copied that line over to the script which uses the pretrained model and got really bad accuracies, until I noticed that the last learning rate of the pretrained model was the minimum learning rate, i.e. 0.000003. If I start with that learning rate, I get exactly the same accuracies to start with as the output of the pretrained model, which makes sense: starting with a learning rate that is 100 times larger than the last learning rate used in the pretrained model results in a huge overshoot of gradient descent and hence in heavily decreased accuracies.
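A minimal sketch of inspecting the learning rate a saved model ended with, assuming it was saved to a file named 'pretrained.h5' (the filename and the printed value are only illustrative; on recent Keras versions the attribute is model.optimizer.learning_rate):

from keras.models import load_model
from keras import backend as K

model = load_model('pretrained.h5')
# The optimizer state is restored along with the weights, so this shows
# the learning rate the model was saved with, e.g. 3e-06 rather than 3e-04
print(float(K.get_value(model.optimizer.lr)))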

#4


0  

All of the above helps; you must resume from the same learning rate as the one in effect when the model and weights were saved. Set it directly on the optimizer.
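A minimal sketch of setting it, assuming model has already been reloaded with load_model and that 3e-6 was the learning rate in effect at save time (the value is purely illustrative):

from keras import backend as K

# Restore the learning rate the optimizer had when the model was saved
K.set_value(model.optimizer.lr, 3e-6)

# On recent Keras/TensorFlow versions the equivalent is:
# model.optimizer.learning_rate.assign(3e-6)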

Note that improvement from there is not guaranteed, because the model may have reached a local minimum, which may be global. There is no point in resuming a model in order to search for another local minimum, unless you intend to increase the learning rate in a controlled fashion and nudge the model into a possibly better minimum not far away.
