Caffe Python Interface in Practice: 02 Fine-Tuning, an Annotated Walkthrough of the Official Tutorial

Posted: 2021-11-24 04:54:59

This post is one in a series of annotated source-code notes on the official documentation.

Note 1: This post annotates the source code of one of the ipynb notebooks under caffe_root/examples/, aiming to speed up a beginner's progress through inline comments.
Note 2: The original English commentary is deliberately left untranslated: beginners should get used to reading tutorials in their original English, which builds the technical-reading ability every advanced programmer needs.
Note 3: It is recommended to run the code in a Jupyter Notebook environment alongside these annotations.

Fine-tuning a Pretrained Network for Style Recognition

In this example, the data fed to the fine-tuned network comes from an ImageData layer, rather than a MemoryData layer (ndarray) or CaffeNet's original LMDB input. Either way, the images must be converted into the preprocessed blob form CaffeNet expects (image size, whether the mean has been subtracted, whether values lie in 0-1 or 0-255).

In this example, we’ll explore a common approach that is particularly useful in real-world applications: take a pre-trained Caffe network and fine-tune the parameters on your custom data.

The advantage of this approach is that, since pre-trained networks are learned on a large set of images, the intermediate layers capture the “semantics” of the general visual appearance. Think of it as a very powerful generic visual feature that you can treat as a black box. On top of that, only a relatively small amount of data is needed for good performance on the target task.

First, we will need to prepare the data. This involves the following parts:
(1) Get the ImageNet ilsvrc pretrained model with the provided shell scripts.
(2) Download a subset of the overall Flickr style dataset for this demo.
(3) Compile the downloaded Flickr dataset into a database that Caffe can then consume.

caffe_root = '../'  # this file should be run from {caffe_root}/examples (otherwise change this line)

import sys
sys.path.insert(0, caffe_root + 'python')
import caffe

caffe.set_device(0)
caffe.set_mode_gpu()
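# Note: if no GPU is available, fall back to CPU mode instead (much slower
# for training): caffe.set_mode_cpu()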

import numpy as np
from pylab import *
%matplotlib inline
import tempfile

# Helper function for deprocessing (reversing the preprocessing of) images, e.g., for display.
def deprocess_net_image(image):
    image = image.copy()              # don't modify destructively
    image = image[::-1]               # BGR -> RGB
    image = image.transpose(1, 2, 0)  # CHW -> HWC
    image += [123, 117, 104]          # (approximately) undo mean subtraction

    # clamp values to [0, 255]
    image[image < 0], image[image > 255] = 0, 255

    # round and cast from float32 to uint8
    image = np.round(image)
    image = np.require(image, dtype=np.uint8)  # cast from float32 to uint8

    return image
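
As a quick usage sketch (my addition, not part of the original notebook), the helper maps a (3, H, W) float32 blob in Caffe's input layout to an (H, W, 3) uint8 image that plt.imshow can display:

# Hypothetical example: deprocess a random mean-subtracted blob.
fake_blob = np.random.randn(3, 227, 227).astype(np.float32) * 30
img = deprocess_net_image(fake_blob)
assert img.shape == (227, 227, 3) and img.dtype == np.uint8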

1. Setup and dataset download

Download data required for this exercise.

  • get_ilsvrc_aux.sh to download the ImageNet data mean, labels, etc.
  • download_model_binary.py to download the pretrained reference model
  • finetune_flickr_style/assemble_data.py downloads the style training and testing data

We’ll download just a small subset of the full dataset for this exercise: just 2000 of the 80K images, from 5 of the 20 style categories. (To download the full dataset, set full_dataset = True in the cell below.)

# Download just a small subset of the data for this exercise.
# (2000 of 80K images, 5 of 20 labels.)
# To download the entire dataset, set `full_dataset = True`.
full_dataset = False
if full_dataset:  # fine-tune on the full dataset or only a subset?
    NUM_STYLE_IMAGES = NUM_STYLE_LABELS = -1
else:
    NUM_STYLE_IMAGES = 2000
    NUM_STYLE_LABELS = 5

# This downloads the ilsvrc auxiliary data (mean file, etc),
# and a subset of 2000 images for the style recognition task.

import os
os.chdir(caffe_root)  # run scripts from caffe root (they must be run from there!)
# download the ImageNet auxiliary data (mean file, labels, etc.)
!data/ilsvrc12/get_ilsvrc_aux.sh
# download the pretrained model into models/bvlc_reference_caffenet
!scripts/download_model_binary.py models/bvlc_reference_caffenet
# assemble the style training and testing data; --seed makes the download
# reproducible, and --images/--label select the full set or the subset above
!python examples/finetune_flickr_style/assemble_data.py \
    --workers=-1 --seed=1701 \
    --images=$NUM_STYLE_IMAGES --label=$NUM_STYLE_LABELS
# back to examples
os.chdir('examples')

Define weights, the path to the ImageNet pretrained weights we just downloaded, and make sure it exists.

import os
weights = os.path.join(caffe_root, 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')
assert os.path.exists(weights)

Load the 1000 ImageNet labels from ilsvrc12/synset_words.txt, and the 5 style labels from finetune_flickr_style/style_names.txt.

# Load ImageNet labels to imagenet_labels
imagenet_label_file = caffe_root + 'data/ilsvrc12/synset_words.txt'
imagenet_labels = list(np.loadtxt(imagenet_label_file, str, delimiter='\t'))
assert len(imagenet_labels) == 1000
print 'Loaded ImageNet labels:\n', '\n'.join(imagenet_labels[:10] + ['...'])

# Load style labels to style_labels
style_label_file = caffe_root + 'examples/finetune_flickr_style/style_names.txt'
style_labels = list(np.loadtxt(style_label_file, str, delimiter='\n'))
if NUM_STYLE_LABELS > 0:
    style_labels = style_labels[:NUM_STYLE_LABELS]
print '\nLoaded style labels:\n', ', '.join(style_labels)

2. Defining and running the nets

We’ll start by defining caffenet, a function which initializes the CaffeNet architecture (a minor variant on AlexNet), taking arguments specifying the data and number of output classes.

from caffe import layers as L
from caffe import params as P

weight_param = dict(lr_mult=1, decay_mult=1)
bias_param   = dict(lr_mult=2, decay_mult=0)

learned_param = [weight_param, bias_param]  # non-zero lr_mult multipliers (not the base learning rate), so these layers' parameters get updated
frozen_param = [dict(lr_mult=0)] * 2        # lr_mult=0 for both weights and biases, so these layers' parameters are never changed
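
# A quick sketch (my addition): a layer built with frozen_param emits
# `param { lr_mult: 0 }` for both its weights and biases in the generated
# prototxt, which is what keeps the solver from updating it. To see this:
#   ns = caffe.NetSpec()
#   ns.data = L.DummyData(shape=dict(dim=[1, 3, 227, 227]))
#   ns.conv = L.Convolution(ns.data, kernel_size=3, num_output=8, param=frozen_param)
#   print str(ns.to_proto())  # contains two `lr_mult: 0` entries under conv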

def conv_relu(bottom, ks, nout, stride=1, pad=0, group=1,
              param=learned_param,
              weight_filler=dict(type='gaussian', std=0.01),
              bias_filler=dict(type='constant', value=0.1)):
    conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
                         num_output=nout, pad=pad, group=group,
                         param=param, weight_filler=weight_filler,
                         bias_filler=bias_filler)
    return conv, L.ReLU(conv, in_place=True)

def fc_relu(bottom, nout, param=learned_param,
            weight_filler=dict(type='gaussian', std=0.005),
            bias_filler=dict(type='constant', value=0.1)):
    fc = L.InnerProduct(bottom, num_output=nout, param=param,
                        weight_filler=weight_filler,
                        bias_filler=bias_filler)
    return fc, L.ReLU(fc, in_place=True)

def max_pool(bottom, ks, stride=1):
    return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)

# With its default arguments, the network below reproduces the original
# CaffeNet architecture, minus CaffeNet's data input layer.
def caffenet(data, label=None, train=True, num_classes=1000,
             classifier_name='fc8', learn_all=False):
    # num_classes defaults to the original 1000 ImageNet classes.
    # learn_all=True updates the weights of every layer; learn_all=False (the
    # default) trains only the final custom layer, which always uses
    # param=learned_param and is therefore unaffected by this flag.
    """Returns a NetSpec specifying CaffeNet, following the original proto text specification (./models/bvlc_reference_caffenet/train_val.prototxt)."""
    n = caffe.NetSpec()
    n.data = data  # below, the fine-tuning net feeds this from an ImageData layer with preprocessing

    # learned_param has non-zero lr_mult values, so layers using it are
    # updated; frozen_param has lr_mult=0, so layers using it stay fixed.
    # (The classifier layer below always uses learned_param, regardless of
    # the learn_all flag.)
    param = learned_param if learn_all else frozen_param

    n.conv1, n.relu1 = conv_relu(n.data, 11, 96, stride=4, param=param)
    n.pool1 = max_pool(n.relu1, 3, stride=2)
    n.norm1 = L.LRN(n.pool1, local_size=5, alpha=1e-4, beta=0.75)
    n.conv2, n.relu2 = conv_relu(n.norm1, 5, 256, pad=2, group=2, param=param)
    n.pool2 = max_pool(n.relu2, 3, stride=2)
    n.norm2 = L.LRN(n.pool2, local_size=5, alpha=1e-4, beta=0.75)
    n.conv3, n.relu3 = conv_relu(n.norm2, 3, 384, pad=1, param=param)
    n.conv4, n.relu4 = conv_relu(n.relu3, 3, 384, pad=1, group=2, param=param)
    n.conv5, n.relu5 = conv_relu(n.relu4, 3, 256, pad=1, group=2, param=param)
    n.pool5 = max_pool(n.relu5, 3, stride=2)
    n.fc6, n.relu6 = fc_relu(n.pool5, 4096, param=param)
    if train:  # dropout applies only at training time; fc7input is the dropout of relu6 when training, or relu6 itself otherwise
        n.drop6 = fc7input = L.Dropout(n.relu6, in_place=True)
    else:
        fc7input = n.relu6
    n.fc7, n.relu7 = fc_relu(fc7input, 4096, param=param)
    if train:  # likewise, fc8input is the dropout of relu7 when training, or relu7 itself otherwise
        n.drop7 = fc8input = L.Dropout(n.relu7, in_place=True)
    else:
        fc8input = n.relu7
    # always learn fc8 (param=learned_param): the final layer is the one being fine-tuned, so its lr_mult is never frozen
    fc8 = L.InnerProduct(fc8input, num_output=num_classes, param=learned_param)
    # give fc8 the name specified by the `classifier_name` argument; renaming
    # the last layer prevents it from being initialized with the pretrained
    # model's fc8 weights when weights are copied in
    n.__setattr__(classifier_name, fc8)
    if not train:
        n.probs = L.Softmax(fc8)
    if label is not None:
        n.label = label  # with ground-truth labels present, also compute the batch's loss and accuracy
        n.loss = L.SoftmaxWithLoss(fc8, n.label)
        n.acc = L.Accuracy(fc8, n.label)
    # write the net to a temporary file and return its filename
    with tempfile.NamedTemporaryFile(delete=False) as f:  # f.name is a randomly generated path, unlike opening a fixed path with open(path, 'w')
        f.write(str(n.to_proto()))
        return f.name

Now, let’s create a CaffeNet that takes unlabeled “dummy data” as input, allowing us to set its input images externally and see what ImageNet classes it predicts.

dummy_data = L.DummyData(shape=dict(dim=[1, 3, 227, 227]))
imagenet_net_filename = caffenet(data=dummy_data, train=False)  # returns the generated prototxt's temporary filename
imagenet_net = caffe.Net(imagenet_net_filename, weights, caffe.TEST)  # instantiate the net with the pretrained CaffeNet weights

Define a function style_net which calls caffenet on data from the Flickr style dataset.

The new network will also have the CaffeNet architecture, with differences in the input and output:

  • the input is the Flickr style data we downloaded, provided by an ImageData layer
  • the output is a distribution over the 20 style labels (5 in this demo’s subset) rather than the original 1000 ImageNet classes
  • the classification layer is renamed from fc8 to fc8_flickr to tell Caffe not to load the original classifier (fc8) weights from the ImageNet-pretrained model
# The fine-tuning net's data source is an ImageData layer, not a MemoryData
# layer (ndarray) nor CaffeNet's original LMDB input; the images are converted
# to CaffeNet's preprocessed blob form (size, mean subtraction, 0-255 values).
# `subset` selects the train or test split; by default the training split used
# for fine-tuning is read, with mirroring enabled. With train=False, dropout is
# removed, SoftmaxWithLoss becomes Softmax, mirroring is turned off, and subset
# defaults to 'test', so the net reads the test set.
def style_net(train=True, learn_all=False, subset=None): 
    if subset is None:
        subset = 'train' if train else 'test'
    source = caffe_root + 'data/flickr_style/%s.txt' % subset  # image/label list for the chosen split
    # data-layer preprocessing: convert to CaffeNet's expected blob form
    transform_param = dict(mirror=train, crop_size=227,  # mirror=train: mirror images during training only, never at test time
        mean_file=caffe_root + 'data/ilsvrc12/imagenet_mean.binaryproto')
    style_data, style_label = L.ImageData(
        transform_param=transform_param, source=source,
        batch_size=50, new_height=256, new_width=256, ntop=2)
    return caffenet(data=style_data, label=style_label, train=train,  # train=False: no dropout, Softmax instead of SoftmaxWithLoss
                    num_classes=NUM_STYLE_LABELS,   # NUM_STYLE_LABELS = 5, so the output covers 5 style classes
                    classifier_name='fc8_flickr',   # custom name for the final fully connected layer
                    learn_all=learn_all)            # learn_all=False: update only the final layer's weights

Use the style_net function defined above to initialize untrained_style_net, a CaffeNet with input images from the style dataset and weights from the pretrained ImageNet model.

Call forward on untrained_style_net to get a batch of style training data.

# A single call builds the custom net and initializes its weights. train=False
# defines a test-phase net (dropout off, Softmax instead of SoftmaxWithLoss,
# no mirroring), while subset='train' still feeds it the training split.
untrained_style_net = caffe.Net(style_net(train=False, subset='train'),  # caffe.Net() instantiates a deployed net in memory, here with 5 output classes
                                weights, caffe.TEST)
untrained_style_net.forward()  # one forward pass loads a single batch of training data
style_data_batch = untrained_style_net.blobs['data'].data.copy()  # data blob of shape (batch_size, C, H, W)
style_label_batch = np.array(untrained_style_net.blobs['label'].data, dtype=np.int32)

Pick one of the style net training images from the batch of 50 (we’ll arbitrarily choose #8 here). Display it, then run it through imagenet_net, the ImageNet-pretrained network to view its top 5 predicted classes from the 1000 ImageNet classes.

Below we chose an image where the network’s predictions happen to be reasonable, as the image is of a beach, and “sandbar” and “seashore” both happen to be ImageNet-1000 categories. For other images, the predictions won’t be this good, sometimes due to the network actually failing to recognize the object(s) present in the image, but perhaps even more often due to the fact that not all images contain an object from the (somewhat arbitrarily chosen) 1000 ImageNet categories. Modify the batch_index variable by changing its default setting of 8 to another value from 0-49 (since the batch size is 50) to see predictions for other images in the batch. (To go beyond this batch of 50 images, first rerun the above cell to load a fresh batch of data into style_net.)

def disp_preds(net, image, labels, k=5, name='ImageNet'):
    input_blob = net.blobs['data']
    net.blobs['data'].data[0, ...] = image #放入batchsize中索引为0的图片blob中
    probs = net.forward(start='conv1')['probs'][0] #这样不会更新batchsize,而是读入image数据,第0张图片处理结果输出
    top_k = (-probs).argsort()[:k] #因为argsort默认为升序排序,加负号则相当于probs变为降序排列并返回最大的k个值的索引
    print 'top %d predicted %s labels =' % (k, name)
    print '\n'.join('\t(%d) %5.2f%% %s' % (i+1, 100*probs[p], labels[p]) #'\n'.join即\n为打印元组元素之间的插入符,而不是空格!
                    for i, p in enumerate(top_k))

def disp_imagenet_preds(net, image):
    disp_preds(net, image, imagenet_labels, name='ImageNet')

def disp_style_preds(net, image):
    disp_preds(net, image, style_labels, name='style')

batch_index = 8
image = style_data_batch[batch_index]  # take image #8 from the batch
plt.imshow(deprocess_net_image(image))  # BGR->RGB, CHW->HWC, add back the mean, clamp to 0-255, cast float32->uint8
print 'actual label =', style_labels[style_label_batch[batch_index]]  # ground-truth style label for this image
disp_imagenet_preds(imagenet_net, image)  # imagenet_net was built with caffenet()'s defaults, i.e., the original 1000-class ImageNet model

We can also look at untrained_style_net’s predictions, but we won’t see anything interesting as its classifier hasn’t been trained yet.

In fact, since we zero-initialized the classifier (see caffenet definition – no weight_filler is passed to the final InnerProduct layer), the softmax inputs should be all zero and we should therefore see a predicted probability of 1/N for each label (for N labels). Since we set N = 5, we get a predicted probability of 20% for each class.
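
To see why, here is a tiny numeric check (my own sketch, not from the notebook): the softmax of an all-zero input is uniform, so with N = 5 classes each class gets 20%:

z = np.zeros(5)                    # the all-zero softmax inputs
print np.exp(z) / np.exp(z).sum()  # -> [ 0.2  0.2  0.2  0.2  0.2]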

disp_style_preds(untrained_style_net, image)  # predictions are uninformative here: the classifier has not been fine-tuned yet

We can also verify that the activations in layer fc7 immediately before the classification layer are the same as (or very close to) those in the ImageNet-pretrained model, since both models are using the same pretrained weights in the conv1 through fc7 layers.

diff = untrained_style_net.blobs['fc7'].data[0] - imagenet_net.blobs['fc7'].data[0]
error = (diff ** 2).sum()
assert error < 1e-8

Delete untrained_style_net to save memory. (Hang on to imagenet_net as we’ll use it again later.)

del untrained_style_net  # free the memory held by untrained_style_net

3. Training the style classifier

Now, we’ll define a function solver to create our Caffe solvers, which are used to train the network (learn its weights). In this function we’ll set values for various parameters used for learning, display, and “snapshotting” – see the inline comments for explanations of what they mean. You may want to play with some of the learning parameters to see if you can improve on the results here!

from caffe.proto import caffe_pb2

def solver(train_net_path, test_net_path=None, base_lr=0.001):  # base_lr is the base learning rate, distinct from the per-layer lr_mult multipliers
    s = caffe_pb2.SolverParameter()  # protobuf message used to serialize a solver.prototxt

    # Specify locations of the train and (maybe) test networks.
    s.train_net = train_net_path
    if test_net_path is not None:
        s.test_net.append(test_net_path)
        s.test_interval = 1000  # Test after every 1000 training iterations.
        s.test_iter.append(100) # Test on 100 batches each time we test; with batch_size=50, each test pass covers 100*50=5000 images.

    # The number of iterations over which to average the gradient.
    # Effectively boosts the training batch size by the given factor, without
    # affecting memory utilization.
    s.iter_size = 1

    s.max_iter = 100000     # # of times to update the net (training iterations); each iteration processes one batch of 50

    # Solve using the stochastic gradient descent (SGD) algorithm.
    # Other choices include 'Adam' and 'RMSProp'.
    s.type = 'SGD'

    # Set the initial learning rate for SGD.
    s.base_lr = base_lr

    # Set `lr_policy` to define how the learning rate changes during training.
    # Here, we 'step' the learning rate by multiplying it by a factor `gamma`
    # every `stepsize` iterations.
    s.lr_policy = 'step'
    s.gamma = 0.1
    s.stepsize = 20000

    # Set other SGD hyperparameters. Setting a non-zero `momentum` takes a
    # weighted average of the current gradient and previous gradients to make
    # learning more stable. L2 weight decay regularizes learning, to help prevent
    # the model from overfitting.
    s.momentum = 0.9      # smooths the learning process
    s.weight_decay = 5e-4 # L2 regularization coefficient, to reduce overfitting

    # Display the current training loss and accuracy every 1000 iterations.
    s.display = 1000

    # Snapshots are files used to store networks we've trained. Here, we'll
    # snapshot every 10K iterations -- ten times during training.
    s.snapshot = 10000
    s.snapshot_prefix = caffe_root + 'models/finetune_flickr_style/finetune_flickr_style'

    # Train on the GPU. Using the CPU to train large networks is very slow.
    s.solver_mode = caffe_pb2.SolverParameter.GPU

    # Write the solver to a temporary file and return its filename.
    with tempfile.NamedTemporaryFile(delete=False) as f:  # create a named temp file (path in f.name); delete=False keeps it on disk after closing
        f.write(str(s))
        return f.name  # typically something like /tmp/tmpXXXXXX on Linux (or /var/folders/... on macOS)
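
Because solver just serializes a SolverParameter message to a temporary file, you can sanity-check the result by reading it back (a quick sketch using the helpers defined above; the variable name is mine):

example_solver_path = solver(style_net(train=True))
print open(example_solver_path).read()  # shows base_lr, lr_policy, snapshot settings, etc.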

Now we’ll invoke the solver to train the style net’s classification layer.

For the record, if you want to train the network using only the command line tool, this is the command:


build/tools/caffe train \
-solver models/finetune_flickr_style/solver.prototxt \
-weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel \
-gpu 0

However, we will train using Python in this example.

We’ll first define run_solvers, a function that takes a list of solvers and steps each one in a round robin manner, recording the accuracy and loss values each iteration. At the end, the learned weights are saved to a file.

# run_solvers can train two (or more) nets simultaneously!
def run_solvers(niter, solvers, disp_interval=10):  # `solvers` looks like: [('pretrained', style_solver), ('scratch', scratch_style_solver)]
    """Run solvers for niter iterations, returning the loss and accuracy recorded each iteration. `solvers` is a list of (name, solver) tuples."""
    blobs = ('loss', 'acc')
    loss, acc = ({name: np.zeros(niter) for name, _ in solvers}
                 for _ in blobs)  # one {name: array} dict for each of 'loss' and 'acc'
    for it in range(niter):
        for name, s in solvers:  # s is the solver object for each net
            s.step(1)  # run a single SGD step in Caffe; stepping manually gives controlled per-iteration training instead of running the full solver schedule
            loss[name][it], acc[name][it] = (s.net.blobs[b].data.copy()
                                             for b in blobs)  # record this batch's scalar loss and accuracy values
        if it % disp_interval == 0 or it + 1 == niter:  # print loss and acc every disp_interval iterations and on the final iteration
            loss_disp = '; '.join('%s: loss=%.3f, acc=%2d%%' %
                                  (n, loss[n][it], np.round(100*acc[n][it]))
                                  for n, _ in solvers)
            print '%3d) %s' % (it, loss_disp)     
    # Save the learned weights from both nets.
    weight_dir = tempfile.mkdtemp()  # create a temporary directory and return its path
    weights = {}
    for name, s in solvers:
        filename = 'weights.%s.caffemodel' % name
        weights[name] = os.path.join(weight_dir, filename)
        s.net.save(weights[name])  # save each solver's net weights (e.g., style_solver's and scratch_style_solver's below) into the temporary directory
    return loss, acc, weights  # all dicts; weights maps each name to its saved .caffemodel path under weight_dir

Let’s create and run solvers to train nets for the style recognition task. We’ll create two solvers – one (style_solver) will have its train net initialized to the ImageNet-pretrained weights (this is done by the call to the copy_from method), and the other (scratch_style_solver) will start from a randomly initialized net.

During training, we should see that the ImageNet pretrained net is learning faster and attaining better accuracies than the scratch net.

niter = 200  # number of iterations to train

# Reset style_solver as before.
style_solver_filename = solver(style_net(train=True))  # solver(train_net_path, test_net_path=None, base_lr=0.001); by default only the new final layer is trained
style_solver = caffe.get_solver(style_solver_filename)
style_solver.net.copy_from(weights)  # load the pretrained CaffeNet weights into the net (this step is essential!)

# For reference, we also create a solver that isn't initialized from
# the pretrained ImageNet weights.
scratch_style_solver_filename = solver(style_net(train=True))
scratch_style_solver = caffe.get_solver(scratch_style_solver_filename)

print 'Running solvers for %d iterations...' % niter
solvers = [('pretrained', style_solver), ('scratch', scratch_style_solver)]
loss, acc, weights = run_solvers(niter, solvers)
print 'Done.'

train_loss, scratch_train_loss = loss['pretrained'], loss['scratch']
train_acc, scratch_train_acc = acc['pretrained'], acc['scratch']
style_weights, scratch_style_weights = weights['pretrained'], weights['scratch']

# Delete solvers to save memory.
del style_solver, scratch_style_solver, solvers

Let’s look at the training loss and accuracy produced by the two training procedures. Notice how quickly the ImageNet pretrained model’s loss value (blue) drops, and that the randomly initialized model’s loss value (green) barely (if at all) improves from training only the classifier layer.

plot(np.vstack([train_loss, scratch_train_loss]).T)  # stack to 2x200, transpose to 200x2; plot draws one line per column against the iteration index
xlabel('Iteration #')
ylabel('Loss')
plot(np.vstack([train_acc, scratch_train_acc]).T)
xlabel('Iteration #')
ylabel('Accuracy')

Let’s take a look at the testing accuracy after running 200 iterations of training. Note that we’re classifying among 5 classes, giving chance accuracy of 20%. We expect both results to be better than chance accuracy (20%), and we further expect the result from training using the ImageNet pretraining initialization to be much better than the one from training from scratch. Let’s see.

def eval_style_net(weights, test_iters=10):  # evaluation uses the test split
    # style_net(train=False) disables dropout, swaps SoftmaxWithLoss for
    # Softmax, turns off mirroring, and reads the test split; caffe.TEST means
    # no backward/gradient computation.
    test_net = caffe.Net(style_net(train=False), weights, caffe.TEST)  # instantiate the deployed net in memory
    accuracy = 0
    for it in xrange(test_iters):
        accuracy += test_net.forward()['acc']
    accuracy /= test_iters
    return test_net, accuracy
test_net, accuracy = eval_style_net(style_weights)
print 'Accuracy, trained from ImageNet initialization: %3.1f%%' % (100*accuracy, )
scratch_test_net, scratch_accuracy = eval_style_net(scratch_style_weights)
print 'Accuracy, trained from random initialization: %3.1f%%' % (100*scratch_accuracy, )

4. End-to-end finetuning for style

Finally, we’ll train both nets again, starting from the weights we just learned. The only difference this time is that we’ll be learning the weights “end-to-end” by turning on learning in all layers of the network, starting from the RGB conv1 filters directly applied to the input image. We pass the argument learn_all=True to the style_net function defined earlier in this notebook, which tells the function to apply a positive (non-zero) lr_mult value for all parameters. Under the default, learn_all=False, all parameters in the pretrained layers (conv1 through fc7) are frozen (lr_mult = 0), and we learn only the classifier layer fc8_flickr.

Note that both networks start at roughly the accuracy achieved at the end of the previous training session, and improve significantly with end-to-end training. To be more scientific, we’d also want to follow the same additional training procedure without the end-to-end training, to ensure that our results aren’t better simply because we trained for twice as long. Feel free to try this yourself!
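
As a starting point for that control experiment, here is a minimal sketch (my addition, not part of the official notebook): keep learn_all=False and train the classifier-only net for another niter iterations from the weights learned above:

control_solver = caffe.get_solver(solver(style_net(train=True, learn_all=False)))
control_solver.net.copy_from(style_weights)  # resume from the classifier-only weights
_, _, control_weights = run_solvers(niter, [('pretrained, control', control_solver)])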

end_to_end_net = style_net(train=True, learn_all=True)  # unlike the previous run, this time every layer's weights will be updated

# Set base_lr to 1e-3, the same as last time when learning only the classifier.
# You may want to play around with different values of this or other
# optimization parameters when fine-tuning. For example, if learning diverges
# (e.g., the loss gets very large or goes to infinity/NaN), you should try
# decreasing base_lr (e.g., to 1e-4, then 1e-5, etc., until you find a value
# for which learning does not diverge).
base_lr = 0.001  # use a smaller value if training diverges

style_solver_filename = solver(end_to_end_net, base_lr=base_lr)
style_solver = caffe.get_solver(style_solver_filename)
style_solver.net.copy_from(style_weights)

scratch_style_solver_filename = solver(end_to_end_net, base_lr=base_lr)
scratch_style_solver = caffe.get_solver(scratch_style_solver_filename)
scratch_style_solver.net.copy_from(scratch_style_weights)

print 'Running solvers for %d iterations...' % niter
solvers = [('pretrained, end-to-end', style_solver),
           ('scratch, end-to-end', scratch_style_solver)]
_, _, finetuned_weights = run_solvers(niter, solvers)
print 'Done.'

style_weights_ft = finetuned_weights['pretrained, end-to-end']
scratch_style_weights_ft = finetuned_weights['scratch, end-to-end']

# Delete solvers to save memory.
del style_solver, scratch_style_solver, solvers

Let’s now test the end-to-end finetuned models. Since all layers have been optimized for the style recognition task at hand, we expect both nets to get better results than the ones above, which were achieved by nets with only their classifier layers trained for the style task (on top of either ImageNet pretrained or randomly initialized weights).

test_net, accuracy = eval_style_net(style_weights_ft)  # eval_style_net uses style_net(train=False), i.e., the test split, averaging over 10 batches by default
print 'Accuracy, finetuned from ImageNet initialization: %3.1f%%' % (100*accuracy, )
scratch_test_net, scratch_accuracy = eval_style_net(scratch_style_weights_ft)
print 'Accuracy, finetuned from random initialization: %3.1f%%' % (100*scratch_accuracy, )

We’ll first look back at the image we started with and check our end-to-end trained model’s predictions.

plt.imshow(deprocess_net_image(image))  # image is still training image #8 from earlier
disp_style_preds(test_net, image)  # test_net was built with train=False: no dropout, Softmax output, no mirroring, test split

Whew, that looks a lot better than before! But note that this image was from the training set, so the net got to see its label at training time.

Finally, we’ll pick an image from the test set (an image the model hasn’t seen) and look at our end-to-end finetuned style model’s predictions for it.

batch_index = 1
image = test_net.blobs['data'].data[batch_index]  # test_net already ran forward on test batches during evaluation, so grab the second image of the batch still in the net
plt.imshow(deprocess_net_image(image))
print 'actual label =', style_labels[int(test_net.blobs['label'].data[batch_index])]
disp_style_preds(test_net, image)

We can also look at the predictions of the network trained from scratch. We see that in this case, the scratch network also predicts the correct label for the image (Pastel), but is much less confident in its prediction than the pretrained net.

disp_style_preds(scratch_test_net, image)

Of course, we can again look at the ImageNet model’s predictions for the above image:

disp_imagenet_preds(imagenet_net, image)

So we did finetuning and it is awesome. Let’s take a look at what kind of results we are able to get with a longer, more complete run of the style recognition dataset. Note: the below URL might be occasionally down because it is run on a research machine.

http://demo.vislab.berkeleyvision.org/