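MATLAB version (very detailed): understanding the RPN in Faster R-CNN through code

Date: 2024-02-19 11:29:53

http://blog.csdn.net/happyflyy/article/details/54917514

Note: this whole account of the RPN is my own understanding, so parts of it may be wrong.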

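1. Introduction to RPN

RPN is short for Region Proposal Network and is one part of the Faster R-CNN architecture. Faster R-CNN consists of two sub-networks. The first, the RPN, extracts a certain number of proposals from a given image, each with an objectness score (the confidence that it contains an object). The second directly reuses the network from Fast R-CNN, with the proposals produced by the RPN replacing those that Fast R-CNN obtained from selective search.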

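2. Structure of the RPN

The schematic of the RPN is shown in the figure below.
The RPN is built by adding the new layers shown in the figure on top of the last layer of an existing network (for example VGG). Taking VGG as the example, the parts of the figure are:
1. conv feature map: a new 512@3x3 convolutional layer added after VGG's conv5_3.
2. k anchor boxes: the initial reference regions at each sliding-window position. Every sliding-window position uses the same set of anchor boxes, so once the coordinates of the position are known, the coordinates of each anchor box can be computed. Faster R-CNN uses k=9: start from a 16×16 base anchor, change its aspect ratio to (0.5, 1, 2) while keeping the area constant, then enlarge each of the three resulting anchors by the three scales (8, 16, 32), giving 9 anchors in total.
3. intermediate layer: the author's code has no separate intermediate layer with a 256-d output (the paper quotes 256-d for ZF and 512-d for VGG, and for VGG the 3×3 layer in item 1 already outputs 512 channels). The 2k scores and 4k coordinates are obtained directly with 1×1 convolutions. The paper explains this as replacing fully connected layers with a fully convolutional implementation.
4. 2k scores: a softmax layer gives two confidences for each anchor. The paper notes that a sigmoid could be used instead to obtain a single confidence of being a positive.
5. 4k coordinates: the coordinates of each window. These are not the absolute coordinates of the anchor but the offsets needed to regress from the anchor to the ground truth (described in detail in the next section).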
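
To make the remark about softmax versus sigmoid in item 4 concrete, here is a minimal sketch (not code from the repository). The two raw scores are made-up values. It only illustrates that a two-class softmax and a sigmoid applied to the score difference give the same foreground probability.

```matlab
% Raw scores of one anchor: [background, foreground] (arbitrary illustrative values)
scores = [0.3, 1.7];

% Two-class softmax; subtracting the max keeps exp() numerically stable
exp_s     = exp(scores - max(scores));
p_softmax = exp_s / sum(exp_s);

% Sigmoid of the score difference (foreground minus background)
p_sigmoid = 1 / (1 + exp(-(scores(2) - scores(1))));

fprintf('softmax fg = %.6f, sigmoid fg = %.6f\n', p_softmax(2), p_sigmoid);
```

So the 2k-score softmax form and a k-score sigmoid form carry the same information per anchor; which one is used is an implementation choice.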

[Figure: schematic of the RPN]

For a 600×800 image, conv5_3 after VGG has size 38×50, so the total number of anchors is 38×50×9 = 17100.
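
The count above, and the way each anchor's coordinates follow from the sliding-window position, can be sketched as below. This is my own illustration under two assumptions: the feature-map size is approximated as ceil(size / 16), and `base_anchors` is a placeholder (nine copies of the 16×16 base box) rather than the real nine anchors.

```matlab
feat_stride = 16;
im_h = 600;  im_w = 800;

% Assumption: feature-map size approximated as ceil(size / stride)
feat_h = ceil(im_h / feat_stride);      % 38
feat_w = ceil(im_w / feat_stride);      % 50
k = 9;
num_anchors = feat_h * feat_w * k;      % 17100

% Offset (in input-image pixels) of every sliding-window position
[shift_x, shift_y] = meshgrid((0:feat_w-1) * feat_stride, ...
                              (0:feat_h-1) * feat_stride);
shifts = [shift_x(:), shift_y(:), shift_x(:), shift_y(:)];

% Placeholder base anchors [x1 y1 x2 y2]; the real ones differ in ratio and scale
base_anchors = repmat([1, 1, 16, 16], k, 1);

% Add every shift to every base anchor: (k x 1 x 4) + (1 x N x 4) -> (k*N) x 4
all_anchors = reshape(bsxfun(@plus, permute(base_anchors, [1, 3, 2]), ...
                                    permute(shifts, [3, 1, 2])), [], 4);

fprintf('%d anchors in total\n', size(all_anchors, 1));
```

Because the same base anchors are reused at every position, only the k base boxes and the stride need to be stored; everything else is recomputed from the grid.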

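3. Understanding the RPN through code

Environment used to run the code: Ubuntu 14.04, MATLAB R2016a.

1 Preparation

This assumes the libraries that caffe depends on are already installed. Faster R-CNN ships with caffe's MATLAB interface, so caffe itself does not need to be installed or compiled. Using PASCAL VOC0712 as the example:

Step1: Download the Faster R-CNN source code and unpack it. It is available at https://github.com/ShaoqingRen/faster_rcnn. Assume the unpacked path is $FASTERRCNN/.

Step2: Download VOC07 and VOC12 and unpack them into any folder (preferably $FASTERRCNN/datasets/).

Step3: Download the network model files and the pre-trained VGG, unpack them, and copy them into $FASTERRCNN/. They are available at https://pan.baidu.com/s/1mgzSnI4

Step4: In a shell, change into $FASTERRCNN/ and start matlab.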

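2 File structure of Faster R-CNN

After the preparation above, the file structure of Faster R-CNN in MATLAB is as shown in the figure below:
[Figure: Faster R-CNN directory tree in MATLAB]

./bin: the mex-compiled files of the non-maximum suppression (NMS) C code in ./functions/nms
./datasets: where the VOC datasets are stored
./experiments: entry-point functions for training and testing
./external: caffe's MATLAB interface. Only caffe's dependencies need to be installed; the caffe sources do not need compiling.
./fetch_data: functions that download the datasets, pre-trained models, and other files
./functions: functions for processing the training data
./imdb: reads the VOC data into the imdb format
./models: pre-trained models of the base network (e.g. VGG); the network-structure prototxt files for Fast R-CNN and the RPN, and the solver-parameter prototxt files
./utils: other commonly used functions
Note: ./test is where I temporarily put some test images when running the test demo; it is not part of Faster R-CNN.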

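3 Training process

With VGG and VOC0712, the corresponding training file is $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m. Since the goal is only to understand the RPN, only the first small part of this m-file needs a close look.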

% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%
% model
model                       = Model.VGG16_for_Faster_RCNN_VOC0712;
% cache base
cache_base_proposal         = 'faster_rcnn_VOC0712_vgg_16layers';
cache_base_fast_rcnn        = '';
% train/test data
dataset                     = [];
use_flipped                 = true;
dataset                     = Dataset.voc0712_trainval(dataset, 'train', use_flipped);
dataset                     = Dataset.voc2007_test(dataset, 'test', false);
%% -------------------- TRAIN --------------------
% conf
conf_proposal               = proposal_config('image_means', model.mean_image, 'feat_stride', model.feat_stride);
conf_fast_rcnn              = fast_rcnn_config('image_means', model.mean_image);
% set cache folder for each stage
model                       = Faster_RCNN_Train.set_cache_folder(cache_base_proposal, cache_base_fast_rcnn, model);
% generate anchors and pre-calculate output size of rpn network 
[conf_proposal.anchors, conf_proposal.output_width_map, conf_proposal.output_height_map] ...
                            = proposal_prepare_anchors(conf_proposal, model.stage1_rpn.cache_name, model.stage1_rpn.test_net_def_file);
%%  stage one proposal
fprintf('\n***************\nstage one proposal \n***************\n');
% train
model.stage1_rpn            = Faster_RCNN_Train.do_proposal_train(conf_proposal, dataset, model.stage1_rpn, opts.do_val);

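1 Parameter configuration stage

Three parameters are configured for the RPN: model, dataset, and conf_proposal. conf_fast_rcnn holds the parameters of Fast R-CNN.

The model parameter:

It specifies the paths of the network-structure prototxt files needed by the two stages, RPN and Fast R-CNN. The walk-through here follows the first-stage RPN.
It specifies the paths of the pre-trained VGG model and the image mean.

Configuration of the model parameter: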

% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%
% model
model                       = Model.VGG16_for_Faster_RCNN_VOC0712;

 

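The actual configuration is done by the code fragment below; only the code related to the first RPN stage is of interest. It first sets the paths of the base network's (VGG) pre-trained model and image-mean file, then the paths of the RPN prototxt files, and finally the RPN test parameters.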

% code from $FASTERRCNN/experiments/+Model/VGG16_for_Faster_RCNN_VOC0712.m
%
% paths of the base network's (VGG) pre-trained model and image-mean file
model.mean_image                                = fullfile(pwd, 'models', 'pre_trained_models', 'vgg_16layers', 'mean_image');
model.pre_trained_net_file                      = fullfile(pwd, 'models', 'pre_trained_models', 'vgg_16layers', 'vgg16.caffemodel');
% Stride in input image pixels at the last conv layer
model.feat_stride                               = 16;
% paths of the RPN prototxt files
%% stage 1 rpn, inited from pre-trained network
model.stage1_rpn.solver_def_file                = fullfile(pwd, 'models', 'rpn_prototxts', 'vgg_16layers_conv3_1', 'solver_60k80k.prototxt');
model.stage1_rpn.test_net_def_file              = fullfile(pwd, 'models', 'rpn_prototxts', 'vgg_16layers_conv3_1', 'test.prototxt');
model.stage1_rpn.init_net_file                  = model.pre_trained_net_file;
% RPN test parameters
% rpn test setting
model.stage1_rpn.nms.per_nms_topN               = -1;
model.stage1_rpn.nms.nms_overlap_thres          = 0.7;
model.stage1_rpn.nms.after_nms_topN             = 2000;

The dataset parameter:

Changing the dataset path

If the VOC data was not unpacked into $FASTERRCNN/datasets/, change the paths in $FASTERRCNN/experiments/+Dataset/private/voc2007_devkit.m and $FASTERRCNN/experiments/+Dataset/private/voc2012_devkit.m to the location where the VOC datasets were unpacked.

% code from `$FASTERRCNN/experiments/+Dataset/private/voc2007_devkit.m`
%
function path = voc2007_devkit()
    path = './datasets/VOCdevkit2007';
end

 

% code from `$FASTERRCNN/experiments/+Dataset/private/voc2012_devkit.m`
%
function path = voc2012_devkit()
    path = './datasets/VOCdevkit2012';
end

 

Reading the datasets

Configuration of the dataset parameter:

% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%
% train/test data
dataset                     = [];
use_flipped                 = true;
dataset                     = Dataset.voc0712_trainval(dataset, 'train', use_flipped);
dataset                     = Dataset.voc2007_test(dataset, 'test', false);

 

The files that actually read the datasets are $FASTERRCNN/experiments/+Dataset/voc0712_trainval.m and $FASTERRCNN/experiments/+Dataset/voc2007_test.m. They first obtain the dataset storage paths and then read the data into imdb and roidb.

% code from $FASTERRCNN/experiments/+Dataset/voc0712_trainval.m
%
% get the dataset storage paths
devkit2007                      = voc2007_devkit();
devkit2012                      = voc2012_devkit();
% read the data into imdb and roidb
switch usage
    case {'train'}
        dataset.imdb_train    = {  imdb_from_voc(devkit2007, 'trainval', '2007', use_flip), ...
                                    imdb_from_voc(devkit2012, 'trainval', '2012', use_flip)};
        dataset.roidb_train   = cellfun(@(x) x.roidb_func(x), dataset.imdb_train, 'UniformOutput', false);
    case {'test'}
        error('only supports one source test currently');
    otherwise
        error('usage = ''train'' or ''test''');
end

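The imdb is a table-like MATLAB structure in which each row corresponds to one image and holds the image path, id, size, ground truth (positions and class labels), and so on.

The conf_proposal parameter:

Only the RPN's conf_proposal is of interest here.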

% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%

% conf
conf_proposal               = proposal_config('image_means', model.mean_image, 'feat_stride', model.feat_stride);

 

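These are the parameters the RPN needs. The ones worth noting are:
batch_size: [256] total number of bg and fg samples selected from each image
fg_fraction: [0.5] fraction of fg samples within batch_size; if there are not enough fg samples, bg samples are added instead
drop_boxes_runoff_image: [1] whether to drop anchors that extend beyond the image boundary during training
bg_thresh_hi: [0.3] maximum IoU with the ground truth for an anchor to count as a negative sample
bg_thresh_lo: [0] minimum IoU with the ground truth for an anchor to count as a negative sample
fg_thresh: [0.7] minimum IoU with the ground truth for an anchor to count as a positive sample
ims_per_batch: [1] number of images fed in per training iteration; only one image at a time is currently supported
scales: [600] length the short side is rescaled to
max_size: [1000] maximum length of the long side after rescaling
feat_stride: [16] conv5_3 in VGG is 16 times smaller than the input image, i.e. the stride between two adjacent points is 16
anchors: the 9 base anchors with different aspect ratios and scales
output_width_map: mapping from the input image width to the conv5_3 width
output_height_map: mapping from the input image height to the conv5_3 height
bg_weight: [1] weight of each negative sample when computing the loss; positive samples all have weight 1
image_means: the image mean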
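
How scales and max_size interact can be sketched as follows. This is my own minimal illustration, not the repository's resizing code; the 375×500 input size is a made-up example. The short side is scaled to scales unless that would push the long side past max_size, in which case the long side is capped instead.

```matlab
scales   = 600;     % default from proposal_config
max_size = 1000;    % default from proposal_config
im_size  = [375, 500];              % [height, width], hypothetical input image

im_size_min = min(im_size);
im_size_max = max(im_size);

% First try to bring the short side to `scales`
im_scale = scales / im_size_min;
% If the long side would exceed `max_size`, cap the long side instead
if round(im_scale * im_size_max) > max_size
    im_scale = max_size / im_size_max;
end

scaled_size = round(im_size * im_scale);
fprintf('scale factor %.3f -> %d x %d\n', im_scale, scaled_size(1), scaled_size(2));
```

With these numbers the image ends up at 600×800, the size used in the anchor-count example earlier.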

The configuration file itself is:

% code from $FASTERRCNN/functions/rpn/proposal_config.m
%

function conf = proposal_config(varargin)
% conf = proposal_config(varargin)
% --------------------------------------------------------
% Faster R-CNN
% Copyright (c) 2015, Shaoqing Ren
% Licensed under The MIT License [see LICENSE for details]
% --------------------------------------------------------

    ip = inputParser;

    %% training
    ip.addParamValue('use_gpu',         gpuDeviceCount > 0, ...
                                                        @islogical);

    % whether drop the anchors that has edges outside of the image boundary
    ip.addParamValue('drop_boxes_runoff_image', ...
                                        true,           @islogical);

    % Image scales -- the short edge of input image
    ip.addParamValue('scales',          600,            @ismatrix);
    % Max pixel size of a scaled input image
    ip.addParamValue('max_size',        1000,           @isscalar);
    % Images per batch, only supports ims_per_batch = 1 currently
    ip.addParamValue('ims_per_batch',   1,              @isscalar);
    % Minibatch size
    ip.addParamValue('batch_size',      256,            @isscalar);
    % Fraction of minibatch that is foreground labeled (class > 0)
    ip.addParamValue('fg_fraction',     0.5,            @isscalar);
    % weight of background samples, when weight of foreground samples is
    % 1.0
    ip.addParamValue('bg_weight',       1.0,            @isscalar);
    % Overlap threshold for a ROI to be considered foreground (if >= fg_thresh)
    ip.addParamValue('fg_thresh',       0.7,            @isscalar);
    % Overlap threshold for a ROI to be considered background (class = 0 if
    % overlap in [bg_thresh_lo, bg_thresh_hi))
    ip.addParamValue('bg_thresh_hi',    0.3,            @isscalar);
    ip.addParamValue('bg_thresh_lo',    0,              @isscalar);
    % mean image, in RGB order
    ip.addParamValue('image_means',     128,            @ismatrix);
    % Use horizontally-flipped images during training?
    ip.addParamValue('use_flipped',     true,           @islogical);
    % Stride in input image pixels at ROI pooling level (network specific)
    % 16 is true for {Alex,Caffe}Net, VGG_CNN_M_1024, and VGG16
    ip.addParamValue('feat_stride',     16,             @isscalar);
    % train proposal target only to labeled ground-truths or also include
    % other proposal results (selective search, etc.)
    ip.addParamValue('target_only_gt',  true,           @islogical);

    % random seed
    ip.addParamValue('rng_seed',        6,              @isscalar);


    %% testing
    ip.addParamValue('test_scales',     600,            @isscalar);
    ip.addParamValue('test_max_size',   1000,           @isscalar);
    ip.addParamValue('test_nms',        0.3,            @isscalar);
    ip.addParamValue('test_binary',     false,          @islogical);
    ip.addParamValue('test_min_box_size',16,            @isscalar);
    ip.addParamValue('test_drop_boxes_runoff_image', ...
                                        false,          @islogical);

    ip.parse(varargin{:});
    conf = ip.Results;

    assert(conf.ims_per_batch == 1, 'currently rpn only supports ims_per_batch == 1');

    % if image_means is a file, load it
    if ischar(conf.image_means)
        s = load(conf.image_means);
        s_fieldnames = fieldnames(s);
        assert(length(s_fieldnames) == 1);
        conf.image_means = s.(s_fieldnames{1});
    end
end

 

2 Generating anchors

% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%

% generate anchors and pre-calculate output size of rpn network 
[conf_proposal.anchors, conf_proposal.output_width_map, conf_proposal.output_height_map] ...
                            = proposal_prepare_anchors(conf_proposal, model.stage1_rpn.cache_name, model.stage1_rpn.test_net_def_file);

 

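The proposal_prepare_anchors function has two parts. It first builds the maps between the input image size and the conv5_3 size, then generates the 9 base anchors. Finally, output_width_map, output_height_map, and anchors are stored in the conf_proposal parameter.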

% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%

function [anchors, output_width_map, output_height_map] = proposal_prepare_anchors(conf, cache_name, test_net_def_file)
    % build the mapping between input image size and conv5_3 size
    [output_width_map, output_height_map] ...
                                = proposal_calc_output_size(conf, test_net_def_file);
    % generate the 9 base anchors
    anchors                = proposal_generate_anchors(cache_name, ...
                                    'scales',  2.^[3:5]);
end

 

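1 Mapping between input image size and conv5_3 size

First the RPN test network is initialized; then all-zero images of different side lengths are generated and passed forward through the network; the conv5_3 size for each input size is recorded; finally caffe is reset.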

% code from $FASTERRCNN/functions/rpn/proposal_calc_output_size.m
%

% initialize the RPN test network
caffe_net = caffe.Net(test_net_def_file, 'test');

% set gpu/cpu
if conf.use_gpu
    caffe.set_mode_gpu();
else
    caffe.set_mode_cpu();
end

% generate all-zero images of different sizes and run a forward pass on each
input = 100:conf.max_size;
output_w = nan(size(input));
output_h = nan(size(input));
for i = 1:length(input)
    s = input(i);
    im_blob = single(zeros(s, s, 3, 1));
    net_inputs = {im_blob};

    % Reshape net's input blobs
    caffe_net.reshape_as_input(net_inputs);
    caffe_net.forward(net_inputs);

    % record the conv5_3 size for this input size (read from the proposal_cls_score blob)
    cls_score = caffe_net.blobs('proposal_cls_score').get_data();
    output_w(i) = size(cls_score, 1);
    output_h(i) = size(cls_score, 2);
end

output_width_map = containers.Map(input, output_w);
output_height_map = containers.Map(input, output_h);

% reset caffe
caffe.reset_all();
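
The lookup tables built above can be imitated without caffe, which may help in seeing what they contain. The sketch below is not the repository's method. It assumes the only size-changing layers before conv5_3 are four 2×2, stride-2 poolings that round up, which is consistent with the 600×800 to 38×50 example earlier but is still an assumption.

```matlab
max_size = 1000;
input = 100:max_size;

% Assumption: four 2x2 / stride-2 poolings with ceil rounding precede conv5_3
out = input;
for p = 1:4
    out = ceil(out / 2);
end

% Same container type as output_width_map / output_height_map
output_size_map = containers.Map(input, out);

fprintf('600 -> %d, 800 -> %d\n', output_size_map(600), output_size_map(800));
```

Running the real forward passes, as the author's code does, avoids having to trust this kind of layer-by-layer arithmetic.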

 

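2 Generating the 9 base anchors

The most basic anchor is set to 16×16. Keeping the area constant, ratio_jitter in this m-file generates anchors with three aspect ratios (0.5, 1, 2), as shown in the figure below. scale_jitter in this m-file then enlarges the anchor of each aspect ratio to three scales (8, 16, 32). This gives 9 anchors in total.
[Figure: anchors with the three aspect ratios]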

% code from $FASTERRCNN/functions/rpn/proposal_generate_anchors.m
%

%% inputs
    ip = inputParser;
    ip.addRequired('cache_name',                        @isstr);

    % the size of the base anchor 
    ip.addParamValue('base_size',       16,             @isscalar);
    % ratio list of anchors
    ip.addParamValue('ratios',          [0.5, 1, 2],    @ismatrix);
    % scale list of anchors
    ip.addParamValue('scales',          2.^[3:5],       @ismatrix);
    ip.addParamValue('ignore_cache',    false,          @islogical);
    ip.parse(cache_name, varargin{:});
    opts = ip.Results;

%%
    if ~opts.ignore_cache
        anchor_cache_dir            = fullfile(pwd, 'output', 'rpn_cachedir', cache_name);
        mkdir_if_missing(anchor_cache_dir);
        anchor_cache_file           = fullfile(anchor_cache_dir, 'anchors');
    end
    try
        ld                      = load(anchor_cache_file);
        anchors                 = ld.anchors;
    catch
        % set the base anchor to 16x16
        base_anchor             = [1, 1, opts.base_size, opts.base_size];
        % keep the area constant and generate anchors with different aspect ratios
        ratio_anchors           = ratio_jitter(base_anchor, opts.ratios);
        % scale each aspect-ratio anchor to the different scales
        anchors                 = cellfun(@(x) scale_jitter(x, opts.scales), num2cell(ratio_anchors, 2), 'UniformOutput', false);
        anchors                 = cat(1, anchors{:});
        if ~opts.ignore_cache
            save(anchor_cache_file, 'anchors');
        end
        end
    end
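
The bodies of ratio_jitter and scale_jitter are not shown above, so here is a self-contained sketch of the two steps as I understand them: change the aspect ratio at roughly constant area, then multiply width and height by each scale around the same center. The rounding details are my assumption and may differ from the repository.

```matlab
base_size = 16;
ratios = [0.5; 1; 2];          % height / width
scales = [8; 16; 32];

base_anchor = [1, 1, base_size, base_size];          % [x1 y1 x2 y2]
w = base_anchor(3) - base_anchor(1) + 1;
h = base_anchor(4) - base_anchor(2) + 1;
x_ctr = base_anchor(1) + (w - 1) / 2;
y_ctr = base_anchor(2) + (h - 1) / 2;

% Aspect-ratio step: keep the area close to w*h while setting h/w = ratio
ws = round(sqrt(w * h ./ ratios));
hs = round(ws .* ratios);

% Scale step: enlarge every ratio anchor by every scale around the same center
anchors = zeros(0, 4);
for i = 1:numel(ratios)
    sw = ws(i) * scales;
    sh = hs(i) * scales;
    anchors = [anchors; ...
               x_ctr - (sw - 1) / 2, y_ctr - (sh - 1) / 2, ...
               x_ctr + (sw - 1) / 2, y_ctr + (sh - 1) / 2];
end

disp(anchors);   % 9 rows of [x1 y1 x2 y2], grouped by ratio, then scale
```

All nine boxes share the center of the base anchor, which is why shifting them by the feature stride is enough to place them at any sliding-window position.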

 

3 Training stage

Once all the parameters are set, training starts.

% code from $FASTERRCNN/experiments/script_faster_rcnn_VOC0712_VGG16.m
%

%%  stage one proposal
fprintf('\n***************\nstage one proposal \n***************\n');
% train
model.stage1_rpn            = Faster_RCNN_Train.do_proposal_train(conf_proposal, dataset, model.stage1_rpn, opts.do_val);

 

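do_proposal_train directly calls $FASTERRCNN/functions/rpn/proposal_train.m.
Following the flow in the author's comments, $FASTERRCNN/functions/rpn/proposal_train.m has three main stages: init, making tran/val data, and Training.

1 init: initialization

Initialization mainly sets the cache file paths, reads the caffe solver parameters, reads the caffe model structure, loads the pre-trained model, initializes the log file, and sets the GPU mode.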

% code from `$FASTERRCNN/functions/rpn/proposal_train.m`
%

%% init  
    % init caffe solver
    imdbs_name = cell2mat(cellfun(@(x) x.name, imdb_train, 'UniformOutput', false));
    cache_dir = fullfile(pwd, 'output', 'rpn_cachedir', opts.cache_name, imdbs_name);
    mkdir_if_missing(cache_dir);
    caffe_log_file_base = fullfile(cache_dir, 'caffe_log');
    caffe.init_log(caffe_log_file_base);
    caffe_solver = caffe.Solver(opts.solver_def_file);
    caffe_solver.net.copy_from(opts.net_file);

    % init log
    timestamp = datestr(datevec(now()), 'yyyymmdd_HHMMSS');
    mkdir_if_missing(fullfile(cache_dir, 'log'));
    log_file = fullfile(cache_dir, 'log', ['train_', timestamp, '.txt']);
    diary(log_file);   

    % set random seed
    prev_rng = seed_rand(conf.rng_seed);
    caffe.set_random_seed(conf.rng_seed);

    % set gpu/cpu
    if conf.use_gpu
        caffe.set_mode_gpu();
    else
        caffe.set_mode_cpu();
    end

    disp('conf:');
    disp(conf);
    disp('opts:');
    disp(opts);

 

2 making tran/val data: converting the bounding-box data into regression data
% code from `$FASTERRCNN/functions/rpn/proposal_train.m`
%

%% making tran/val data
    fprintf('Preparing training data...');
    [image_roidb_train, bbox_means, bbox_stds]...
                            = proposal_prepare_image_roidb(conf, opts.imdb_train, opts.roidb_train);
    fprintf('Done.\n');

    if opts.do_val
        fprintf('Preparing validation data...');
        [image_roidb_val]...
                                = proposal_prepare_image_roidb(conf, opts.imdb_val, opts.roidb_val, bbox_means, bbox_stds);
        fprintf('Done.\n');
    end

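After reading the image information from imdb and roidb, proposal_prepare_image_roidb.m does the following: it converts the ground-truth boxes of each image from [x1,y1,x2,y2] to [dx,dy,dw,dh] using equation (2) of the Faster R-CNN paper; it then selects the bg and fg samples; finally it computes the mean and variance of the converted [dx,dy,dw,dh].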
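
The conversion to [dx,dy,dw,dh] can be sketched as below. This is my own illustration of equation (2), not the repository's function; the anchor and ground-truth boxes are made-up values and the helper name to_cwh is hypothetical.

```matlab
% Hypothetical anchor and ground-truth box, both [x1 y1 x2 y2]
anchor = [1, 1, 100, 100];
gt     = [11, 21, 120, 90];

% Convert a box to [center_x, center_y, width, height]
to_cwh = @(b) [b(1) + 0.5 * (b(3) - b(1)), b(2) + 0.5 * (b(4) - b(2)), ...
               b(3) - b(1) + 1,            b(4) - b(2) + 1];
a = to_cwh(anchor);
g = to_cwh(gt);

% Equation (2): tx = (x - xa)/wa, ty = (y - ya)/ha, tw = log(w/wa), th = log(h/ha)
t = [(g(1) - a(1)) / a(3), (g(2) - a(2)) / a(4), log(g(3) / a(3)), log(g(4) / a(4))];

% Inverting the transform recovers the ground-truth center and size
decoded = [a(1) + t(1) * a(3), a(2) + t(2) * a(4), a(3) * exp(t(3)), a(4) * exp(t(4))];

fprintf('dx=%.3f dy=%.3f dw=%.3f dh=%.3f\n', t);
```

Normalizing the center offsets by the anchor size and taking the log of the size ratios keeps the targets on a similar numeric range for small and large anchors.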

Step1: Read the image information from imdb and roidb

% code from `$FASTERRCNN/functions/rpn/proposal_prepare_image_roidb.m`
%

    imdbs = imdbs(:);
    roidbs = roidbs(:);

    if conf.target_only_gt
        image_roidb = ...
            cellfun(@(x, y) ... // @(imdbs, roidbs)
                arrayfun(@(z) ... //@([1:length(x.image_ids)])
                    struct('image_path', x.image_at(z), 'image_id', x.image_ids{z}, 'im_size', x.sizes(z, :), 'imdb_name', x.name, 'num_classes', x.num_classes, ...
                    'boxes', y.rois(z).boxes(y.rois(z).gt, :), 'class', y.rois(z).class(y.rois(z).gt, :), 'image', [], 'bbox_targets', []), ...
                [1:length(x.image_ids)]', 'UniformOutput', true),...
            imdbs, roidbs, 'UniformOutput', false);
    else
        image_roidb = ...
            cellfun(@(x, y) ... // @(imdbs, roidbs)
                arrayfun(@(z) ... //@([1:length(x.image_ids)])
                    struct('image_path', x.image_at(z), 'image_id', x.image_ids{z}, 'im_size', x.sizes(z, :), 'imdb_name', x.name, ...
                    'boxes', y.rois(z).boxes, 'class', y.rois(z).class, 'image', [], 'bbox_targets', []), ...
                [1:length(x.image_ids)]', 'UniformOutput', true),...
            imdbs, roidbs, 'UniformOutput', false);
    end

    image_roidb = cat(1, image_roidb{:});

Step2: Converting the ground-truth boxes

% code from `$FASTERRCNN/functions/rpn/proposal_prepare_image_roidb.m`
%
% enhance roidb to contain bounding-box regression targets
    [image_roidb, bbox_means, bbox_stds] = append_bbox_regression_targets(conf, image_roidb, bbox_means, bbox_stds);

 

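The detailed steps in proposal_prepare_image_roidb.m are:
- **Read the image information:** the image information is read into image_roidb.
- **Ground-truth conversion:** implemented by append_bbox_regression_targets in `proposal_prepare_image_roidb.m`
- **Get all anchors:** `proposal_locate_anchors.m` returns all anchors of the image and the factor by which the image has to be rescaled
    - **Image scale factor:** the scale factor is obtained from `scales` and `max_size`, and the rescaled image size is recorded. The short side of the image is brought to `scales`, and the long side is at most `max_size`.
    - **conv5_3 feature-map size:** the conv5_3 size for the rescaled image is looked up in the tables (output_width_map, output_height_map)
    - **Gridding:** the conv5_3 positions are laid out as a grid with spacing `feat_stride`
    - **All anchors:** the 9 base `anchors` are placed at every grid node and their coordinates are computed.
- **Selecting samples:** `compute_targets` in `proposal_prepare_image_roidb.m` selects the positive and negative samples
    - **Compute overlap:** all anchors are stored in the variable `ex_rois`, and the overlap (IoU) between every anchor and every ground-truth box is computed
    - **Drop out-of-range anchors:** the overlap between out-of-range anchors and the ground truth is set to 0.
    - **Select positive samples:** the anchor with the highest IoU and the anchors with IoU greater than `fg_thresh` are the positive samples
    - **Select negative samples:** anchors with IoU between `bg_thresh_lo` and `bg_thresh_hi` are the negative samples
    - **Compute the regression targets:** the regression targets `dx`, `dy`, `dw`, `dh` of every positive sample are computed with equation (2) of the paper
    - **New ground truth:** the regression targets of the positive samples become their ground truth (label 1); the regression targets of the negative samples are all set to 0 (label -1).
- **Compute mean and variance:** the mean and variance of the regression targets of all positive samples are computed, and the targets are standardized (subtract the mean, divide by the standard deviation)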
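
The sample-selection steps in the list above can be sketched as follows. This is a simplified illustration, not the repository's compute_targets: the boxes are made-up values, the out-of-image check is omitted, and label 0 is my own marker for anchors that are neither positive nor negative.

```matlab
% Hypothetical boxes, all [x1 y1 x2 y2]
anchors  = [1 1 100 100; 51 51 150 150; 201 201 300 300; 1 1 60 60];
gt_boxes = [1 1 100 100];
fg_thresh = 0.7;  bg_thresh_hi = 0.3;  bg_thresh_lo = 0;

% IoU between every anchor (rows) and every ground-truth box (columns)
area = @(b) (b(:, 3) - b(:, 1) + 1) .* (b(:, 4) - b(:, 2) + 1);
overlaps = zeros(size(anchors, 1), size(gt_boxes, 1));
for j = 1:size(gt_boxes, 1)
    g  = gt_boxes(j, :);
    iw = max(0, min(anchors(:, 3), g(3)) - max(anchors(:, 1), g(1)) + 1);
    ih = max(0, min(anchors(:, 4), g(4)) - max(anchors(:, 2), g(2)) + 1);
    inter = iw .* ih;
    overlaps(:, j) = inter ./ (area(anchors) + area(g) - inter);
end

max_ov      = max(overlaps, [], 2);     % best IoU of each anchor
best_per_gt = max(overlaps, [], 1);     % best IoU reached for each gt box
is_best     = any(bsxfun(@eq, overlaps, best_per_gt) & overlaps > 0, 2);

labels = zeros(size(anchors, 1), 1);                            % 0: not used
labels(max_ov >= bg_thresh_lo & max_ov < bg_thresh_hi) = -1;    % negatives
labels(max_ov >= fg_thresh | is_best) = 1;                      % positives

disp([max_ov, labels]);
```

Anchors whose best IoU falls between bg_thresh_hi and fg_thresh (the fourth one here, IoU 0.36) are neither positive nor negative and take no part in training.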

 

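3 Training: training

Step1: Shuffle the order of the training data
The generate_random_minibatch function in proposal_train.m shuffles the training data and returns the index of the first image after shuffling, sub_db_inds.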
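
The shuffling in Step1 can be sketched like this. It is a simplified stand-in for generate_random_minibatch, assuming one image per iteration; the dataset size and iteration count are made-up values.

```matlab
num_images = 5;             % hypothetical number of training images
num_iters  = 7;             % hypothetical number of iterations

shuffled_inds = [];
visited = zeros(1, num_iters);
for iter = 1:num_iters
    % Reshuffle once every image has been used
    if isempty(shuffled_inds)
        shuffled_inds = randperm(num_images);
    end
    % Take the first index of the shuffled list as this iteration's image
    sub_db_inds = shuffled_inds(1);
    shuffled_inds(1) = [];
    visited(iter) = sub_db_inds;
end

disp(visited);
```

Every image is visited once before any image is repeated, so each pass over the data sees the whole training set in a fresh random order.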

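Step2: Prepare one training sample
Implemented by proposal_generate_minibatch.m.

  • Selecting positive and negative samples and setting weights: sample_rois in proposal_generate_minibatch.m selects the samples and sets the weights
    • **fg_inds:** indices of the positive samples; if there are fewer than batch_size*fg_fraction of them, negative samples make up the difference.
    • **bg_inds:** indices of the negative samples; there are usually many more negatives, so they are selected at random.
    • **label:** set to 1 for every positive sample and 0 for every negative sample.
    • **label_weights:** weights of the classification loss; 1 for positive samples and bg_weight for negative samples.
    • **bbox_targets:** window positions of the positive and negative samples after the data conversion
    • **bbox_loss_weights:** weights of the localization loss; 1 for positives and 0 for negatives
  • Assembling the RPN input blobs
    • **im_blob input to the RPN:** im_blob
    • **labels_blob input to the RPN:** labels_blob
    • **label_weights_blob input to the RPN:** label_weights_blob
    • **bbox_targets_blob input to the RPN:** bbox_targets_blob
    • **bbox_loss_blob input to the RPN:** bbox_loss_blob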
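
The sample selection and weighting done by sample_rois can be sketched as below. This is my own simplified version, not the repository's code. The per-anchor labels are made up (1 positive, -1 negative, 0 unused), and bbox_loss_weights is kept as one value per anchor rather than one per coordinate.

```matlab
batch_size = 256;  fg_fraction = 0.5;  bg_weight = 1.0;   % defaults from proposal_config

% Hypothetical labels for the anchors of one image
anchor_labels = [ones(30, 1); -ones(5000, 1); zeros(200, 1)];
fg_all = find(anchor_labels == 1);
bg_all = find(anchor_labels == -1);

% Take at most batch_size*fg_fraction positives, chosen at random
fg_num  = min(round(batch_size * fg_fraction), numel(fg_all));
fg_inds = fg_all(randperm(numel(fg_all), fg_num));

% Fill the rest of the batch with randomly chosen negatives
bg_num  = min(batch_size - fg_num, numel(bg_all));
bg_inds = bg_all(randperm(numel(bg_all), bg_num));

n = numel(anchor_labels);
label             = zeros(n, 1);  label(fg_inds) = 1;
label_weights     = zeros(n, 1);  label_weights(fg_inds) = 1;
                                  label_weights(bg_inds) = bg_weight;
bbox_loss_weights = zeros(n, 1);  bbox_loss_weights(fg_inds) = 1;

fprintf('%d positives, %d negatives in the batch\n', fg_num, bg_num);
```

Anchors that are not sampled keep a weight of 0 in both losses, so they pass through the network without contributing to the gradient.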

Step3: Iterate