Object Detection: Building the YOLOv4 Model

Date: 2024-03-10 13:30:31

The YOLOv4 network model is mainly made up of four parts: the CSPDarknet53 backbone, the SPP module, the PANet feature pyramid, and the yolo_head.

1. Backbone feature extraction network: CSPDarknet53

Compared with Darknet53 in YOLOv3, the CSPDarknet53 backbone in YOLOv4 has the following features.

1.1 Mish activation function

Mish = x * K.tanh(K.softplus(x)), where softplus(x) = ln(1 + e^x)

For negative inputs, the Mish activation does not truncate completely: it lets some negative gradient flow through, which preserves information. Mish is also smooth at every point, so gradient descent tends to behave better than with ReLU.
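As a quick standalone check of the formula above, Mish can be sketched in plain NumPy (outside Keras):

```python
import numpy as np

def softplus(x):
    # softplus(x) = ln(1 + e^x), written in a numerically stable form
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def mish(x):
    # Mish(x) = x * tanh(softplus(x))
    return x * np.tanh(softplus(x))

print(mish(0.0))    # 0.0
print(mish(-1.0))   # a small negative value: negative inputs are not cut to zero
```

Evaluating it at a negative point shows the non-truncation property directly: the output is small but nonzero.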

1.2 CSPNet structure

The resblock_body structure is modified to the CSPNet form: a large residual edge is added that stacks (concatenates) the block's input with its final output.
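resblock_body itself is not shown in this post. A sketch of the CSP block, with a hypothetical conv_bn_mish helper standing in for DarknetConv2D_BN_Mish, could look like the following (based on common Keras reimplementations, so treat the details as an assumption):

```python
import tensorflow as tf
from tensorflow.keras import layers

def mish(x):
    return x * tf.math.tanh(tf.math.softplus(x))

def conv_bn_mish(x, filters, kernel, strides=(1, 1)):
    # Conv2D (no bias) + BatchNorm + Mish; 'valid' padding when downsampling
    padding = 'valid' if strides == (2, 2) else 'same'
    x = layers.Conv2D(filters, kernel, strides=strides,
                      padding=padding, use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation(mish)(x)

def resblock_body(x, num_filters, num_blocks, all_narrow=True):
    # downsample by 2
    x = layers.ZeroPadding2D(((1, 0), (1, 0)))(x)
    x = conv_bn_mish(x, num_filters, (3, 3), strides=(2, 2))
    inner = num_filters // 2 if all_narrow else num_filters
    # CSP: split into a large residual edge (short) and a main path
    short = conv_bn_mish(x, inner, (1, 1))
    main = conv_bn_mish(x, inner, (1, 1))
    for _ in range(num_blocks):
        y = conv_bn_mish(main, num_filters // 2, (1, 1))
        y = conv_bn_mish(y, inner, (3, 3))
        main = layers.Add()([main, y])
    main = conv_bn_mish(main, inner, (1, 1))
    # stack the two edges, then transition back to num_filters channels
    x = layers.Concatenate()([main, short])
    return conv_bn_mish(x, num_filters, (1, 1))
```

The key difference from the YOLOv3 residual block is the `short` branch, which bypasses all residual units and is concatenated back at the end.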


The last three feature layers of CSPDarknet53 (downsampled 8x, 16x, and 32x) are taken as the extracted feature outputs.

def darknet_body(x):
    x = DarknetConv2D_BN_Mish(32, (3, 3))(x)
    x = resblock_body(x, 64, 1, False)
    x = resblock_body(x, 128, 2)
    x = resblock_body(x, 256, 8)
    feat1 = x                       # 8x downsampled, e.g. (52, 52, 256)
    x = resblock_body(x, 512, 8)
    feat2 = x                       # 16x downsampled, e.g. (26, 26, 512)
    x = resblock_body(x, 1024, 4)
    feat3 = x                       # 32x downsampled, e.g. (13, 13, 1024)

    return feat1, feat2, feat3

2. Feature pyramid

2.1 The SPP structure


The larger the max-pooling kernel, the more global the information it captures. Applying stride-1 max pooling with several kernel sizes to the same input and then concatenating the results fuses global and local information well.
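The idea can be illustrated with a toy NumPy version of stride-1 'same' max pooling (a sketch, not the Keras layer): every pool size preserves the spatial size, so the results can be stacked along the channel axis.

```python
import numpy as np

def maxpool_same(x, k):
    # stride-1 'same' max pooling on a 2-D map: output size equals input size
    pad = k // 2
    xp = np.pad(x, pad, mode='constant', constant_values=-np.inf)
    h, w = x.shape
    out = np.empty_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].max()
    return out

feat = np.arange(16.0).reshape(4, 4)          # toy 4x4 feature map
pools = [maxpool_same(feat, k) for k in (5, 3, 1)]
spp = np.stack(pools + [feat], axis=-1)       # concatenate along channels
print(spp.shape)                              # (4, 4, 4)
```

Each "channel" of `spp` is the same map pooled at a different scale, which is exactly what the Concatenate call in yolo_body below does with Keras tensors.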

2.2 The PANet structure


Deep-layer features are upsampled and fused with shallow-layer features; the upsampling uses UpSampling2D, i.e., interpolation that resizes to the target size. The fusion uses concatenation.

Shallow-layer features are then downsampled and fused with deep-layer features; the downsampling uses an ordinary convolution with strides=2. Again, the fusion uses concatenation.

Finally, three groups of features are output, used to detect large, medium, and small objects respectively; their shapes are (13, 13, 3*(5+num_classes)), (26, 26, 3*(5+num_classes)), and (52, 52, 3*(5+num_classes)).
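As a small sanity check of those shapes (assuming a 416x416 input and 3 anchors per scale):

```python
num_classes = 20                       # VOC
num_anchors = 3                        # anchors per scale
channels = num_anchors * (5 + num_classes)
print(channels)                        # 75

for stride in (32, 16, 8):             # large, medium, small objects
    size = 416 // stride
    print((size, size, channels))      # (13, 13, 75), (26, 26, 75), (52, 52, 75)
```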

def yolo_body(inputs, num_anchors, num_classes):
    # build the backbone and take its three feature layers
    feat1, feat2, feat3 = darknet_body(inputs)

    # First feature layer
    # y1 = (batch_size, 13, 13, num_anchors * (num_classes + 5))
    P5 = DarknetConv2D_BN_Leaky(512, (1, 1))(feat3)
    P5 = DarknetConv2D_BN_Leaky(1024, (3, 3))(P5)
    P5 = DarknetConv2D_BN_Leaky(512, (1, 1))(P5)
    # SPP: max pooling at several scales, then concatenate
    maxpool1 = MaxPooling2D(pool_size=(13, 13), strides=(1, 1), padding='same')(P5)
    maxpool2 = MaxPooling2D(pool_size=(9, 9), strides=(1, 1), padding='same')(P5)
    maxpool3 = MaxPooling2D(pool_size=(5, 5), strides=(1, 1), padding='same')(P5)
    P5 = Concatenate()([maxpool1, maxpool2, maxpool3, P5])
    P5 = DarknetConv2D_BN_Leaky(512, (1, 1))(P5)
    P5 = DarknetConv2D_BN_Leaky(1024, (3, 3))(P5)
    P5 = DarknetConv2D_BN_Leaky(512, (1, 1))(P5)

    P5_upsample = compose(DarknetConv2D_BN_Leaky(256, (1, 1)), UpSampling2D(2))(P5)

    P4 = DarknetConv2D_BN_Leaky(256, (1, 1))(feat2)
    P4 = Concatenate()([P4, P5_upsample])
    P4 = make_five_convs(P4, 256)

    P4_upsample = compose(DarknetConv2D_BN_Leaky(128, (1, 1)), UpSampling2D(2))(P4)

    P3 = DarknetConv2D_BN_Leaky(128, (1, 1))(feat1)
    P3 = Concatenate()([P3, P4_upsample])
    P3 = make_five_convs(P3, 128)

    P3_output = DarknetConv2D_BN_Leaky(256, (3, 3))(P3)
    P3_output = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(P3_output)

    # 26x26 output (for a 416x416 input)
    P3_downsample = ZeroPadding2D(((1, 0), (1, 0)))(P3)
    P3_downsample = DarknetConv2D_BN_Leaky(256, (3, 3), strides=(2, 2))(P3_downsample)
    P4 = Concatenate()([P3_downsample, P4])
    P4 = make_five_convs(P4, 256)

    P4_output = DarknetConv2D_BN_Leaky(512, (3, 3))(P4)
    P4_output = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(P4_output)

    # 13x13 output (for a 416x416 input)
    P4_downsample = ZeroPadding2D(((1, 0), (1, 0)))(P4)
    P4_downsample = DarknetConv2D_BN_Leaky(512, (3, 3), strides=(2, 2))(P4_downsample)
    P5 = Concatenate()([P4_downsample, P5])
    P5 = make_five_convs(P5, 512)

    P5_output = DarknetConv2D_BN_Leaky(1024, (3, 3))(P5)
    P5_output = DarknetConv2D(num_anchors * (num_classes + 5), (1, 1))(P5_output)

    return Model(inputs, [P5_output, P4_output, P3_output])


3. yolo_head

The extracted features are turned into predictions with yolo_head.

Each feature layer's prediction corresponds to three prediction boxes per grid cell, so we first reshape it. Taking the VOC dataset as an example, the results are (N, 13, 13, 3, 25), (N, 26, 26, 3, 25), and (N, 52, 52, 3, 25).

feats = K.reshape(feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes+5])

The 25 in the last dimension is 4 + 1 + 20: the box parameters x_offset, y_offset, h, and w; the objectness confidence; and the 20 class scores.

YOLOv4's decoding adds each grid point's coordinates to its predicted x_offset and y_offset; the sum is the center of the predicted box. The box's width and height are then obtained by combining the anchor (prior box) with the predicted h and w.
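The decoding above can be worked through for a single cell in NumPy (the anchor and the raw outputs here are made-up numbers, purely for illustration):

```python
import numpy as np

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

grid_xy = np.array([4.0, 6.0])         # grid cell index (x, y) on a 13x13 map
anchor_wh = np.array([142.0, 110.0])   # anchor size in input pixels (hypothetical)
t = np.array([0.2, -0.3, 0.1, 0.4])    # raw outputs tx, ty, tw, th (hypothetical)

box_xy = (sigmoid(t[:2]) + grid_xy) / 13.0    # center, normalized to [0, 1]
box_wh = np.exp(t[2:]) * anchor_wh / 416.0    # width/height, normalized to [0, 1]
print(box_xy, box_wh)
```

Because the offsets pass through a sigmoid, the center always stays inside its own grid cell; the exponential keeps the width and height positive.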

Grid points:

grid_shape = K.shape(feats)[1:3]   #(height, width)
grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]), [1, grid_shape[1], 1, 1])
grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]), [grid_shape[0], 1, 1, 1])
grid = K.concatenate([grid_x, grid_y])
grid = K.cast(grid, K.dtype(feats))

x_offset and y_offset:

K.sigmoid(feats[..., :2])

The center of the predicted box:

 (K.sigmoid(feats[..., :2]) + grid)

h and w:

K.exp(feats[..., 2:4]) * anchors_tensor

def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
    '''
    :param feats: (b, 13, 13, 3*25)
    :param anchors: [[142, 110], [192, 243], [459, 401]]
    :param num_classes: 20
    :param input_shape: (416, 416)
    :param calc_loss: if True, also return the grid and the raw feats for the loss
    :return: box_xy, box_wh, box_confidence, box_class_probs
    '''
    num_anchors = len(anchors)

    feats = tf.convert_to_tensor(feats)
    anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])

    grid_shape = K.shape(feats)[1:3]   #(height, width)
    grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]), [1, grid_shape[1], 1, 1])
    grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]), [grid_shape[0], 1, 1, 1])

    grid = K.concatenate([grid_x, grid_y])
    grid = K.cast(grid, K.dtype(feats))

    feats = K.reshape(feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes+5])

    # normalize to [0, 1]; grid_shape and input_shape are (h, w), boxes are (x, y)
    box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
    box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))

    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.sigmoid(feats[..., 5:])

    if calc_loss:
        return grid, feats, box_xy, box_wh

    return box_xy, box_wh, box_confidence, box_class_probs

The complete flow of the network model

[Full network diagrams omitted.]

Reference: Bubbliiiing, 睿智的目标检测32——TF2搭建YoloV4目标检测平台 (tensorflow2), CSDN blog.