Deep Learning Face Representation by Joint Identification-Verification

One of the best-performing face recognition algorithms at present is Deep Learning Face Representation by Joint Identification-Verification.

abstract

The key problem in face recognition (fr) is finding effective face features that reduce intra-personal variations (intravar) while enlarging inter-personal variations (intervar). Using face identification (fid) and face verification (fvr) together addresses this effectively. The Deep IDentification-verification features (DeepID2) are learned with a CNN.

introduction

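Methods such as LDA, Bayesian face, and unified subspace are limited by their linearity, which does not match the nature of face images. The authors argue that fid and fvr are the two necessary supervisory signals for learning. fid predicts which class (identity) an input belongs to; fvr predicts whether several inputs belong to the same class. DeepID2 features are extracted from the top hidden layer and mapped to identities through a function $g(DeepID2)$. The fid signal pulls different identities apart, increasing intervar, but on its own the learned features generalize poorly. The fvr signal effectively reduces intravar by making the features extracted from images of the same identity as close as possible. The two work best served together.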
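In addition, features are extracted from different face regions at different resolutions and reduced with PCA to form the final representation. With the (joint Bayesian) face verification model proposed in D. Chen, X. Cao, L. Wang, F. Wen, and J. Sun. Bayesian face revisited: A joint formulation. In Proc. ECCV, 2012, the method reaches 99.15% accuracy on LFW.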
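The final-feature step (features from several regions and resolutions, then PCA) can be sketched in plain Python. This is a toy illustration, not the authors' pipeline: concatenating the per-patch features before PCA, the patch layout, the feature sizes, and the number of kept components are all assumptions; PCA is computed by power iteration with deflation rather than with a linear-algebra library; and the joint Bayesian verification step is not shown.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[k] for r in rows) / n for k in range(d)]
    return [[r[k] - mu[k] for k in range(d)] for r in rows]


def covariance(centered):
    """Covariance (divided by n) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / n for b in range(d)] for a in range(d)]


def top_components(cov, k, iters=200, seed=0):
    """Top-k eigenvectors of a symmetric matrix via power iteration plus deflation."""
    rng = random.Random(seed)
    d = len(cov)
    cov = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        n0 = math.sqrt(sum(x * x for x in v))
        v = [x / n0 for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining covariance is zero; nothing left to find
                break
            v = [x / norm for x in w]
        eig = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        comps.append(v)
        # Deflation: remove this component so the next loop finds the next one.
        cov = [[cov[a][b] - eig * v[a] * v[b] for b in range(d)] for a in range(d)]
    return comps


def pca_reduce(faces_patches, k):
    """Concatenate each face's per-patch features, then project onto the top-k PCs."""
    faces = [[x for patch in patches for x in patch] for patches in faces_patches]
    centered = mean_center(faces)
    comps = top_components(covariance(centered), k)
    return [[sum(c[i] * row[i] for i in range(len(row))) for c in comps] for row in centered]


# Four toy faces, each with two hypothetical 2-d patch features.
faces = [[[t, 2 * t], [0.0, t]] for t in (-1.0, 0.0, 1.0, 2.0)]
reduced = pca_reduce(faces, k=1)
print(reduced)
```

Power iteration keeps the sketch dependency-free; for real DeepID2 features from many patches, an SVD from a numerical library would be the usual choice.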

identification-verification guided deep feature learning

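Image features are learned by a ConvNet (Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998). It contains 4 convolutional layers, the first three of which are each followed by max-pooling. To learn diverse high-level features, weights are not shared across the entire feature map in the higher convolutional layers: in the third convolutional layer, neuron weights are shared only within each $2 \times 2$ local region. The ConvNet extracts a 160-dimensional DeepID2 vector. The DeepID2 layer is fully connected to both the third and the fourth convolutional layers (which is somewhat unusual). Because the fourth convolutional layer extracts more global, more general features than the third, the DeepID2 layer receives input at multiple scales; this is called a multi-scale ConvNet (a rather grand name for a simple idea). ReLU is used as the activation function in the convolutional layers and in the DeepID2 layer; ReLU fits the data better than sigmoid (and is faster). The architecture is shown in the figure:
[Figure: DeepID2 ConvNet architecture]
The input is a $55 \times 47$ RGB image. DeepID2 extraction is defined as the function $f = Conv(x, \theta_c)$, where $f$ is the DeepID2 feature vector and $\theta_c$ denotes the ConvNet parameters to be learned.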
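The layer sizes can be traced with a few lines of Python. The kernel sizes (4, 3, 3, 2), the feature-map counts (20, 40, 60, 80), stride-1 "valid" convolutions, $2 \times 2$ pooling, and the DeepID2 layer reading the third layer after pooling are assumptions for illustration, not values stated in this text. Local weight sharing changes the number of parameters but not the output shapes, so it is not modeled. `deepid2_layer` is a hypothetical helper showing the multi-scale connection $f = \max(0, W_3 x_3 + W_4 x_4 + b)$, applied elementwise.

```python
def conv_out(h, w, k):
    """Output size of a 'valid' k x k convolution with stride 1."""
    return h - k + 1, w - k + 1


def pool_out(h, w, p=2):
    """Output size of non-overlapping p x p max-pooling."""
    return h // p, w // p


def layer_shapes(h=55, w=47, kernels=(4, 3, 3, 2), maps=(20, 40, 60, 80)):
    """(name, height, width, channels) after each layer of the 4-conv ConvNet."""
    shapes = []
    for i, (k, c) in enumerate(zip(kernels, maps), start=1):
        h, w = conv_out(h, w, k)
        shapes.append((f"conv{i}", h, w, c))
        if i <= 3:  # only the first three conv layers are followed by max-pooling
            h, w = pool_out(h, w)
            shapes.append((f"pool{i}", h, w, c))
    return shapes


def relu(z):
    return max(0.0, z)


def deepid2_layer(x3, x4, W3, W4, b):
    """Multi-scale fully connected layer: f = ReLU(W3 x3 + W4 x4 + b)."""
    return [
        relu(sum(wv * xv for wv, xv in zip(W3[j], x3))
             + sum(wv * xv for wv, xv in zip(W4[j], x4)) + b[j])
        for j in range(len(b))
    ]


shapes = layer_shapes()
for name, h, w, c in shapes:
    print(f"{name}: {h} x {w} x {c}")

dims = {name: (h, w, c) for name, h, w, c in shapes}
# Assumed multi-scale input: flattened pool3 output concatenated with conv4 output.
h3, w3, c3 = dims["pool3"]
h4, w4, c4 = dims["conv4"]
print("DeepID2 layer input size:", h3 * w3 * c3 + h4 * w4 * c4)
```

The DeepID2 layer then maps that concatenated input to the 160-dimensional feature.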

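DeepID2 features are learned under the supervision of both the fid and fvr signals. The fid signal classifies each input image into one of $n = 8192$ different identities: a softmax layer is added after the DeepID2 layer and outputs a probability distribution over the $n$ classes. The cost function is the cross-entropy loss: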

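\[ Ident(f,t,\theta_{id}) = - \sum_{i=1}^{n} p_i \log{ \hat{p}_i } \]

Because the target distribution is one-hot ($p_t = 1$ and $p_i = 0$ for $i \neq t$), this simplifies to:

\[ Ident(f,t,\theta_{id}) = - \log{ \hat{p}_t } \]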
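A minimal sketch of the identification loss in plain Python. It assumes the softmax layer $\theta_{id}$ is a single linear layer (weight matrix `W` and bias `b`, hypothetical names) and uses toy sizes instead of a 160-d feature and $n = 8192$ classes. It also checks that the general cross-entropy equals $-\log \hat{p}_t$ for a one-hot target.

```python
import math


def softmax_layer(f, W, b):
    """Linear softmax layer: logits z = W f + b, then p_hat = softmax(z)."""
    z = [sum(w * x for w, x in zip(row, f)) + bj for row, bj in zip(W, b)]
    m = max(z)  # subtract the max logit for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def ident_full(p, p_hat):
    """General cross-entropy -sum_i p_i log p_hat_i (terms with p_i = 0 skipped)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, p_hat) if pi > 0)


def ident(p_hat, t):
    """One-hot special case: -log p_hat_t."""
    return -math.log(p_hat[t])


# Toy example: a 3-d feature vector and n = 4 classes.
f = [0.5, -1.0, 2.0]
W = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [-0.5, 0.5, 1.0], [0.2, 0.2, 0.2]]
b = [0.0, 0.0, 0.0, 0.0]
p_hat = softmax_layer(f, W, b)
t = 2
one_hot = [1.0 if i == t else 0.0 for i in range(len(p_hat))]
print(ident_full(one_hot, p_hat), ident(p_hat, t))  # the two values agree
```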

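In these formulas, $f$ is the DeepID2 feature vector, $t$ is the target class, and $\theta_{id}$ are the softmax layer parameters; $p_i$ is the target probability distribution (ground truth) and $\hat{p}_i$ is the predicted probability distribution. fvr aims to reduce the differences between features of the same identity; common constraints include the L1/L2 norm and cosine similarity. With the L2 norm, the loss function is: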
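\[ Verif(f_i, f_j, y_{ij}, \theta_{ve}) = \begin{cases} \frac{1}{2} \parallel f_i - f_j \parallel_2^2 & \text{if } y_{ij} = 1 \\ \frac{1}{2} \max\left(0, m - \parallel f_i - f_j \parallel_2 \right)^2 & \text{if } y_{ij} = -1 \end{cases} \]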

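where $f_i$ and $f_j$ are the DeepID2 features extracted from two images, and $y_{ij} = 1$ means the two belong to the same identity, otherwise $y_{ij} = -1$. The loss pulls features of the same identity as close together as possible, while features of different identities only need to be farther apart than the margin $m$. The L1-norm version is analogous. With cosine similarity, the loss is defined as: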

\[ Verif(f_i, f_j, y_{ij}, \theta_{ve}) = \frac{1}{2} \left( y_{ij} - \sigma(wd + b) \right)^2 \]

where $d = \frac{f_i \cdot f_j}{\parallel f_i \parallel_2 \parallel f_j \parallel_2}$ is the cosine similarity of the two features, $\theta_{ve} = \{ w, b \}$ are the learnable parameters, and $\sigma$ is the sigmoid function.
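Both verification losses can be written directly in Python. The L2 version follows the behavior described above: half the squared distance for a same-identity pair ($y_{ij} = 1$), and half the squared hinge $\max(0, m - \parallel f_i - f_j \parallel_2)$ for a different-identity pair ($y_{ij} = -1$). The cosine version implements $\frac{1}{2}(y_{ij} - \sigma(wd + b))^2$ as written. Function names and argument order are hypothetical.

```python
import math


def l2_verif(fi, fj, y, m):
    """L2 verification loss; y = 1 for the same identity, y = -1 otherwise."""
    dist = math.dist(fi, fj)  # Euclidean distance ||f_i - f_j||_2
    if y == 1:
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, m - dist) ** 2


def cosine_verif(fi, fj, y, w, b):
    """Cosine verification loss 1/2 * (y - sigmoid(w * d + b))^2."""
    d = sum(a * c for a, c in zip(fi, fj)) / (math.hypot(*fi) * math.hypot(*fj))
    sigma = 1.0 / (1.0 + math.exp(-(w * d + b)))
    return 0.5 * (y - sigma) ** 2


same = l2_verif([0.0, 0.0], [3.0, 4.0], 1, m=2.0)   # penalized: same identity but far apart
diff = l2_verif([0.0, 0.0], [3.0, 4.0], -1, m=2.0)  # zero: already beyond the margin
print(same, diff, cosine_verif([1.0, 0.0], [2.0, 0.0], 1, w=0.0, b=0.0))
```

Note the asymmetry in the L2 form: same-identity pairs are always pulled together, while different-identity pairs stop contributing once they are at least $m$ apart.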

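Only $\theta_c$ is the actual learning target; $\theta_{ve}$ and $\theta_{id}$ are introduced solely to propagate the fid and fvr signals during training. The parameters are optimized with stochastic gradient descent (SGD):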
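For a sampled pair of training images $x_i$, $x_j$ with identities $t_i$, $t_j$, let $f_i = Conv(x_i, \theta_c)$, $f_j = Conv(x_j, \theta_c)$, and $y_{ij} = 1$ if $t_i = t_j$, otherwise $y_{ij} = -1$. Each SGD step moves the parameters against the gradient of the joint loss

\[ L = Ident(f_i, t_i, \theta_{id}) + Ident(f_j, t_j, \theta_{id}) + \lambda \, Verif(f_i, f_j, y_{ij}, \theta_{ve}) \]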
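The per-pair quantity an SGD step works on can be sketched as a joint loss. This sketch assumes $\lambda$ multiplies the verification term and uses the L2 verification loss; `logits_i` and `logits_j` stand in for the softmax-layer outputs on $f_i$ and $f_j$, and the gradient computation itself (backpropagation into $\theta_c$) is left to an autodiff framework and not shown. All names are hypothetical.

```python
import math


def cross_entropy(logits, t):
    """-log softmax(logits)[t], computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[t]


def l2_verif(fi, fj, y, margin):
    """L2 verification loss with labels y = 1 (same) / -1 (different)."""
    dist = math.dist(fi, fj)
    return 0.5 * dist ** 2 if y == 1 else 0.5 * max(0.0, margin - dist) ** 2


def joint_pair_loss(fi, ti, logits_i, fj, tj, logits_j, lam, margin):
    """Ident(f_i) + Ident(f_j) + lam * Verif(f_i, f_j) for one training pair."""
    y = 1 if ti == tj else -1  # label derived from the two identities
    return (cross_entropy(logits_i, ti) + cross_entropy(logits_j, tj)
            + lam * l2_verif(fi, fj, y, margin))


# Toy pair from the same identity (class 0) with uninformative logits.
fi, fj = [0.0, 0.0], [3.0, 4.0]
print(joint_pair_loss(fi, 0, [0.0, 0.0], fj, 0, [0.0, 0.0], lam=0.0, margin=1.0))
print(joint_pair_loss(fi, 0, [0.0, 0.0], fj, 0, [0.0, 0.0], lam=1.0, margin=1.0))
```

Setting `lam = 0` leaves only the identification signal; a very large `lam` makes verification dominate.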

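The hyperparameter $\lambda$ weights the verification signal against the identification signal. The margin $m$ is tuned so that, used as a threshold on the feature distance, it gives the lowest verification error.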
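Choosing $m$ as the error-minimizing threshold can be sketched as a one-dimensional search over the distances $\parallel f_i - f_j \parallel_2$ of recent training pairs. How many pairs are used and how often $m$ is refreshed are not given here, so the batch below is hypothetical. A pair is predicted "same identity" when its distance is below the threshold, and the candidate thresholds are the midpoints between consecutive observed distances plus one value below and one above the observed range.

```python
def verification_errors(distances, labels, m):
    """Count pairs misclassified by threshold m (labels: 1 same, -1 different)."""
    return sum(
        1
        for d, y in zip(distances, labels)
        if (y == 1 and d >= m) or (y == -1 and d < m)
    )


def best_margin(distances, labels):
    """Return the candidate threshold with the fewest verification errors."""
    ds = sorted(set(distances))
    candidates = [ds[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(ds, ds[1:])]
    candidates.append(ds[-1] + 1.0)
    return min(candidates, key=lambda m: verification_errors(distances, labels, m))


# Hypothetical distances for five recent training pairs.
dists = [0.2, 0.4, 0.5, 1.1, 1.3]
labels = [1, 1, 1, -1, -1]
m = best_margin(dists, labels)
print(m, verification_errors(dists, labels, m))
```

Only midpoints need to be checked because the error count can change only when the threshold crosses an observed distance.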
