Linear Classification in Detail: Linear Discriminant Analysis (Fisher), Model Definition [Whiteboard Derivation Series Notes]

Date: 2022-10-08 12:10:58

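The idea of linear discriminant analysis is to find a direction $\omega$ and project the samples onto it, so that the projected data satisfy the following as far as possible:

  1. Projections of samples from the same class lie as close together as possible
  2. Projections of samples from different classes lie as far apart as possible

In short: small within-class spread, large between-class separation.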
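To make the two criteria concrete, here is a minimal sketch that projects a made-up 2-D data set onto two candidate directions and reports the within-class spread and the between-class gap for each. The data, the helper name `projection_stats`, and the choice of population variance (dividing by the class size) are assumptions for illustration only.

```python
import numpy as np

# Toy 2-D data (made up for illustration): two elongated, parallel clusters.
class_pos = np.array([[0.0, 0.0], [1.0, 0.2], [2.0, 0.0], [3.0, 0.2]])
class_neg = np.array([[0.0, 1.0], [1.0, 1.2], [2.0, 1.0], [3.0, 1.2]])


def projection_stats(direction, a, b):
    """Project both classes onto `direction` and return
    (sum of within-class variances, squared gap between class means)."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)              # unit vector, so a @ d is a true projection
    za, zb = a @ d, b @ d                  # 1-D projections of each class
    within = za.var() + zb.var()           # population variance (divides by class size)
    between = (za.mean() - zb.mean()) ** 2
    return within, between


# Candidate 1: the horizontal axis. Candidate 2: the vertical axis.
within_x, between_x = projection_stats([1.0, 0.0], class_pos, class_neg)
within_y, between_y = projection_stats([0.0, 1.0], class_pos, class_neg)

print("horizontal: within =", within_x, " between =", between_x)
print("vertical:   within =", within_y, " between =", between_y)
```

On this toy data the vertical direction gives both a smaller within-class spread and a larger between-class gap, so it is the better of the two candidates under both criteria.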

The data are

$$
\begin{gathered}
X=\begin{pmatrix} x_{1} & x_{2} & \cdots & x_{N} \end{pmatrix}^{T}=\begin{pmatrix} x_{1}^{T} \\ x_{2}^{T} \\ \vdots \\ x_{N}^{T} \end{pmatrix}_{N \times p},\quad Y=\begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{pmatrix}_{N \times 1}\\
\left\{(x_{i},y_{i})\right\}_{i=1}^{N},\quad x_{i}\in \mathbb{R}^{p},\quad y_{i}\in \left\{+1,-1\right\}\\
x_{C_{1}}=\left\{x_{i}\mid y_{i}=+1\right\},\quad x_{C_{2}}=\left\{x_{i}\mid y_{i}=-1\right\}\\
|x_{C_{1}}|=N_{1},\quad |x_{C_{2}}|=N_{2},\quad N_{1}+N_{2}=N
\end{gathered}
$$

Let

$$
z_{i}=\omega^{T}x_{i}
$$

This is a real number and can be viewed as the projection of $x_{i}$ onto $\omega$ (up to the factor $\|\omega\|$).

The model requires small within-class spread, which can be measured by the variance of the projections within each class. In the sums below, a sum from $1$ to $N_{1}$ runs over the $N_{1}$ samples in $x_{C_{1}}$, and a sum from $1$ to $N_{2}$ runs over the samples in $x_{C_{2}}$.

$$
\begin{aligned}
\bar{z}&=\frac{1}{N}\sum\limits_{i=1}^{N}z_{i}=\frac{1}{N}\sum\limits_{i=1}^{N}\omega^{T}x_{i}\\
C_{1}:\quad \bar{z_{1}}&=\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}\omega^{T}x_{i}\\
S_{1}&=\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}(\omega^{T}x_{i}-\bar{z_{1}})(\omega^{T}x_{i}-\bar{z_{1}})^{T}\\
&=\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}\left(\omega^{T}x_{i}-\frac{1}{N_{1}}\sum\limits_{j=1}^{N_{1}}\omega^{T}x_{j}\right)\left(\omega^{T}x_{i}-\frac{1}{N_{1}}\sum\limits_{j=1}^{N_{1}}\omega^{T}x_{j}\right)^{T}
\end{aligned}
$$

Define the class mean $\overline{x_{C_{1}}}=\frac{1}{N_{1}}\sum\limits_{j=1}^{N_{1}}x_{j}$. Then

$$
\begin{aligned}
S_{1}&=\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}\omega^{T}(x_{i}-\overline{x_{C_{1}}})(x_{i}-\overline{x_{C_{1}}})^{T}\omega\\
&=\omega^{T}\left(\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}(x_{i}-\overline{x_{C_{1}}})(x_{i}-\overline{x_{C_{1}}})^{T}\right)\omega
\end{aligned}
$$

Define the class covariance matrix $S_{C_{1}}=\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}(x_{i}-\overline{x_{C_{1}}})(x_{i}-\overline{x_{C_{1}}})^{T}$. Then

$$
S_{1}=\omega^{T}S_{C_{1}}\omega
$$

In the same way, for $C_{2}$:

$$
\begin{aligned}
C_{2}:\quad \bar{z_{2}}&=\frac{1}{N_{2}}\sum\limits_{i=1}^{N_{2}}\omega^{T}x_{i}\\
S_{2}&=\omega^{T}S_{C_{2}}\omega
\end{aligned}
$$

The within-class spread can therefore be measured by the sum of the two variances:

$$
S_{1}+S_{2}=\omega^{T}(S_{C_{1}}+S_{C_{2}})\omega
$$

Note that quantities with subscripts $1,2$ are statistics of the projections $z$, while quantities with subscripts $C_{1},C_{2}$ are statistics of $x$.
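The identity $S_{1}=\omega^{T}S_{C_{1}}\omega$ can be checked numerically. The sketch below uses randomly generated samples and an arbitrary direction (both assumptions for illustration; the variable names are not from the notes) and compares the variance of the projections with the quadratic form built from the class covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3
x_c1 = rng.normal(size=(50, p))             # samples of one class, one row per sample
omega = rng.normal(size=p)                  # an arbitrary direction

# Left-hand side: variance of the projections z_i = omega^T x_i (1/N_1 normalisation).
z = x_c1 @ omega
s1_direct = np.mean((z - z.mean()) ** 2)

# Right-hand side: omega^T S_C1 omega, with S_C1 the p x p class covariance matrix.
centered = x_c1 - x_c1.mean(axis=0)
s_c1 = centered.T @ centered / len(x_c1)
s1_quadratic = omega @ s_c1 @ omega

print(s1_direct, s1_quadratic)
print(np.isclose(s1_direct, s1_quadratic))
```

The two numbers agree up to floating-point rounding, which is what the derivation predicts: the statistic of $z$ (subscript $1$) is fully determined by the statistic of $x$ (subscript $C_{1}$) and the direction $\omega$.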

The distance between classes can be measured by the squared difference of the projected class means (each sum running over the samples of the respective class):

$$
\begin{aligned}
(\bar{z_{1}}-\bar{z_{2}})^{2}&=\left(\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}\omega^{T}x_{i}-\frac{1}{N_{2}}\sum\limits_{i=1}^{N_{2}}\omega^{T}x_{i}\right)^{2}\\
&=\left[\omega^{T}\left(\frac{1}{N_{1}}\sum\limits_{i=1}^{N_{1}}x_{i}-\frac{1}{N_{2}}\sum\limits_{i=1}^{N_{2}}x_{i}\right)\right]^{2}\\
&=\left[\omega^{T}(\overline{x_{C_{1}}}-\overline{x_{C_{2}}})\right]^{2}\\
&=\omega^{T}(\overline{x_{C_{1}}}-\overline{x_{C_{2}}})(\overline{x_{C_{1}}}-\overline{x_{C_{2}}})^{T}\omega
\end{aligned}
$$

To obtain the optimal $\hat{\omega}$ we want $S_{1}+S_{2}$ to be small and $(\bar{z_{1}}-\bar{z_{2}})^{2}$ to be large, so we define

$$
J(\omega)=\frac{\omega^{T}(\overline{x_{C_{1}}}-\overline{x_{C_{2}}})(\overline{x_{C_{1}}}-\overline{x_{C_{2}}})^{T}\omega}{\omega^{T}(S_{C_{1}}+S_{C_{2}})\omega}
$$

and take

$$
\hat{\omega}=\mathop{\arg\max}\limits_{\omega}J(\omega)
$$

Define $S_{b}=(\overline{x_{C_{1}}}-\overline{x_{C_{2}}})(\overline{x_{C_{1}}}-\overline{x_{C_{2}}})^{T}$ (between-class variance) and $S_{w}=S_{C_{1}}+S_{C_{2}}$ (within-class variance; the subscript $w$ stands for "within" and is not the vector $\omega$). Then

$$
J(\omega)=\frac{\omega^{T}S_{b}\omega}{\omega^{T}S_{w}\omega}=\omega^{T}S_{b}\omega\,(\omega^{T}S_{w}\omega)^{-1}
$$

Differentiating with respect to $\omega$ and setting the derivative to zero:

$$
\begin{aligned}
\frac{\partial J(\omega)}{\partial \omega}&=2S_{b}\omega(\omega^{T}S_{w}\omega)^{-1}+\omega^{T}S_{b}\omega \cdot (-1)(\omega^{T}S_{w}\omega)^{-2}\cdot 2S_{w}\omega\\
0&=2S_{b}\omega(\omega^{T}S_{w}\omega)^{-1}+\omega^{T}S_{b}\omega \cdot (-1)(\omega^{T}S_{w}\omega)^{-2}\cdot 2S_{w}\omega\\
0&=S_{b}\omega(\omega^{T}S_{w}\omega)-(\omega^{T}S_{b}\omega)S_{w}\omega\\
(\omega^{T}S_{b}\omega)S_{w}\omega&=S_{b}\omega(\omega^{T}S_{w}\omega)
\end{aligned}
$$

Here $\omega^{T}S_{b}\omega$ and $\omega^{T}S_{w}\omega$ are both real numbers, so, assuming $S_{w}$ is invertible,

$$
\omega=\frac{\omega^{T}S_{w}\omega}{\omega^{T}S_{b}\omega}S_{w}^{-1}S_{b}\omega
$$

If we only care about the direction of $\omega$, every scalar factor can be dropped. Since $(\overline{x_{C_{1}}}-\overline{x_{C_{2}}})^{T}\omega$ is also a real number,

$$
\begin{aligned}
\omega &\propto S_{w}^{-1}S_{b}\omega\\
&= S_{w}^{-1}(\overline{x_{C_{1}}}-\overline{x_{C_{2}}})(\overline{x_{C_{1}}}-\overline{x_{C_{2}}})^{T}\omega\\
&\propto S_{w}^{-1}(\overline{x_{C_{1}}}-\overline{x_{C_{2}}})
\end{aligned}
$$

The goal is $\mathop{\arg\max}\limits_{\omega}J(\omega)$, but $J(\omega)$ does not change when $\omega$ is rescaled, so only the direction of $\omega$ matters and not its magnitude. We therefore take $S_{w}^{-1}(\overline{x_{C_{1}}}-\overline{x_{C_{2}}})$ as the direction of the desired $\omega$.
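The closed-form direction can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a reference implementation: the synthetic Gaussian data, the helper names `class_scatter`, `fisher_direction` and `fisher_criterion`, and the assumption that $S_{C_{1}}+S_{C_{2}}$ is invertible are all introduced here. The sketch computes the direction proportional to $(S_{C_{1}}+S_{C_{2}})^{-1}(\overline{x_{C_{1}}}-\overline{x_{C_{2}}})$ and compares its $J$ value against random directions.

```python
import numpy as np


def class_scatter(x):
    """Class covariance matrix with 1/N normalisation, as in the derivation."""
    centered = x - x.mean(axis=0)
    return centered.T @ centered / len(x)


def fisher_direction(x, y):
    """Unit vector proportional to (S_C1 + S_C2)^{-1} (mean_C1 - mean_C2).
    Assumes labels in {+1, -1} and an invertible within-class matrix."""
    x1, x2 = x[y == 1], x[y == -1]
    s_w = class_scatter(x1) + class_scatter(x2)
    mean_diff = x1.mean(axis=0) - x2.mean(axis=0)
    omega = np.linalg.solve(s_w, mean_diff)   # solves s_w @ omega = mean_diff
    return omega / np.linalg.norm(omega)


def fisher_criterion(omega, x, y):
    """J(omega) = (omega^T mean_diff)^2 / (omega^T S_w omega)."""
    x1, x2 = x[y == 1], x[y == -1]
    s_w = class_scatter(x1) + class_scatter(x2)
    mean_diff = x1.mean(axis=0) - x2.mean(axis=0)
    return (omega @ mean_diff) ** 2 / (omega @ s_w @ omega)


# Synthetic two-class data with correlated features (made up for illustration).
rng = np.random.default_rng(0)
cov = np.array([[2.0, 1.5], [1.5, 2.0]])
x_pos = rng.multivariate_normal([2.0, 0.0], cov, size=100)
x_neg = rng.multivariate_normal([0.0, 0.0], cov, size=100)
x = np.vstack([x_pos, x_neg])
y = np.concatenate([np.ones(100), -np.ones(100)])

omega_hat = fisher_direction(x, y)
j_hat = fisher_criterion(omega_hat, x, y)

# No random direction should score higher than the closed-form one.
j_random = [fisher_criterion(rng.normal(size=2), x, y) for _ in range(1000)]
print("J at closed-form direction:", j_hat)
print("best J among random directions:", max(j_random))
```

`np.linalg.solve` is used instead of forming the inverse explicitly, which is generally the more numerically stable choice. Rescaling `omega_hat` leaves `fisher_criterion` unchanged, which matches the observation that only the direction matters.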