Deep learning From Image to Sequence

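These notes give a high-level overview of classic deep learning applications. There is a lot of ground to cover, so the material is split into three parts.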

---------------------------------------------------------------------------------------------

Content

1. Review: classic applications of deep learning to images

1.1 Autoencoder

1.2 MLP

1.3 CNN (see the previous post on CNNs for details)

2. Deep learning for speech and other sequential signals

2.1 Which sequential signals, and which problems

2.2 Background

2.2.1 Hidden Markov Model (HMM)

2.2.2 GMM-HMM for Speech Recognition

2.2.3 Restricted Boltzmann Machine (RBM)

3. DBN and RNN applied to speech

3.1 DBN

3.1.1 DBN architecture

3.1.2 DBN-DNN for Speech Recognition

3.2 RNN

3.2.1 Types of RNN

3.2.2 RNN-RBM for Sequential signal Prediction

---------------------------------------------------------------------------------------------

1. Review: deep learning for images and other non-sequential signals (see the previous post on CNNs for details)

----------------------------------------------

1.1 AutoEncoder（unsupervised）


Extension: stacked autoencoders (which can be made supervised); see Andrew Ng's UFLDL tutorial. I won't paste the figure here.

----------------------------------------------

1.2 MLP

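The MLP (ANN) is the most naive neural-network classifier: a single hidden layer connected to the input and the output through nonlinear functions, an output f(x), and a softmax for classification.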
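To make the description concrete, here is a minimal sketch of the forward pass of such a network in plain Python. The layer sizes, the tanh nonlinearity, and the random weights are assumptions chosen for illustration; the post does not fix any of them.

```python
import math
import random


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, W1, b1, W2, b2):
    # Hidden layer: tanh(W1 x + b1). tanh is an assumed choice of nonlinearity.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    # Output layer: class probabilities softmax(W2 hidden + b2).
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, b2)]
    return softmax(logits)


# Hypothetical toy sizes and random (untrained) weights.
rng = random.Random(0)
n_in, n_hidden, n_classes = 4, 3, 2
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_classes)]
b2 = [0.0] * n_classes

probs = mlp_forward([0.2, -0.1, 0.7, 0.0], W1, b1, W2, b2)
predicted_class = probs.index(max(probs))
```

The predicted class is simply the index of the largest softmax probability; training (backpropagation) is left out of the sketch.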


----------------------------------------------

1.3 Convolutional Neural Network

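Characteristics: 1. not fully connected; 2. shared weights

Procedure: 1. convolution; 2. downsampling (pooling)

See the previous post on CNNs for details.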
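The two steps above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: a single channel, a "valid" convolution computed as cross-correlation (as most deep learning libraries do), 2x2 max pooling, and a made-up image and kernel.

```python
def conv2d_valid(image, kernel):
    # Slide one shared kernel over the image: local connectivity + weight sharing.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool(feature_map, size=2):
    # Non-overlapping max pooling (downsampling) with a size x size window.
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


# Hypothetical 5x5 image and 2x2 kernel.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [2, 0, 1, 0, 2],
    [1, 3, 0, 1, 0],
    [0, 1, 2, 0, 1],
]
kernel = [[1, 0],
          [0, -1]]

feature_map = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool(feature_map)             # 2x2 map: [[0, 2], [1, 0]]
```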


---------------------------------------------------------------------------------------------

2. Deep learning for speech and other sequential signals

2.1 Which sequential signals, and which problems:

handwriting recognition
speech recognition
music composition
protein analysis
stock market prediction
...

2.2 Background:

----------------------------------------------

2.2.1 Hidden Markov Model (HMM): a stochastic process with unobserved states (hence "hidden"). It models the relationship between the input speech signal and the hidden states (factors).


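Training an HMM: given a sequence y1...yT, estimate the parameters by maximum likelihood (MLE), typically implemented with EM; see the training section (part 3) of the referenced post for details.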
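The quantity that MLE maximizes is the likelihood P(y1...yT) of the observed sequence. As a rough sketch, the forward algorithm below computes that likelihood for an HMM with discrete observations. The two-state parameters are made up for illustration, and the EM re-estimation step itself is not shown.

```python
def hmm_likelihood(obs, start, trans, emit):
    """Forward algorithm: P(y1...yT) for a discrete-observation HMM.

    start[s]    = P(first state = s)
    trans[p][s] = P(next state = s | current state = p)
    emit[s][y]  = P(observation = y | state = s)
    """
    n_states = len(start)
    # alpha[s] = P(y1...yt, state at time t = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for y in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][y]
                 for s in range(n_states)]
    return sum(alpha)


# Hypothetical toy parameters: 2 hidden states, 2 observation symbols.
start = [0.6, 0.4]
trans = [[0.7, 0.3],
         [0.4, 0.6]]
emit = [[0.9, 0.1],
        [0.2, 0.8]]

likelihood = hmm_likelihood([0, 1, 0], start, trans, emit)
```

EM (Baum-Welch) alternates between computing state posteriors with forward and backward passes of this kind and re-estimating start, trans, and emit from them.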

----------------------------------------------

2.2.2 GMM-HMM for Speech Recognition (fairly long, so it has its own blog post)

----------------------------------------------

2.2.3 Restricted Boltzmann Machine

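Before getting to the RBM, a few words on generative models (how to build a single layer of feature detectors).

There are roughly two kinds: directed models and undirected models.

1. Directed model (e.g. a GMM, where the latent state is drawn from a discrete distribution)

Choose the state of the latent variables according to a prior distribution.

Given the latent states, choose the states of the observable variables according to a conditional distribution.

2. Undirected model

Using only the parameters W, define the joint probability of v (visible) and h (hidden latent variables) through an energy function.

Because of "explaining away", when the latent and visible variables are nonlinearly related it is very hard to infer the latent states in a directed model; in an undirected model, inference is easy as long as there are no connections between the latent variables.

PS: what is explaining away?

Latent states that are independent in the prior become dependent in the posterior once an effect they both influence is observed.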
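The two-step generative process of a directed model described above can be sketched for a one-dimensional GMM. The mixture weights, means, and standard deviations below are made-up values for illustration.

```python
import random


def sample_gmm(weights, means, stds, rng):
    # Step 1: pick the latent component from the prior (a discrete distribution).
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # Step 2: given the latent state, draw the observable from its conditional.
    x = rng.gauss(means[k], stds[k])
    return k, x


# Hypothetical two-component mixture.
rng = random.Random(0)
weights, means, stds = [0.3, 0.7], [-2.0, 3.0], [0.5, 1.0]

samples = [sample_gmm(weights, means, stds, rng) for _ in range(2000)]
frac_second = sum(1 for k, _ in samples if k == 1) / len(samples)
```

Sampling in this direction is easy; the hard part in a directed model is the reverse direction, inferring the latent state from an observation.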

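Now for the RBM itself.

An RBM is a kind of Markov random field (MRF), with these distinguishing features:

1. An RBM is a bipartite connectivity graph.

2. An RBM does not share weights between different units.

3. Some of the variables are unobserved.

The RBM is defined through its energy function E(v,h).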
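The original figure with the formula did not survive, so for reference here is the standard textbook form of the energy function for a binary RBM. It is written with this post's parameter names (W, bias_v, bias_h); treating W as a visible-by-hidden matrix is an assumption about the layout.

```latex
E(v, h) = -\,\mathrm{bias}_v^{\top} v \;-\; \mathrm{bias}_h^{\top} h \;-\; v^{\top} W h

P(v, h) = \frac{e^{-E(v, h)}}{Z}, \qquad Z = \sum_{v, h} e^{-E(v, h)}

P(h_j = 1 \mid v) = \sigma\!\Big(\mathrm{bias}_{h,j} + \sum_i v_i W_{ij}\Big), \qquad
P(v_i = 1 \mid h) = \sigma\!\Big(\mathrm{bias}_{v,i} + \sum_j W_{ij} h_j\Big)
```

Here \sigma is the logistic sigmoid. The conditionals factorize over units because the graph is bipartite, which is what makes Gibbs sampling in an RBM cheap.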


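RBM parameters: W (weights), bias_h, bias_v

Given the joint distribution P(v,h), h and v can be sampled in turn by Gibbs sampling from the conditionals P(h|v) and P(v|h), and the parameters are learned by gradient descent on the negative log-likelihood (NLL).

The training objective of an RBM is to maximize p(v = visible),

where visible is the actual observed visible data.

In practice, for each training batch:

Run k steps of contrastive divergence sampling (Gibbs, CD-k).

Update according to the cost function, i.e. cost = T.mean(self.free_energy(self.input)) - T.mean(self.free_energy(chain_end))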
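The cost expression above is Theano code. As a dependency-free illustration, here is a rough plain-Python sketch of one CD-k update for a binary RBM. The update is the hand-derived gradient step on free_energy(v0) - free_energy(chain_end), with chain_end treated as a constant. The toy data, sizes, and learning rate are assumptions, and the function names are hypothetical rather than taken from any library.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_h(v, W, bias_h, rng):
    # P(h_j = 1 | v) and a binary sample from it.
    p = [sigmoid(bias_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
         for j in range(len(bias_h))]
    return p, [1 if rng.random() < pj else 0 for pj in p]


def sample_v(h, W, bias_v, rng):
    # P(v_i = 1 | h) and a binary sample from it.
    p = [sigmoid(bias_v[i] + sum(W[i][j] * h[j] for j in range(len(h))))
         for i in range(len(bias_v))]
    return p, [1 if rng.random() < pi else 0 for pi in p]


def free_energy(v, W, bias_h, bias_v):
    # F(v) = -bias_v . v - sum_j log(1 + exp(bias_h[j] + sum_i v_i W_ij))
    visible_term = sum(b * vi for b, vi in zip(bias_v, v))
    hidden_term = sum(
        math.log(1.0 + math.exp(bias_h[j] + sum(v[i] * W[i][j] for i in range(len(v)))))
        for j in range(len(bias_h)))
    return -visible_term - hidden_term


def cd_k_update(v0, W, bias_h, bias_v, rng, k=1, lr=0.1):
    # Positive phase: hidden probabilities given the data.
    ph0, h = sample_h(v0, W, bias_h, rng)
    # Negative phase: k steps of alternating Gibbs sampling.
    for _ in range(k):
        _, vk = sample_v(h, W, bias_v, rng)
        phk, h = sample_h(vk, W, bias_h, rng)
    # Gradient step on F(v0) - F(vk), with vk (chain_end) held constant.
    for i in range(len(v0)):
        bias_v[i] += lr * (v0[i] - vk[i])
        for j in range(len(bias_h)):
            W[i][j] += lr * (v0[i] * ph0[j] - vk[i] * phk[j])
    for j in range(len(bias_h)):
        bias_h[j] += lr * (ph0[j] - phk[j])
    return vk


# Hypothetical toy problem: 6 visible units, 2 hidden units, two patterns.
rng = random.Random(0)
n_visible, n_hidden = 6, 2
W = [[rng.gauss(0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
bias_h, bias_v = [0.0] * n_hidden, [0.0] * n_visible
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]

for epoch in range(200):
    for v0 in data:
        chain_end = cd_k_update(v0, W, bias_h, bias_v, rng, k=1)
```

In a framework with automatic differentiation, the same update falls out of differentiating the cost expression directly, which is why the cost is written as a difference of mean free energies.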

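The RBM described above has binary (0/1) v and h. How are real-valued inputs handled?

Answer: use a Gaussian-Bernoulli RBM (GRBM).

It differs little from the classic RBM above; only the energy function and the conditional probabilities change.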
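The figure with the modified formulas is missing, so the following is the commonly used form of the GRBM, given here as a reference sketch. It assumes unit-variance visible units (inputs normalized to zero mean and unit variance) and reuses this post's names W, bias_v, bias_h.

```latex
E(v, h) = \sum_i \frac{(v_i - \mathrm{bias}_{v,i})^2}{2}
          \;-\; \sum_j \mathrm{bias}_{h,j}\, h_j
          \;-\; \sum_{i,j} v_i W_{ij} h_j

P(h_j = 1 \mid v) = \sigma\!\Big(\mathrm{bias}_{h,j} + \sum_i v_i W_{ij}\Big)

P(v_i \mid h) = \mathcal{N}\!\Big(\mathrm{bias}_{v,i} + \sum_j W_{ij} h_j,\; 1\Big)
```

The hidden conditional is unchanged from the binary RBM; only the visible units switch from Bernoulli to Gaussian.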


3. DBN and RNN applied to speech

3.1 DBN

3.1.1 DBN architecture


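Procedure:

1. Pre-train

Going from left to right: the input is real-valued, so the first layer is a GRBM, which trains W1.

The hidden activations of the trained GRBM become the input of the next RBM, which trains W2.

The hidden activations of that RBM are in turn passed to the next RBM as input, which trains W3.

... (repeat)

2. The pre-trained W of these layers can be stacked directly, with the bidirectional weight arrows of the lower layers replaced by top-down ones (the top two layers stay undirected), giving a DBN generative model.

3. Add a classifier

Finally, a softmax classifier can be added on top of the pre-trained network, with each output node representing one HMM state, for supervised fine-tuning.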
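The greedy layer-by-layer pre-training in step 1 can be sketched as a loop in which each RBM is trained on the hidden probabilities of the layer below. To keep the example short it makes several assumptions that differ from the real setup: every layer is a binary RBM (no GRBM first layer), training uses CD-1 with mean-field reconstructions, and the data and layer sizes are made up. Fine-tuning with the softmax layer is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, bias_h):
    return [sigmoid(bias_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(bias_h))]


def visible_probs(h, W, bias_v):
    return [sigmoid(bias_v[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(bias_v))]


def train_rbm(data, n_hidden, rng, epochs=50, lr=0.1):
    # CD-1 training of one binary RBM; returns the weights and hidden biases.
    n_visible = len(data[0])
    W = [[rng.gauss(0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    bias_h, bias_v = [0.0] * n_hidden, [0.0] * n_visible
    for _ in range(epochs):
        for v0 in data:
            ph0 = hidden_probs(v0, W, bias_h)
            h0 = [1 if rng.random() < p else 0 for p in ph0]
            v1 = visible_probs(h0, W, bias_v)  # mean-field reconstruction
            ph1 = hidden_probs(v1, W, bias_h)
            for i in range(n_visible):
                bias_v[i] += lr * (v0[i] - v1[i])
                for j in range(n_hidden):
                    W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            for j in range(n_hidden):
                bias_h[j] += lr * (ph0[j] - ph1[j])
    return W, bias_h


def pretrain_stack(data, layer_sizes, rng):
    # Train W1, feed its hidden activations to the next RBM to train W2, and so on.
    layers, layer_input = [], data
    for n_hidden in layer_sizes:
        W, bias_h = train_rbm(layer_input, n_hidden, rng)
        layers.append((W, bias_h))
        layer_input = [hidden_probs(v, W, bias_h) for v in layer_input]
    return layers


# Hypothetical toy data and layer sizes.
rng = random.Random(0)
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1],
        [1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
layers = pretrain_stack(data, [4, 3, 2], rng)
shapes = [(len(W), len(W[0])) for W, _ in layers]  # [(6, 4), (4, 3), (3, 2)]
```

The list of (W, bias_h) pairs is what would initialize the layers of the feed-forward network before the classifier is added on top.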

3.1.2 DBN-DNN for Speech Recognition

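If you read the previous post, GMM-HMM for Speech Recognition, carefully, you will notice that this model differs from GMM-HMM only in the GMM part.

That is, DNN-HMM replaces the GMM (a directed model) with a DNN (an undirected model). The benefit is that it can model a nonlinear mapping between h and v.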
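One practical detail of the swap: a GMM gives the HMM a likelihood p(frame | state), whereas a softmax DNN outputs a posterior p(state | frame). A common way to bridge the two, described in the Hinton et al. paper listed in the references, is to divide the posterior by the state prior, which gives a likelihood up to a constant that does not depend on the state. The numbers below are made up for illustration.

```python
import math


def scaled_log_likelihoods(posteriors, state_priors):
    # log p(frame | state) = log p(state | frame) - log p(state) + const
    return [math.log(p) - math.log(q) for p, q in zip(posteriors, state_priors)]


# Hypothetical DNN softmax output for one frame, and state priors
# (in practice the priors are estimated from state frequencies in the training alignment).
posteriors = [0.7, 0.2, 0.1]
state_priors = [0.5, 0.3, 0.2]

scores = scaled_log_likelihoods(posteriors, state_priors)
best_state = scores.index(max(scores))  # 0
```

These scores take the place of the GMM log-likelihoods during HMM decoding.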


Fig1. GMM-HMM


Fig2. DNN-HMM

3.2 RNN

3.2.1 Types of RNN

Common ones:

1. Fully Recurrent Network

2. Hopfield Network

3. Elman Network (Simple Recurrent Network)

4. Long Short-Term Memory (LSTM) network


fig. LSTM
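As a concrete illustration of the simplest entry in the list, the Elman network, here is a minimal forward-pass sketch in plain Python: the hidden state at each step depends on the current input and on the previous hidden state. The sizes, the tanh nonlinearity, and the random weights are assumptions for illustration; the output layer and training are omitted.

```python
import math
import random


def elman_forward(inputs, W_xh, W_hh, b_h):
    # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), starting from h_0 = 0.
    n_hidden = len(b_h)
    h = [0.0] * n_hidden
    states = []
    for x in inputs:
        h = [math.tanh(b_h[j]
                       + sum(W_xh[j][i] * x[i] for i in range(len(x)))
                       + sum(W_hh[j][k] * h[k] for k in range(n_hidden)))
             for j in range(n_hidden)]
        states.append(h)
    return states


# Hypothetical sizes and random (untrained) weights.
rng = random.Random(0)
n_in, n_hidden = 2, 3
W_xh = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
W_hh = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
b_h = [0.0] * n_hidden

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = elman_forward(sequence, W_xh, W_hh, b_h)
```

An LSTM replaces the single tanh update with gated memory cells, which makes it easier to carry information across long time spans.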

3.2.2 RNN-RBM for Sequential signal Prediction

For an RNN example, see the RNN-RBM post (RNN-RBM for music composition: network architecture and code walkthrough).

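Reference:

To make things easier for readers, I have kept the recommendations short.

I borrowed too many figures to credit each source here; apologies to the original authors. Listing everything would leave a pile of recommendations with no obvious place to start.

Classic papers on DNNs for speech:

1. Hinton, Li Deng, Dong Yu: Deep Neural Networks for Acoustic Modeling in Speech Recognition

2. Andrew Ng, NIPS 09: Unsupervised feature learning for audio classification using convolutional deep belief networks

Classic papers on RNNs for speech:

1. Bengio, ICML 2012: the RNN+RBM paper, which comes with an implementation (covered in detail in the next post)

2. Schmidhuber, JMLR 2002: the classic LSTM paper

3. The Use of Recurrent Neural Networks in Continuous Speech Recognition (doi=10.1.1.65.749): an older paper that covers fairly basic RNN material, but truly a classic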
