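Data types in nnet3

Objectives and background

The earlier nnet1 and nnet2 setups are based on Component objects: the network is a stack of components. Each component corresponds to one layer of the network. For convenience, an affine transform followed by a nonlinearity is represented as one layer, so each layer consists of two components. These older components have a Propagate function and a Backprop function, both of which operate on a whole minibatch, as well as other functions.

nnet1 and nnet2 also support networks that are not simple feed-forward stacks, but they do so in different ways.

In nnet1, more complex topologies are expressed by nesting components: a ParallelComponent can contain several sequences of components. In addition, LSTMs are implemented as components at the C++ level.

nnet2 introduced the concept of a time index and supports splicing across time, which makes it possible to implement a TDNN by splicing frames inside the network.

The goal of nnet3 is to support the topologies that nnet1 and nnet2 support, to add new ones, and to allow networks to be specified in config files. That way, many new ideas can be tried just by writing or editing a config file, without changing the underlying code.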

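Overview of nnet3

Unlike the component sequences of nnet1 and nnet2, nnet3 combines components as a graph. nnet3's class Nnet contains:

a list of the components it uses;
a graph specifying how the components are combined.

In the graph, components are referred to by name (which allows certain kinds of parameter sharing). This makes RNNs easy to implement: the input at time t is simply made to depend on the output at time t-1. The graph also makes it possible to handle edge effects at the boundaries of the audio (for example, an RNN has no left context before the first frame and no right context after the last frame).

Here is an example config file containing the components and the network graph: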

# First the components
component name=affine1 type=NaturalGradientAffineComponent input-dim=48 output-dim=65
component name=relu1 type=RectifiedLinearComponent dim=65
component name=affine2 type=NaturalGradientAffineComponent input-dim=65 output-dim=115
component name=logsoftmax type=LogSoftmaxComponent dim=115
# Next the nodes
input-node name=input dim=12
component-node name=affine1_node component=affine1 input=Append(Offset(input, -1), Offset(input, 0), Offset(input, 1), Offset(input, 2))
component-node name=nonlin1 component=relu1 input=affine1_node
component-node name=affine2 component=affine2 input=nonlin1
component-node name=output_nonlin component=logsoftmax input=affine2
output-node name=output input=output_nonlin

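Given the inputs, the requested outputs, the network graph and the components, we can construct a "computation graph" (class ComputationGraph). The computation graph is acyclic, and its nodes correspond to individual feature vectors, i.e. rows of data matrices. Each node of the computation graph corresponds to a node of the network graph, but additionally identifies which piece of data it refers to:

n: the n-th utterance (example) in the current minibatch;

t: the t-th frame of the n-th utterance;

x: an extra index within frame t, used for convolutional networks.

To formalize this:

an Index is defined as the tuple (n, t, x);

a Cindex is defined as the tuple (node-index, Index);

where node-index is the index of a node (such as a component-node) in the network. When the graph is compiled, the actual computation being created is represented as a directed acyclic graph whose nodes are Cindexes.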
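To make the two tuples concrete, here is a minimal C++ sketch. It is not Kaldi's actual definition: the field types, the constructor and the lexicographic ordering (which lets Cindexes serve as keys of std::set or std::map) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>
#include <utility>

// Sketch of an Index: which example, which frame, and an extra index.
struct Index {
  int32_t n;  // example (utterance) index within the minibatch
  int32_t t;  // time (frame) index
  int32_t x;  // extra index for convolutional setups; usually 0

  Index(int32_t n_in = 0, int32_t t_in = 0, int32_t x_in = 0)
      : n(n_in), t(t_in), x(x_in) {}

  bool operator==(const Index &other) const {
    return n == other.n && t == other.t && x == other.x;
  }
  // Lexicographic order on (n, t, x), so Indexes and Cindexes can be
  // used as keys of std::set / std::map.
  bool operator<(const Index &other) const {
    return std::tie(n, t, x) < std::tie(other.n, other.t, other.x);
  }
};

// Sketch of a Cindex: (node-index, Index).
using Cindex = std::pair<int32_t, Index>;
```

Because std::pair compares its members in order, two Cindexes are ordered first by node-index and then by (n, t, x).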

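A neural network computation (training or decoding) proceeds as follows:

a ComputationRequest is provided, specifying which Cindexes (for example, which time indexes) are available as inputs and which outputs are requested;
the ComputationRequest is compiled together with the neural network into a sequence of commands, an NnetComputation;
to make the computation efficient, the NnetComputation is optimized during compilation (roughly like gcc -O);
class NnetComputer carries out the actual computation: it takes the input feature matrices, executes the NnetComputation, and produces the output matrices. It plays a role roughly like that of the Python runtime for a Python program.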

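Basic data structures in nnet3

Indexes

As mentioned above, an Index is a tuple (n, t, x), where n is the index within the minibatch, t is the time index, and x is used in convolutional networks and is usually zero. nnet3 computes on chunks of frames, each stored as a data matrix: the number of columns is the dimension of the layer (the number of neurons), the number of rows is the number of frames in the chunk, and there is a one-to-one correspondence between Indexes and matrix rows. So, unlike Theano, which always works with tensors, nnet3 flattens the data into matrices, which lets it use efficient BLAS operations.

When training a simple feed-forward network, only n varies, so the sequence of Indexes is:

[ (0, 0, 0) (1, 0, 0) (2, 0, 0) ... ]

When decoding a single utterance with a simple feed-forward network, only t varies (it corresponds to the matrix row index), so the sequence of Indexes is:

[ (0, 0, 0) (0, 1, 0) (0, 2, 0) ... ]

When training a TDNN, both n and t vary:

[ (0, -1, 0) (0, 0, 0) (0, 1, 0) (1, -1, 0) (1, 0, 0) (1, 1, 0) ... ]

In compact form, with x omitted when it is zero and ranges of t written as begin:end, the sequence above becomes: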

[ (0, -1:1) (1, -1:1) ... ]
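The following sketch builds the TDNN Index sequence above and prints it in the compact form. MakeIndexes, CompactString and the Idx struct are hypothetical helpers, and the printer assumes x is always 0; Kaldi has its own printing code, which this does not reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct Idx { int32_t n, t, x; };

// For each example n, frames t = t_begin..t_end (inclusive), with x = 0.
std::vector<Idx> MakeIndexes(int32_t num_examples, int32_t t_begin, int32_t t_end) {
  std::vector<Idx> v;
  for (int32_t n = 0; n < num_examples; ++n)
    for (int32_t t = t_begin; t <= t_end; ++t)
      v.push_back({n, t, 0});
  return v;
}

// Compact form: x omitted (assumed 0); consecutive t values with the same n
// are collapsed into "begin:end".
std::string CompactString(const std::vector<Idx> &v) {
  std::ostringstream os;
  os << "[";
  size_t i = 0;
  while (i < v.size()) {
    size_t j = i;
    while (j + 1 < v.size() && v[j + 1].n == v[i].n && v[j + 1].t == v[j].t + 1)
      ++j;
    os << " (" << v[i].n << ", " << v[i].t;
    if (j > i) os << ":" << v[j].t;
    os << ")";
    i = j + 1;
  }
  os << " ]";
  return os.str();
}
```

With two examples and t from -1 to 1, CompactString(MakeIndexes(2, -1, 1)) returns "[ (0, -1:1) (1, -1:1) ]".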

Cindexes

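A Cindex is a pair (int32, Index), where the int32 is the index of a node (e.g. a component-node) in the network graph. As described above, a neural network consists of:

a set of components;
a graph of nodes (which corresponds to a particular computation).

As seen above, each Index corresponds to a row of a matrix. The same holds for a Cindex, which additionally identifies which matrix the row belongs to.

For example, if the node affine1_node from the config above has index 2 in the list of nodes, then a Cindex with node-index 2, such as (2, (0, 0, 0)), refers to a row of the matrix that holds the output of the component defined by "component name=affine1".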

ComputationGraph

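A ComputationGraph is a directed graph of Cindexes, in which each Cindex has a list of the Cindexes it depends on. For a simple feed-forward network the ComputationGraph has a linear structure; for example:

(nonlin1, (0, 0, 0)) depends on the list [ (affine1, (0, 0, 0)) ];

(nonlin1, (1, 0, 0)) depends on the list [ (affine1, (1, 0, 0)) ];

and so on.

In ComputationGraph and other classes you will see integer variables named cindex_id; a cindex_id is the index of a Cindex within the computation graph's list of Cindexes.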
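A toy model of this bookkeeping, assuming a cindex_id is simply a position in a vector of Cindexes, with a map for the reverse lookup. ToyComputationGraph and BuildFeedforward are hypothetical names; nodes are referred to by integer index and x is dropped for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using SimpleIndex = std::pair<int32_t, int32_t>;        // (n, t); x assumed 0
using SimpleCindex = std::pair<int32_t, SimpleIndex>;  // (node-index, (n, t))

struct ToyComputationGraph {
  std::vector<SimpleCindex> cindexes;               // cindex_id -> Cindex
  std::vector<std::vector<int32_t>> dependencies;   // cindex_id -> ids it depends on
  std::map<SimpleCindex, int32_t> cindex_to_id;     // Cindex -> cindex_id

  // Returns the cindex_id of c, adding c to the graph if it is new.
  int32_t GetCindexId(const SimpleCindex &c) {
    auto it = cindex_to_id.find(c);
    if (it != cindex_to_id.end()) return it->second;
    int32_t id = static_cast<int32_t>(cindexes.size());
    cindexes.push_back(c);
    dependencies.emplace_back();
    cindex_to_id[c] = id;
    return id;
  }
};

// Linear feed-forward chain: node k at (n, 0) depends only on node k-1 at (n, 0).
void BuildFeedforward(ToyComputationGraph *g, int32_t num_nodes, int32_t num_examples) {
  for (int32_t n = 0; n < num_examples; ++n) {
    for (int32_t node = 1; node < num_nodes; ++node) {
      int32_t id = g->GetCindexId({node, {n, 0}});
      int32_t dep = g->GetCindexId({node - 1, {n, 0}});
      g->dependencies[id].push_back(dep);
    }
  }
}
```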

ComputationRequest

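A ComputationRequest identifies a set of input nodes and output nodes, each with an associated list of Indexes. For an input node, the list identifies which Indexes are supplied to the computation; for an output node, it identifies which Indexes are requested. In addition, a ComputationRequest contains some configuration, such as which output nodes supply derivatives for backpropagation, which input nodes request derivatives, and whether the model is to be updated.

For example, a ComputationRequest might specify an input node named "input" with Indexes [(0,-1,0),(0,0,0),(0,1,0)],

and an output node named "output" with the requested Indexes [(0,0,0)].

This would make sense if the network needs one frame of left and right context: to compute the output at frame 0, the input at frames -1, 0 and 1 is required.

In practice, a request for a single frame like this would only occur in training. Also, a minibatch normally contains several utterances (examples), so the "n" values in the Index lists would vary as well.

A neural network can have multiple input and output nodes; this is typically used for multi-task learning or when there are several kinds of input (e.g. multi-view learning).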
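As a sketch of the reasoning in the example above, the hypothetical helper RequiredInputs computes which input (n, t) pairs a request needs, assuming the network only splices its input at a fixed list of frame offsets (x is ignored). In nnet3 itself this is worked out during compilation from the Descriptors, not by a helper like this.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

using NT = std::pair<int32_t, int32_t>;  // (n, t); x assumed to be 0

// Every requested output (n, t) needs the input at (n, t + offset)
// for each splicing offset; the result is a sorted, de-duplicated set.
std::set<NT> RequiredInputs(const std::vector<NT> &outputs,
                            const std::vector<int32_t> &offsets) {
  std::set<NT> inputs;
  for (const NT &o : outputs)
    for (int32_t off : offsets)
      inputs.emplace(o.first, o.second + off);
  return inputs;
}
```

For the request above, RequiredInputs({{0, 0}}, {-1, 0, 1}) yields (0, -1), (0, 0) and (0, 1); adding a second example n = 1 grows the set to six pairs.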

NnetComputation (brief)

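Compiling an Nnet together with a ComputationRequest yields an NnetComputation, which represents a specific sequence of computation steps. An NnetComputation consists of a sequence of Commands, including:

Propagate;
matrix copy;
matrix addition;
copying rows of one matrix into another matrix;
Backprop;
resizing matrices;

and so on.

The commands operate on a list of matrices and submatrices. An NnetComputation also contains various indexes (arrays of integers, etc.) that are needed as arguments to certain matrix operations.

This is described in more detail in NnetComputation (detail) below.

NnetComputer

An NnetComputer object is responsible for actually executing an NnetComputation. Its code is quite simple (mostly a loop over a switch statement), because most of the complexity lies in compiling and optimizing the NnetComputation.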
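To illustrate what "a loop over a switch statement" means, here is a toy interpreter. The command set (ToyCommandType), the ToyCommand fields and the use of flat std::vector<float> matrices are all simplifications invented for this sketch; the real CommandType enum and Command struct are shown in NnetComputation (detail) below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum ToyCommandType { kToyAllocZeroed, kToyMatrixCopy, kToyMatrixAdd, kToyScale };

struct ToyCommand {
  ToyCommandType type;
  int32_t arg1;  // destination matrix index
  int32_t arg2;  // source matrix index, or number of elements for allocation
  float alpha;   // scale factor, used only by kToyScale
};

using Mat = std::vector<float>;  // a "matrix" flattened into a vector

// Executes the commands in order, one switch case per command type.
void RunComputation(const std::vector<ToyCommand> &commands, std::vector<Mat> *matrices) {
  for (const ToyCommand &c : commands) {
    Mat &dest = (*matrices)[c.arg1];
    switch (c.type) {
      case kToyAllocZeroed:
        dest.assign(static_cast<size_t>(c.arg2), 0.0f);
        break;
      case kToyMatrixCopy:
        dest = (*matrices)[c.arg2];
        break;
      case kToyMatrixAdd: {
        const Mat &src = (*matrices)[c.arg2];
        assert(src.size() == dest.size());
        for (size_t i = 0; i < dest.size(); ++i) dest[i] += src[i];
        break;
      }
      case kToyScale:
        for (float &v : dest) v *= c.alpha;
        break;
    }
  }
}
```

The design point carried over from the text is that all decisions are made at compile time: the interpreter itself never inspects the network, it only follows the command list.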

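Neural networks in nnet3

The previous section gave an overview of the framework. This section describes in more detail how a neural network is structured, how components are glued together, and how dependence on the input at frame t-1 is expressed.

Component (basics)

A Component in nnet3 is an object with Propagate and Backprop functions. A Component may also hold parameters, or implement a fixed nonlinearity (such as a Sigmoid component). The most important part of the Component interface is: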

class Component {
 public:
  virtual void Propagate(const ComponentPrecomputedIndexes *indexes,
                         const CuMatrixBase<BaseFloat> &in,
                         CuMatrixBase<BaseFloat> *out) const = 0;
  virtual void Backprop(const std::string &debug_info,
                        const ComponentPrecomputedIndexes *indexes,
                        const CuMatrixBase<BaseFloat> &in_value,
                        const CuMatrixBase<BaseFloat> &out_value,
                        const CuMatrixBase<BaseFloat> &out_deriv,
                        Component *to_update,  // may be NULL; may be identical
                                               // to "this" or different.
                        CuMatrixBase<BaseFloat> *in_deriv) const = 0;
  // ...
};

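For now, ignore the const ComponentPrecomputedIndexes *indexes argument.

A given Component has an input dimension and an output dimension, and it transforms data row by row. That is, the input and output matrices of Propagate() have the same number of rows, and each row of the input produces the corresponding row of the output. In other words, the input and output matrices have the same Indexes. Backprop follows the same logic.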
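The row-by-row contract of a simple component can be sketched with a ReLU. Matrix and ToyReluComponent are hypothetical stand-ins (a plain row-major float matrix instead of CuMatrixBase<BaseFloat>, and no ComponentPrecomputedIndexes argument); the point is only that output row i depends on input row i alone.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major matrix used in place of CuMatrixBase<BaseFloat>.
struct Matrix {
  size_t rows, cols;
  std::vector<float> data;
  Matrix(size_t r, size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
  float &operator()(size_t r, size_t c) { return data[r * cols + c]; }
  float operator()(size_t r, size_t c) const { return data[r * cols + c]; }
};

// A "simple" component: same number of rows (same Indexes) in and out.
struct ToyReluComponent {
  void Propagate(const Matrix &in, Matrix *out) const {
    assert(in.rows == out->rows && in.cols == out->cols);
    for (size_t i = 0; i < in.data.size(); ++i)
      out->data[i] = in.data[i] > 0.0f ? in.data[i] : 0.0f;
  }
  // For ReLU the derivative can be computed from the output value alone.
  void Backprop(const Matrix &out_value, const Matrix &out_deriv,
                Matrix *in_deriv) const {
    assert(out_value.rows == in_deriv->rows && out_deriv.rows == in_deriv->rows);
    for (size_t i = 0; i < out_value.data.size(); ++i)
      in_deriv->data[i] = out_value.data[i] > 0.0f ? out_deriv.data[i] : 0.0f;
  }
};
```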

Components (properties)

Component has a virtual function Properties(), which returns an int32 bitmask made up of values of enum ComponentProperties.

class Component {
  // ...
  virtual int32 Properties() const = 0;
  // ...
};

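The enum values include:

kUpdatableComponent // whether the component has updatable parameters

kPropagateInPlace // whether its Propagate function supports in-place operation

and so on. Much of the optimization code needs these flags in order to know which optimizations apply to a component. You will also notice the enum value kSimpleComponent. If it is set, the component is "simple", meaning that it transforms data row by row as defined above. A non-simple (general) component, by contrast, may produce an output matrix with a different number of rows from the input matrix. In that case the output Indexes differ from the input Indexes, and the const ComponentPrecomputedIndexes *indexes argument is used to tell the component explicitly which Indexes the input and output use.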
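Since Properties() returns a bitmask, callers test individual flags with a bitwise AND. The flag names and numeric values below (ToyComponentProperties and its members) are made up for this sketch and are not the real values of enum ComponentProperties.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values, one bit per property.
enum ToyComponentProperties : int32_t {
  kToySimpleComponent = 0x001,
  kToyUpdatableComponent = 0x002,
  kToyPropagateInPlace = 0x004,
};

// True if the given flag is set in a Properties()-style bitmask.
inline bool HasProperty(int32_t properties, int32_t flag) {
  return (properties & flag) != 0;
}

// An optimizer might only try in-place propagation when this returns true.
inline bool CanPropagateInPlace(int32_t properties) {
  return HasProperty(properties, kToyPropagateInPlace);
}
```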

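In this document we assume that all the components mentioned are simple, because non-simple components are not needed to implement RNNs, LSTMs and the like. Unlike in the nnet2 framework, components are not responsible for operations such as splicing frames across time; instead we use Descriptors for that, as explained below.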

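Neural network nodes

There are four types of network node, defined by the following enum:

enum NodeType { kInput, kDescriptor, kComponent, kDimRange };

kComponent nodes are the "meat" of the network;

kDescriptor nodes are the "glue" that joins kComponent nodes together, used for things like frame splicing and recurrence;

kInput nodes are very simple: they just mark where the input goes and what its dimension is.

There is no kOutput node type, because output nodes are also kDescriptor nodes. For simplicity, the rules are:

each kComponent node must be immediately preceded by a kDescriptor node;
a kDescriptor node that is not immediately followed by a kComponent node is treated as an output node.

Class Nnet provides functions for telling these two kinds of kDescriptor node apart:

IsOutputNode(int32 node_index)
IsComponentInputNode(int32 node_index)

Network nodes are described in more detail in Neural network nodes (detail) below.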
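The two rules above are enough to classify kDescriptor nodes. The following sketch assumes the nodes are stored in a vector in order and implements the rules as free functions over node types; Kaldi's Nnet::IsOutputNode and Nnet::IsComponentInputNode are member functions and may be implemented differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum NodeType { kInput, kDescriptor, kComponent, kDimRange };

// A kDescriptor node feeds a component if the very next node is a kComponent.
bool IsComponentInputNode(const std::vector<NodeType> &nodes, int32_t i) {
  return nodes[i] == kDescriptor &&
         i + 1 < static_cast<int32_t>(nodes.size()) && nodes[i + 1] == kComponent;
}

// Any other kDescriptor node is an output node.
bool IsOutputNode(const std::vector<NodeType> &nodes, int32_t i) {
  return nodes[i] == kDescriptor && !IsComponentInputNode(nodes, i);
}
```

For the example config (one input node, four component-nodes each expanded into a kDescriptor plus a kComponent, and one output node), only the last node is classified as an output.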

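Neural network config files

A neural network can be created from a config file. The following network has one hidden layer and splices frames at the input of the first affine component: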

# First the components
component name=affine1 type=NaturalGradientAffineComponent input-dim=48 output-dim=65
component name=relu1 type=RectifiedLinearComponent dim=65
component name=affine2 type=NaturalGradientAffineComponent input-dim=65 output-dim=115
component name=logsoftmax type=LogSoftmaxComponent dim=115
# Next the nodes
input-node name=input dim=12
component-node name=affine1_node component=affine1 input=Append(Offset(input, -1), Offset(input, 0), Offset(input, 1), Offset(input, 2))
component-node name=nonlin1 component=relu1 input=affine1_node
component-node name=affine2 component=affine2 input=nonlin1
component-node name=output_nonlin component=logsoftmax input=affine2
output-node name=output input=output_nonlin

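The config file contains no explicit descriptor nodes; instead, the "input" field (e.g. input=Append(...)) acts as the descriptor. Each component-node in the config file is expanded into two nodes:

a kComponent node;
an immediately preceding kDescriptor node, defined by the "input" field.

The config above does not include an example of a dim-range node. The basic format of a dim-range node is:

dim-range-node name=dim-range-node1 input-node=affine1_node dim-offset=0 dim=50

This takes the first 50 of the 65 output dimensions of affine1_node (i.e. of component affine1).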
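A dim-range node simply selects a contiguous block of columns from every row of its input node's output. The sketch below models the data as a std::vector of rows; DimRange and Rows are hypothetical names, not Kaldi APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Rows = std::vector<std::vector<float>>;

// For every row (Index) of the input, keep columns [dim_offset, dim_offset + dim).
Rows DimRange(const Rows &input, size_t dim_offset, size_t dim) {
  Rows out;
  out.reserve(input.size());
  for (const auto &row : input) {
    assert(dim_offset + dim <= row.size());
    auto first = row.begin() + static_cast<std::ptrdiff_t>(dim_offset);
    out.emplace_back(first, first + static_cast<std::ptrdiff_t>(dim));
  }
  return out;
}
```

Note that the number of rows, and therefore the set of Indexes, is unchanged; only the dimension shrinks (from 65 to 50 in the example above).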

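Descriptors in config files

A Descriptor is a very restricted kind of expression that refers to other nodes in the graph. Descriptors are the "glue" that connects components together: a descriptor appends or sums the outputs of components so that the result can serve as the input to a subsequent component. In this section we describe descriptors from the point of view of the config-file format; their representation in code is described further below.

The simplest descriptor is just a node name, e.g. "affine1" (only nodes of type kComponent or kInput are allowed). The (simplified) descriptor syntax is as follows: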

# caution, this is a simplification that overgenerates descriptors.
<descriptor> ::= <node-name> ;; node name of kInput or kComponent node.
<descriptor> ::= Append(<descriptor>, <descriptor> [, <descriptor> ... ] )
<descriptor> ::= Sum(<descriptor>, <descriptor>)
<descriptor> ::= Const(<value>, <dimension>) ;; e.g. Const(1.0, 512)
<descriptor> ::= Scale(<scale>, <descriptor>) ;; e.g. Scale(-1.0, tdnn2)
;; Failover or IfDefined might be useful for time t=-1 in a RNN, for instance.
<descriptor> ::= Failover(<descriptor>, <descriptor>) ;; 1st arg if computable, else 2nd
<descriptor> ::= IfDefined(<descriptor>) ;; the arg if defined, else zero.
<descriptor> ::= Offset(<descriptor>, <t-offset> [, <x-offset> ] ) ;; offsets are integers
;; Switch(...) is intended to be used in clockwork RNNs or similar schemes. It chooses
;; one argument based on the value of t (in the requested Index) modulo the number of
;; arguments
<descriptor> ::= Switch(<descriptor>, <descriptor> [, <descriptor> ...])
;; For use in clockwork RNNs or similar, Round() rounds the time-index t of the
;; requested Index to the next-lowest multiple of the integer <t-modulus>,
;; and evaluates the input argument for the resulting Index.
<descriptor> ::= Round(<descriptor>, <t-modulus>) ;; <t-modulus> is an integer
;; ReplaceIndex replaces some <variable-name> (t or x) in the requested Index
;; with a fixed integer <value>. E.g. might be useful when incorporating
;; iVectors; iVector would always have time-index t=0.
<descriptor> ::= ReplaceIndex(<descriptor>, <variable-name>, <value>)

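The actual internal grammar, shown below, differs from the simplified version above, because expressions may only appear at particular levels of a hierarchy. It also corresponds more closely to the class names in the code. The code that reads descriptors tries to normalize them as generally as possible, so that almost everything allowed by the grammar above can be read and converted into this internal representation.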

;;; <descriptor> == class Descriptor
<descriptor> ::= Append(<sum-descriptor>[, <sum-descriptor> ... ] )
<descriptor> ::= <sum-descriptor> ;; equivalent to Append() with one arg.
;;; <sum-descriptor> == class SumDescriptor
<sum-descriptor> ::= Sum(<sum-descriptor>, <sum-descriptor>)
<sum-descriptor> ::= Failover(<sum-descriptor>, <sum-descriptor>)
<sum-descriptor> ::= IfDefined(<sum-descriptor>)
<sum-descriptor> ::= Const(<value>, <dimension>)
<sum-descriptor> ::= <fwd-descriptor>
;;; <fwd-descriptor> == class ForwardingDescriptor
;; <t-offset> and <x-offset> are integers.
<fwd-descriptor> ::= Offset(<fwd-descriptor>, <t-offset> [, <x-offset> ] )
<fwd-descriptor> ::= Switch(<fwd-descriptor>, <fwd-descriptor> [, <fwd-descriptor> ...])
;; <t-modulus> is an integer
<fwd-descriptor> ::= Round(<fwd-descriptor>, <t-modulus>)
;; <variable-name> is t or x; <value> is an integer
<fwd-descriptor> ::= ReplaceIndex(<fwd-descriptor>, <variable-name>, <value>)
;; <node-name> is the name of a node of type kInput or kComponent.
<fwd-descriptor> ::= Scale(<scale>, <node-name>)
<fwd-descriptor> ::= <node-name>

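Descriptors are designed to be restrictive enough that the resulting expressions are fairly easy to compute (and to generate backprop code for). They are only supposed to do the heavy lifting of connecting components together; any more interesting or nonlinear operations should be carried out in the Components themselves.

Note: if you need to sum or average over a variable number of indexes (for example, over all the "t" values in a file), that should be done inside a Component.

Descriptors in code

ForwardingDescriptor

We describe Descriptors from the bottom up.

The base class ForwardingDescriptor handles only expressions that, for a given output Index, refer to a single value:

Offset(<desc>, <t-offset>)

Switch(<desc>, <desc> [, <desc> ...])

Round(<desc>, <t-modulus>)

ReplaceIndex(<desc>, <variable-name>, <value>)

Scale(<scale>, <node-name>)

It cannot handle expressions such as Append(...) or Sum(...) that combine several descriptors.

The most important function in its interface is MapToInput():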

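class ForwardingDescriptor {
 public:
  virtual Cindex MapToInput(const Index &output) const = 0;
  // ...
};

It maps a requested output Index to the single input Cindex that the output depends on.

For example, for Offset(input, -1), MapToInput maps the output Index (0, 0, 0) to the Cindex (input, (0, -1, 0)), i.e. to the input at the previous frame.

ForwardingDescriptor has several derived classes:

SimpleForwardingDescriptor (just stores a node index)
OffsetForwardingDescriptor
ReplaceIndexForwardingDescriptor

and so on.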
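The mapping performed by MapToInput() can be sketched with a small class hierarchy. FwdDesc, NodeDesc, OffsetDesc and RoundDesc are hypothetical names (not Kaldi's classes), only Offset and Round are covered, and Round is implemented with floor semantics so that negative t values also round down to the next-lowest multiple.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

struct Index { int32_t n, t, x; };
using Cindex = std::pair<int32_t, Index>;

// Each output Index maps to exactly one input Cindex.
class FwdDesc {
 public:
  virtual ~FwdDesc() = default;
  virtual Cindex MapToInput(const Index &output) const = 0;
};

// A bare node name: same Index, on the given node.
class NodeDesc : public FwdDesc {
 public:
  explicit NodeDesc(int32_t node) : node_(node) {}
  Cindex MapToInput(const Index &output) const override { return {node_, output}; }
 private:
  int32_t node_;
};

// Offset(<desc>, t_offset [, x_offset]): shift the Index, then ask the child.
class OffsetDesc : public FwdDesc {
 public:
  OffsetDesc(std::unique_ptr<FwdDesc> src, int32_t t_off, int32_t x_off = 0)
      : src_(std::move(src)), t_off_(t_off), x_off_(x_off) {}
  Cindex MapToInput(const Index &output) const override {
    Index ind = output;
    ind.t += t_off_;
    ind.x += x_off_;
    return src_->MapToInput(ind);
  }
 private:
  std::unique_ptr<FwdDesc> src_;
  int32_t t_off_, x_off_;
};

// Round(<desc>, t_modulus): round t down to a multiple of t_modulus.
class RoundDesc : public FwdDesc {
 public:
  RoundDesc(std::unique_ptr<FwdDesc> src, int32_t modulus)
      : src_(std::move(src)), modulus_(modulus) { assert(modulus > 0); }
  Cindex MapToInput(const Index &output) const override {
    Index ind = output;
    int32_t r = ind.t % modulus_;
    if (r < 0) r += modulus_;  // C++ % truncates toward zero; fix up for t < 0
    ind.t -= r;
    return src_->MapToInput(ind);
  }
 private:
  std::unique_ptr<FwdDesc> src_;
  int32_t modulus_;
};
```

For instance, an OffsetDesc over node 0 with offset -1 maps the output Index (0, 0, 0) to (0, (0, -1, 0)), matching the Offset(input, -1) example above; a RoundDesc with modulus 3 maps t = 5 to 3 and t = -1 to -3.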

SumDescriptor

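The next level of the hierarchy is class SumDescriptor, which supports the following expressions:

Sum(<desc>, <desc>)

Failover(<desc>, <desc>)

IfDefined(<desc>)

A SumDescriptor may depend on several different Cindexes for a single output Index, so it cannot offer the one-to-one MapToInput interface of ForwardingDescriptor. Instead, it provides a function that returns dependencies: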

class SumDescriptor {
public:
virtual void GetDependencies(const Index &ind,
std::vector<Cindex> *dependencies) const = 0;
...
};

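GetDependencies appends to "dependencies" every Cindex that might take part in computing the output Index ind. Next, the function IsComputable() handles the case where some requested inputs cannot be computed (for example, because the input data is limited, or at utterance boundaries):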

class SumDescriptor {
 public:
  // ...
  virtual bool IsComputable(const Index &ind,
                            const CindexSet &cindex_set,
                            std::vector<Cindex> *input_terms) const = 0;
  // ...
};

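Here, a CindexSet is a set of Cindexes, interpreted as "the set of all computable Cindexes". The function returns true if the Index ind is computable for this Descriptor.

For example, if X and Y are both computable, then Sum(X, Y) is computable. If the function returns true, it also appends to "input_terms" the Cindexes that actually contribute to the expression. For example, for Failover(X, Y), if X is computable, only X is appended to "input_terms".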
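The rules for Sum, Failover and IfDefined can be sketched as a small recursive function. SumExpr, Leaf and Make are hypothetical, a Cindex is cut down to a (node-index, t) pair, and the expression is evaluated for one fixed output Index (so any offsets are assumed to have been applied already).

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <utility>
#include <vector>

using Cx = std::pair<int, int>;  // (node-index, t): a cut-down Cindex

struct SumExpr {
  enum Kind { kLeaf, kSum, kFailover, kIfDefined } kind;
  Cx leaf;                        // used only when kind == kLeaf
  std::shared_ptr<SumExpr> a, b;  // children (b unused for kIfDefined)

  // Returns true if computable; on success appends the Cindexes actually used.
  // On failure nothing is appended.
  bool IsComputable(const std::set<Cx> &computable, std::vector<Cx> *terms) const {
    switch (kind) {
      case kLeaf:
        if (computable.count(leaf) == 0) return false;
        terms->push_back(leaf);
        return true;
      case kSum: {  // both operands are required
        std::vector<Cx> ta, tb;
        if (!a->IsComputable(computable, &ta) || !b->IsComputable(computable, &tb))
          return false;
        terms->insert(terms->end(), ta.begin(), ta.end());
        terms->insert(terms->end(), tb.begin(), tb.end());
        return true;
      }
      case kFailover:  // first operand if computable, else the second
        return a->IsComputable(computable, terms) || b->IsComputable(computable, terms);
      case kIfDefined:  // always computable; contributes zero if undefined
        a->IsComputable(computable, terms);
        return true;
    }
    return false;
  }
};

std::shared_ptr<SumExpr> Leaf(int node, int t) {
  auto e = std::make_shared<SumExpr>();
  e->kind = SumExpr::kLeaf;
  e->leaf = {node, t};
  return e;
}

std::shared_ptr<SumExpr> Make(SumExpr::Kind kind, std::shared_ptr<SumExpr> a,
                              std::shared_ptr<SumExpr> b = nullptr) {
  auto e = std::make_shared<SumExpr>();
  e->kind = kind;
  e->a = std::move(a);
  e->b = std::move(b);
  return e;
}
```

If node 1 is computable only at t = 0 and t = 1, then Failover over (1, -1) and (1, 0) is computable and uses only (1, 0); Sum over (1, 0) and (1, -1) is not computable; and IfDefined over (1, -1) is computable but contributes no input terms.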

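Class Descriptor is at the top of the hierarchy. It can be thought of as a vector of SumDescriptors, although that vector usually has length one; it is what implements the Append(...) syntax. It has the following functions:

GetDependencies()

IsComputable()

which have the same interfaces as in SumDescriptor, and

NumParts()

Part(int32 n)

which give access to the SumDescriptors in its vector.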
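One practical consequence of Append(...) is that a Descriptor's output dimension is the sum of its parts' dimensions. The hypothetical ToyDescriptor below keeps only the part dimensions to show this: for the config above, appending four 12-dimensional Offset(input, ...) terms gives 48, which matches input-dim=48 of affine1.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Cut-down Descriptor: one entry per SumDescriptor part, storing its dimension.
class ToyDescriptor {
 public:
  explicit ToyDescriptor(std::vector<int32_t> part_dims)
      : part_dims_(std::move(part_dims)) {}

  int32_t NumParts() const { return static_cast<int32_t>(part_dims_.size()); }

  // Append(...) concatenates the parts, so the output dimension is their sum.
  int32_t Dim() const {
    int32_t d = 0;
    for (int32_t p : part_dims_) d += p;
    return d;
  }

  // Column at which part n starts in the appended output.
  int32_t PartOffset(int32_t n) const {
    assert(n >= 0 && n < NumParts());
    int32_t off = 0;
    for (int32_t i = 0; i < n; ++i) off += part_dims_[i];
    return off;
  }

 private:
  std::vector<int32_t> part_dims_;
};
```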

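Neural network nodes (detail)

As described above, there are four types of node, defined by the following enum:

enum NodeType { kInput, kDescriptor, kComponent, kDimRange };

A network node is represented by the struct NetworkNode: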

struct NetworkNode {
  NodeType node_type;
  // "descriptor" is relevant only for nodes of type kDescriptor.
  Descriptor descriptor;
  union {
    // For kComponent, the index into Nnet::components_
    int32 component_index;
    // for kDimRange, the node-index of the input node.
    int32 node_index;
  } u;
  // for kInput, the dimension of the input feature. For kDimRange, the dimension
  // of the output (i.e. the length of the range)
  int32 dim;
  // for kDimRange, the dimension of the offset into the input component's feature.
  int32 dim_offset;
};

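kDescriptor nodes use only "descriptor";

kComponent nodes use only "component_index", which indexes the components_ array of the Nnet;

kDimRange nodes use only "node_index", "dim" and "dim_offset";

kInput nodes use only "dim".

Neural network (detail)

The private data members of class Nnet are: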

class Nnet {
 public:
  // ...
 private:
  std::vector<std::string> component_names_;
  std::vector<Component*> components_;
  std::vector<std::string> node_names_;
  std::vector<NetworkNode> nodes_;
};

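component_names_ has the same size as components_;

node_names_ has the same size as nodes_;

so each component name is associated with a component object, and each node name with a node.

Note that the kDescriptor nodes that precede kComponent nodes are named automatically: the name is the name of the following component node with "_input" appended. These kDescriptor node names do not appear in the config file.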
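The expansion of component-nodes into pairs of nodes, and the parallel name/node vectors, can be sketched as follows. ToyNodeList is a hypothetical, types-only stand-in for Nnet's node_names_ and nodes_, assuming the "_input" naming described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum NodeType { kInput, kDescriptor, kComponent, kDimRange };

// Parallel vectors, kept the same size, in the spirit of node_names_ / nodes_.
struct ToyNodeList {
  std::vector<std::string> node_names;
  std::vector<NodeType> node_types;

  void Add(const std::string &name, NodeType type) {
    node_names.push_back(name);
    node_types.push_back(type);
  }
  // "component-node name=X ..." becomes two nodes: X_input, then X.
  void AddComponentNode(const std::string &name) {
    Add(name + "_input", kDescriptor);
    Add(name, kComponent);
  }
};
```

Expanding the example config's input node, its first two component-nodes and an output node gives the names input, affine1_node_input, affine1_node, nonlin1_input, nonlin1 and output, so affine1_node ends up at index 2.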

NnetComputation (detail)

An NnetComputation represents the compiled (i.e. executable) form of a neural network computation. It defines several types, including the following enum:

enum CommandType {
  kAllocMatrixUndefined, kAllocMatrixZeroed,
  kDeallocMatrix, kPropagate, kStoreStats, kBackprop,
  kMatrixCopy, kMatrixAdd, kCopyRows, kAddRows,
  kCopyRowsMulti, kCopyToRowsMulti, kAddRowsMulti, kAddToRowsMulti,
  kAddRowRanges, kNoOperation, kNoOperationMarker };

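The following struct Command represents a single command together with its arguments. Most of the arguments are indexes into the list of matrices or the list of components.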

struct Command {
  CommandType command_type;
  int32 arg1;
  int32 arg2;
  int32 arg3;
  int32 arg4;
  int32 arg5;
  int32 arg6;
};

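There are also structs that hold size information for matrices and submatrices. A submatrix is a matrix restricted to a range of rows and columns, like some_matrix(1:10, 1:20) in matlab syntax.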

struct MatrixInfo {
  int32 num_rows;
  int32 num_cols;
};

struct SubMatrixInfo {
  int32 matrix_index;  // index into "matrices": the underlying matrix.
  int32 row_offset;
  int32 num_rows;
  int32 col_offset;
  int32 num_cols;
};
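To show how a SubMatrixInfo picks out a region of its underlying matrix, here is a sketch that copies that region out of a row-major array. ExtractSubMatrix is a hypothetical helper; the structs mirror the ones above but use int32_t, and a real implementation would use a view into the matrix rather than a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MatrixInfo { int32_t num_rows; int32_t num_cols; };
struct SubMatrixInfo {
  int32_t matrix_index, row_offset, num_rows, col_offset, num_cols;
};

// Copies rows [row_offset, row_offset + num_rows) and columns
// [col_offset, col_offset + num_cols) of a row-major matrix shaped by `info`.
std::vector<float> ExtractSubMatrix(const std::vector<float> &data,
                                    const MatrixInfo &info,
                                    const SubMatrixInfo &sub) {
  assert(sub.row_offset + sub.num_rows <= info.num_rows);
  assert(sub.col_offset + sub.num_cols <= info.num_cols);
  std::vector<float> out;
  out.reserve(static_cast<size_t>(sub.num_rows) * static_cast<size_t>(sub.num_cols));
  for (int32_t r = 0; r < sub.num_rows; ++r)
    for (int32_t c = 0; c < sub.num_cols; ++c)
      out.push_back(data[static_cast<size_t>(
          (sub.row_offset + r) * info.num_cols + sub.col_offset + c)]);
  return out;
}
```

For a 3x4 matrix holding 0..11, the SubMatrixInfo {0, 1, 2, 1, 2} (rows 1 and 2, columns 1 and 2, counting from 0) yields 5, 6, 9, 10.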

The struct NnetComputation has the following data members:

struct NnetComputation {
  // ...
  std::vector<Command> commands;
  std::vector<MatrixInfo> matrices;
  std::vector<SubMatrixInfo> submatrices;
  // used in kAddRows, kAddToRows, kCopyRows, kCopyToRows. contains row-indexes.
  std::vector<std::vector<int32> > indexes;
  // used in kAddRowsMulti, kAddToRowsMulti, kCopyRowsMulti, kCopyToRowsMulti.
  // contains pairs (sub-matrix index, row index)- or (-1,-1) meaning don't
  // do anything for this row.
  std::vector<std::vector<std::pair<int32,int32> > > indexes_multi;
  // Indexes used in kAddRowRanges commands, containing pairs (start-index,
  // end-index)
  std::vector<std::vector<std::pair<int32,int32> > > indexes_ranges;
  // Information about where the values and derivatives of inputs and outputs of
  // the neural net live.
  unordered_map<int32, std::pair<int32, int32> > input_output_info;
  bool need_model_derivative;
  // the following is only used in non-simple Components; ignore for now.
  std::vector<ComponentPrecomputedIndexes*> component_precomputed_indexes;
  // ...
};

The vectors whose names contain "indexes" are the arguments of matrix functions that take vectors of indexes as input, such as CopyRows and AddRows (these vectors are copied to the GPU before the computation is executed).
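As an illustration of how a row-index vector drives these operations, here is a CPU sketch of CopyRows- and AddRows-style functions. ToyCopyRows and ToyAddRows are hypothetical; the convention that a negative index zeroes the row (copy) or leaves it unchanged (add) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using RowMat = std::vector<std::vector<float>>;

// dest row r becomes src row indexes[r]; a negative index zeroes the row.
void ToyCopyRows(const RowMat &src, const std::vector<int32_t> &indexes, RowMat *dest) {
  assert(indexes.size() == dest->size());
  for (size_t r = 0; r < indexes.size(); ++r) {
    if (indexes[r] < 0)
      (*dest)[r].assign((*dest)[r].size(), 0.0f);
    else
      (*dest)[r] = src[static_cast<size_t>(indexes[r])];
  }
}

// dest row r += src row indexes[r]; a negative index means "skip this row".
void ToyAddRows(const RowMat &src, const std::vector<int32_t> &indexes, RowMat *dest) {
  assert(indexes.size() == dest->size());
  for (size_t r = 0; r < indexes.size(); ++r) {
    if (indexes[r] < 0) continue;
    const std::vector<float> &s = src[static_cast<size_t>(indexes[r])];
    std::vector<float> &d = (*dest)[r];
    assert(s.size() == d.size());
    for (size_t c = 0; c < d.size(); ++c) d[c] += s[c];
  }
}
```

Operations of this kind are one way frame splicing such as Append(Offset(input, -1), ...) can be realized: each part of the spliced matrix is filled by copying the appropriate input rows.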
