Paper Notes: "A Deep Neural Network Compression Pipeline: Pruning, Quantization, Huffman Encoding"

Date: 2023-03-08 23:21:41

Paper: "A Deep Neural Network Compression Pipeline: Pruning, Quantization, Huffman Encoding"

Pruning

  • prune the network by learning only the important connections.
  1. All connections with weights below a threshold are removed from the network.

  2. Retrain the network to learn the final weights for the remaining sparse connections.

  3. Store the remaining sparse matrices in compressed sparse row (CSR) or compressed sparse column (CSC) format.
    • CSR/CSC requires 2·nnz + n + 1 numbers, where nnz is the number of non-zero elements and n is the number of rows or columns.

    • To compress further, store the index difference instead of the absolute position; when a difference overflows the index width, insert a padding zero (see the sketch after this list).

  4. Pruning reduces the number of parameters by 9× for AlexNet and 13× for VGG-16.
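Below is a minimal numpy sketch of steps 1 and 3, assuming a dense weight matrix. The helper names (`prune_by_threshold`, `relative_index_encode`) and the exact filler-zero handling for overlong gaps are illustrative, not the paper's code.

```python
import numpy as np

def prune_by_threshold(weights, threshold):
    """Step 1: zero out every connection whose magnitude is below the threshold."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def relative_index_encode(flat_weights, index_bits=8):
    """Step 3: store index differences between non-zeros instead of absolute
    positions; when a gap exceeds the largest representable difference,
    emit a padding zero (illustrative handling of overlong spans)."""
    max_gap = (1 << index_bits) - 1
    values, diffs = [], []
    last = -1
    for i, w in enumerate(flat_weights):
        if w == 0:
            continue
        gap = i - last
        while gap > max_gap:          # gap too wide: insert filler zeros
            values.append(0.0)
            diffs.append(max_gap)
            gap -= max_gap
        values.append(float(w))
        diffs.append(gap)
        last = i
    return np.array(values), np.array(diffs, dtype=np.int64)

# Prune a random layer; retraining the surviving weights (step 2) is omitted.
w = np.random.randn(4, 4).astype(np.float32)
pruned, mask = prune_by_threshold(w, threshold=0.5)
vals, idx_diffs = relative_index_encode(pruned.ravel())
```

The absolute positions are recoverable as the cumulative sum of the differences, so nothing is lost relative to plain CSR indexing.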

Quantization

  • quantize the weights to enforce weight sharing

Network quantization further compresses the pruned network by reducing the number of bits required to represent each weight.

  1. Weight Sharing
    • k-means clustering
  2. Initialization of Shared Weights
    • Forgy (random).
      Forgy initialization picks k weights at random as the initial centroids. Since the weight distribution is bimodal, the Forgy method tends to concentrate the centroids around the two peaks.
    • Density-based.
      Density-based initialization spaces the centroids according to the CDF of the weights, so the centroids are denser around the two peaks, but more scattered than with the Forgy method.
    • Linear initialization.
      Linear initialization linearly spaces the centroids between the [min, max] of the original weights. Because it does not depend on the weight distribution, it keeps centroids near the few large-magnitude weights, which the paper finds works best.
  3. Feed-forward and Back-propagation
    • During fine-tuning, the forward pass uses the shared centroid values; in back-propagation, the gradients of all connections assigned to the same centroid are summed, and that sum updates the centroid (see the sketch below).
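A minimal numpy sketch of weight sharing, assuming linear centroid initialization and k = 16 (4-bit indices). `kmeans_share` and `centroid_gradient` are hypothetical helper names; a real implementation would cluster each layer separately and run this inside the training loop.

```python
import numpy as np

def linear_init_centroids(w, k):
    """Linear initialization: evenly space k centroids over [min, max]."""
    return np.linspace(w.min(), w.max(), k)

def kmeans_share(weights, k=16, iters=20):
    """1-D k-means: cluster the weights so each one is replaced by the
    centroid of its cluster (4-bit indices when k = 16)."""
    w = weights.ravel()
    centroids = linear_init_centroids(w, k)
    for _ in range(iters):
        assign = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            members = w[assign == j]
            if members.size:
                centroids[j] = members.mean()
    return centroids, assign.reshape(weights.shape)

def centroid_gradient(grad, assign, k):
    """Back-propagation for shared weights: sum the gradients of every
    connection assigned to the same centroid."""
    return np.array([grad[assign == j].sum() for j in range(k)])

w = np.random.randn(64, 64).astype(np.float32)
centroids, assign = kmeans_share(w, k=16)
w_shared = centroids[assign]              # quantized layer for the forward pass
g = np.random.randn(*w.shape)             # stand-in for the upstream gradient
dc = centroid_gradient(g, assign, k=16)   # one gradient per shared weight
```

Only the k centroids and the per-weight cluster indices need to be stored, which is where the bit-width reduction comes from.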

Huffman coding

  • Huffman coding

    A Huffman code is an optimal prefix code commonly used for lossless data compression. Here it encodes the quantized weight indices and the sparse index differences; because both distributions are heavily biased, Huffman coding saves a further 20–30% of network storage.
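A small self-contained sketch of Huffman coding the quantized indices, using Python's standard `heapq`; the symbol stream below is made up for illustration.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build an optimal prefix code from symbol frequencies."""
    freq = Counter(symbols)
    # Heap items: [frequency, tie-breaker, {symbol: code-so-far}]
    heap = [[f, i, {s: ""}] for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # merge the two least frequent subtrees
        hi = heapq.heappop(heap)
        for s in lo[2]:
            lo[2][s] = "0" + lo[2][s]
        for s in hi[2]:
            hi[2][s] = "1" + hi[2][s]
        heapq.heappush(heap, [lo[0] + hi[0], tie, {**lo[2], **hi[2]}])
        tie += 1
    return heap[0][2]

# Encode a biased stream of 4-bit centroid indices; frequent symbols get
# short codes, so the average length drops below 4 bits per weight.
indices = [0, 0, 1, 0, 2, 0, 0, 3, 1, 0, 0, 1]
codes = huffman_code(indices)
bits = sum(len(codes[i]) for i in indices)
print(codes, bits / len(indices), "bits per symbol on average")
```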

Summary

The idea of this paper is good, but pruning individual weights makes the filter matrices sparse, so specialized sparse matrix computation libraries are needed to support the operations above.