Paper Notes: "A Deep Neural Network Compression Pipeline: Pruning, Quantization, Huffman Encoding"

Date: 2023-03-08 23:21:41

Paper: "A Deep Neural Network Compression Pipeline: Pruning, Quantization, Huffman Encoding"

Pruning

  • prune the network by learning only the important connections.
  1. All connections with weights below a threshold are removed from the network.

  2. Retrain the network to learn the final weights for the remaining sparse connections.

  3. Store the remaining sparse matrices in compressed sparse row (CSR) or compressed sparse column (CSC) format.
    • CSR/CSC requires 2·nnz + n + 1 numbers, where nnz is the number of non-zero elements and n is the number of rows or columns.

    • To compress further, store the index difference instead of the absolute position; when a difference overflows the index width, insert a padding zero (see the sketch after this list).

  4. Pruning reduces the number of parameters by 9× for AlexNet and 13× for VGG-16.
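Below is a minimal numpy sketch of steps 1 and 3, assuming a dense weight matrix. The helper names (`prune_by_threshold`, `relative_index_encode`) and the exact filler-zero handling for overlong gaps are illustrative, not the paper's code.

```python
import numpy as np

def prune_by_threshold(weights, threshold):
    """Step 1: zero out every connection whose magnitude is below the threshold."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def relative_index_encode(flat_weights, index_bits=8):
    """Step 3: store index differences between non-zeros instead of absolute
    positions; when a gap exceeds the largest representable difference,
    emit a padding zero (illustrative handling of overlong spans)."""
    max_gap = (1 << index_bits) - 1
    values, diffs = [], []
    last = -1
    for i, w in enumerate(flat_weights):
        if w == 0:
            continue
        gap = i - last
        while gap > max_gap:          # gap too wide: insert filler zeros
            values.append(0.0)
            diffs.append(max_gap)
            gap -= max_gap
        values.append(float(w))
        diffs.append(gap)
        last = i
    return np.array(values), np.array(diffs, dtype=np.int64)

# Prune a random layer; retraining the surviving weights (step 2) is omitted.
w = np.random.randn(4, 4).astype(np.float32)
pruned, mask = prune_by_threshold(w, threshold=0.5)
vals, idx_diffs = relative_index_encode(pruned.ravel())
```

The absolute positions are recoverable as the cumulative sum of the differences, so nothing is lost relative to plain CSR indexing.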

Quantization

  • quantize the weights to enforce weight sharing

Network quantization further compresses the pruned network by reducing the number of bits required to represent each weight.

  1. Weight Sharing
    • k-means clustering
  2. Initialization of Shared Weights
    • Forgy (random).
      Forgy initialization picks k weights at random as the initial centroids. Since the weight distribution is bimodal, the Forgy method tends to concentrate the centroids around the two peaks.
    • Density-based.
      Density-based initialization spaces the centroids according to the CDF of the weights, so the centroids are denser around the two peaks, but more scattered than with the Forgy method.
    • Linear initialization.
      Linear initialization linearly spaces the centroids between the [min, max] of the original weights. Because it does not depend on the weight distribution, it keeps centroids near the few large-magnitude weights, which the paper finds works best.
  3. Feed-forward and Back-propagation
    • During fine-tuning, the forward pass uses the shared centroid values; in back-propagation, the gradients of all connections assigned to the same centroid are summed, and that sum updates the centroid (see the sketch below).
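A minimal numpy sketch of weight sharing, assuming linear centroid initialization and k = 16 (4-bit indices). `kmeans_share` and `centroid_gradient` are hypothetical helper names; a real implementation would cluster each layer separately and run this inside the training loop.

```python
import numpy as np

def linear_init_centroids(w, k):
    """Linear initialization: evenly space k centroids over [min, max]."""
    return np.linspace(w.min(), w.max(), k)

def kmeans_share(weights, k=16, iters=20):
    """1-D k-means: cluster the weights so each one is replaced by the
    centroid of its cluster (4-bit indices when k = 16)."""
    w = weights.ravel()
    centroids = linear_init_centroids(w, k)
    for _ in range(iters):
        assign = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            members = w[assign == j]
            if members.size:
                centroids[j] = members.mean()
    return centroids, assign.reshape(weights.shape)

def centroid_gradient(grad, assign, k):
    """Back-propagation for shared weights: sum the gradients of every
    connection assigned to the same centroid."""
    return np.array([grad[assign == j].sum() for j in range(k)])

w = np.random.randn(64, 64).astype(np.float32)
centroids, assign = kmeans_share(w, k=16)
w_shared = centroids[assign]              # quantized layer for the forward pass
g = np.random.randn(*w.shape)             # stand-in for the upstream gradient
dc = centroid_gradient(g, assign, k=16)   # one gradient per shared weight
```

Only the k centroids and the per-weight cluster indices need to be stored, which is where the bit-width reduction comes from.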

Huffman coding

  • Huffman coding

    A Huffman code is an optimal prefix code commonly used for lossless data compression. Here it encodes the quantized weight indices and the sparse index differences; because both distributions are heavily biased, Huffman coding saves a further 20–30% of network storage.
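A small self-contained sketch of Huffman coding the quantized indices, using Python's standard `heapq`; the symbol stream below is made up for illustration.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build an optimal prefix code from symbol frequencies."""
    freq = Counter(symbols)
    # Heap items: [frequency, tie-breaker, {symbol: code-so-far}]
    heap = [[f, i, {s: ""}] for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # merge the two least frequent subtrees
        hi = heapq.heappop(heap)
        for s in lo[2]:
            lo[2][s] = "0" + lo[2][s]
        for s in hi[2]:
            hi[2][s] = "1" + hi[2][s]
        heapq.heappush(heap, [lo[0] + hi[0], tie, {**lo[2], **hi[2]}])
        tie += 1
    return heap[0][2]

# Encode a biased stream of 4-bit centroid indices; frequent symbols get
# short codes, so the average length drops below 4 bits per weight.
indices = [0, 0, 1, 0, 2, 0, 0, 3, 1, 0, 0, 1]
codes = huffman_code(indices)
bits = sum(len(codes[i]) for i in indices)
print(codes, bits / len(indices), "bits per symbol on average")
```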

Summary

The idea of this paper is good, but pruning individual weights makes the filter matrices sparse, so specialized sparse matrix computation libraries are needed to support the operations above.