Study notes for Sparse Coding

Time: 2023-03-08 23:56:07

Sparse Coding

  • Sparse coding is a class of unsupervised methods for learning sets of over-complete bases to represent data efficiently. The aim of sparse coding is to find a set of basis vectors $\phi_i$ such that an input vector $x$ can be represented as a linear combination of these basis vectors:
    $$x = \sum_{i=1}^{k} a_i \phi_i$$
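
    A minimal numpy sketch of this representation (the dimensions, basis matrix, and coefficient values below are illustrative assumptions, not values from the notes):

```python
import numpy as np

n, k = 64, 256                       # input dimension n, number of basis vectors k (over-complete: k > n)
Phi = np.random.randn(n, k)          # columns phi_i are the basis vectors
a = np.zeros(k)
a[[3, 57, 190]] = [0.8, -1.2, 0.5]   # a sparse coefficient vector: only a few entries are non-zero
x = Phi @ a                          # x is expressed as a linear combination of the basis vectors
```
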
  • The advantage of having an over-complete basis is that our basis vectors $\phi_i$ are better able to capture structures and patterns inherent in the input data $x$.
  • However, with an over-complete basis, the coefficients are no longer uniquely determined by the input vector. Therefore, in sparse coding, we introduce the additional criterion of sparsity to resolve the degeneracy introduced by over-completeness.
  • The sparse coding cost function is defined on a set of m input vectors as: 
    $$\min_{a^{(j)}_i,\ \phi_i} \sum_{j=1}^{m} \left\| x^{(j)} - \sum_{i=1}^{k} a^{(j)}_i \phi_i \right\|^2 + \lambda \sum_{i=1}^{k} S\!\left(a^{(j)}_i\right)$$

    where $S(\cdot)$ is a sparsity function which penalizes $a_i$ for being far from zero. We can interpret the first term of the sparse coding objective as a reconstruction term which tries to force the algorithm to provide a good representation of $x$, and the second term as a sparsity penalty which forces our representation of $x$ (i.e., the learned features) to be sparse. The constant $\lambda$ is a scaling constant that determines the relative importance of these two contributions.
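
    As a rough illustration, this objective could be evaluated for a batch of examples as follows (using the $L_1$ penalty for $S$; the function and variable names are my own assumptions):

```python
import numpy as np

def sparse_coding_cost(X, Phi, A, lam):
    """X: (n, m) inputs, Phi: (n, k) basis vectors, A: (k, m) coefficients, lam: sparsity weight."""
    reconstruction = np.sum((X - Phi @ A) ** 2)   # first term: how well the bases reconstruct each x^(j)
    sparsity = lam * np.sum(np.abs(A))            # second term: L1 sparsity penalty on the coefficients
    return reconstruction + sparsity
```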

  • Although the most direct measure of sparsity is the $L_0$ norm, it is non-differentiable and difficult to optimize in general. In practice, common choices for the sparsity cost $S(\cdot)$ are the $L_1$ penalty $S(a_i) = |a_i|$ and the log penalty $S(a_i) = \log(1 + a_i^2)$.
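
    Both penalties are easy to write down directly; a small sketch (the function names are mine):

```python
import numpy as np

def l1_penalty(a):
    return np.sum(np.abs(a))          # S(a_i) = |a_i|

def log_penalty(a):
    return np.sum(np.log1p(a ** 2))   # S(a_i) = log(1 + a_i^2)
```
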
  • It is also possible to make the sparsity penalty arbitrarily small by scaling down $a_i$ and scaling up $\phi_i$ by some large constant. To prevent this from happening, we will constrain $\left\|\phi_i\right\|^2$ to be less than some constant $C$. The full sparse coding cost function hence is:
    $$\min_{a^{(j)}_i,\ \phi_i} \sum_{j=1}^{m} \left\| x^{(j)} - \sum_{i=1}^{k} a^{(j)}_i \phi_i \right\|^2 + \lambda \sum_{i=1}^{k} S\!\left(a^{(j)}_i\right)$$

    $$\text{subject to } \left\|\phi_i\right\|^2 \le C, \quad \forall i = 1, \dots, k$$

    where the constant is usually set to $C = 1$.
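
    One simple way to respect this constraint in an implementation is to rescale any basis vector whose norm grows too large; a sketch under the assumption $C = 1$ (the notes instead relax the constraint to a weight-decay term, as described next):

```python
import numpy as np

def project_basis(Phi, C=1.0):
    """Rescale each column phi_i so that ||phi_i||^2 <= C."""
    norms = np.linalg.norm(Phi, axis=0, keepdims=True)
    scale = np.minimum(1.0, np.sqrt(C) / np.maximum(norms, 1e-12))
    return Phi * scale
```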

  • One problem is that the constraint cannot be enforced using simple gradient-based methods. Hence, in practice, this constraint is weakened to a "weight decay" term designed to keep the entries of $\phi_i$ small:
    $$\min_{a^{(j)}_i,\ \phi_i} \sum_{j=1}^{m} \left\| x^{(j)} - \sum_{i=1}^{k} a^{(j)}_i \phi_i \right\|^2 + \lambda \sum_{i=1}^{k} S\!\left(a^{(j)}_i\right) + \gamma \sum_{i=1}^{k} \left\|\phi_i\right\|^2$$
  • Another problem is that the $L_1$ norm is not differentiable at 0, and hence poses a problem for gradient-based methods. We will "smooth out" the $L_1$ norm using an approximation which allows us to use gradient descent: we use $\sqrt{a_i^2 + \epsilon}$ in place of $|a_i|$, where $\epsilon$ is a "smoothing parameter" which can also be interpreted as a sort of "sparsity parameter" (to see this, observe that when $\epsilon$ is large compared to $a_i^2$, $a_i^2 + \epsilon$ is dominated by $\epsilon$, and taking the square root yields approximately $\sqrt{\epsilon}$).
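
    A sketch of the smoothed penalty and its gradient (the value of $\epsilon$ is an assumption):

```python
import numpy as np

def smooth_l1(a, eps=1e-2):
    """Differentiable approximation of |a|: sqrt(a^2 + eps)."""
    return np.sqrt(a ** 2 + eps)

def smooth_l1_grad(a, eps=1e-2):
    """Gradient a / sqrt(a^2 + eps), well-defined even at a = 0."""
    return a / np.sqrt(a ** 2 + eps)
```
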
  • Hence, the final objective function is:
    $$\min_{a^{(j)}_i,\ \phi_i} \sum_{j=1}^{m} \left\| x^{(j)} - \sum_{i=1}^{k} a^{(j)}_i \phi_i \right\|^2 + \lambda \sum_{i=1}^{k} \sqrt{\left(a^{(j)}_i\right)^2 + \epsilon} + \gamma \sum_{i=1}^{k} \left\|\phi_i\right\|^2$$
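
    Putting the pieces together, the full objective could be evaluated roughly as follows (names and default hyperparameter values are assumptions):

```python
import numpy as np

def full_cost(X, Phi, A, lam=0.1, gamma=1e-2, eps=1e-2):
    """X: (n, m) inputs, Phi: (n, k) basis vectors, A: (k, m) coefficients."""
    reconstruction = np.sum((X - Phi @ A) ** 2)       # reconstruction term
    sparsity = lam * np.sum(np.sqrt(A ** 2 + eps))    # smoothed L1 sparsity penalty
    weight_decay = gamma * np.sum(Phi ** 2)           # relaxed norm constraint on the basis vectors
    return reconstruction + sparsity + weight_decay
```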
  • The set of basis vectors is called the "dictionary" ($D$). $D$ is "adapted" to $x$ if it can represent $x$ with a few basis vectors, that is, if there exists a sparse vector $\alpha \in \mathbb{R}^{k}$ such that $x \approx D\alpha$. We call $\alpha$ the sparse code.

Learning

  • Learning a set of basis vectors $\phi_i$ using sparse coding consists of performing two separate optimizations (i.e., an alternating optimization; a code sketch follows this list):
    • The first is an optimization over the coefficients $a^{(j)}_i$ for each training example $x^{(j)}$.
    • The second is an optimization over the basis vectors $\phi_i$ across many training examples at once.
  • The classical optimization that alternates between $D$ and $\alpha$ can achieve good results, but it is very slow.
  • A significant limitation of sparse coding is that even after a set of basis vectors has been learnt, in order to "encode" a new data example, an optimization must be performed to obtain the required coefficients. This significant "runtime" cost means that sparse coding is computationally expensive even at test time, especially compared to typical feed-forward architectures.
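
    A minimal alternating gradient-descent sketch of this procedure (learning rates, iteration counts, and hyperparameters are assumptions, not tuned values; `encode` illustrates the per-example optimization that is still needed at test time):

```python
import numpy as np

def learn_dictionary(X, k, lam=0.1, gamma=1e-2, eps=1e-2, lr=1e-3, n_iter=500):
    """Alternate between a gradient step on the coefficients A and one on the basis Phi."""
    n, m = X.shape
    Phi = np.random.randn(n, k)
    A = np.zeros((k, m))
    for _ in range(n_iter):
        # (1) optimize the coefficients with the basis vectors held fixed
        R = Phi @ A - X
        grad_A = 2 * Phi.T @ R + lam * A / np.sqrt(A ** 2 + eps)
        A -= lr * grad_A
        # (2) optimize the basis vectors with the coefficients held fixed
        R = Phi @ A - X
        grad_Phi = 2 * R @ A.T + 2 * gamma * Phi
        Phi -= lr * grad_Phi
    return Phi, A

def encode(x, Phi, lam=0.1, eps=1e-2, lr=1e-3, n_iter=500):
    """Encoding a new example still requires its own optimization over the coefficients."""
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad_a = 2 * Phi.T @ (Phi @ a - x) + lam * a / np.sqrt(a ** 2 + eps)
        a -= lr * grad_a
    return a
```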

Remarks

  • From my point of view, because sparsity is enforced on the code during dictionary learning, the reconstructed matrix tends to discard the noise in the original matrix, i.e., it has a denoising effect. Hence, sparse coding could be used to denoise images.
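
    As a hedged illustration of this idea, scikit-learn's dictionary-learning utilities can be combined to denoise image patches; the classes and functions used below are real scikit-learn APIs, but the parameter choices and the overall recipe are my own assumptions rather than something described in these notes:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d, reconstruct_from_patches_2d

def denoise(noisy_image, patch_size=(7, 7), n_components=100, alpha=1.0):
    """Learn a dictionary on patches of the noisy image, then rebuild each patch from a few atoms."""
    patches = extract_patches_2d(noisy_image, patch_size)
    data = patches.reshape(len(patches), -1)
    mean = data.mean(axis=0)
    data = data - mean
    dico = MiniBatchDictionaryLearning(n_components=n_components, alpha=alpha,
                                       transform_algorithm='omp',
                                       transform_n_nonzero_coefs=2)
    codes = dico.fit_transform(data)             # sparse code for every patch
    rebuilt = codes @ dico.components_ + mean    # reconstruction keeps structure, drops much of the noise
    rebuilt = rebuilt.reshape(patches.shape)
    return reconstruct_from_patches_2d(rebuilt, noisy_image.shape)
```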
