The goal of dictionary learning is to learn an overcomplete dictionary matrix $D \in \mathbb{R}^{n \times K}$ that contains $K$ signal-atoms (in this notation, columns of $D$). A signal vector $y \in \mathbb{R}^{n}$ can be represented, sparsely, as a linear combination of these atoms; to represent $y$, the representation vector $x$ should satisfy the exact condition $y = Dx$, or the approximate condition $y \approx Dx$, made precise by requiring that $\|y - Dx\|_p \le \epsilon$ for some small value $\epsilon$ and some $L^p$ norm. The vector $x \in \mathbb{R}^{K}$ contains the representation coefficients of the signal $y$. Typically, the norm $p$ is selected as the $L^1$ norm, $L^2$ norm, or $L^\infty$ (uniform) norm.

If $n < K$ and $D$ is a full-rank matrix, an infinite number of solutions are available for the representation problem. Hence, constraints should be set on the solution. Also, to ensure sparsity, the solution with the fewest nonzero coefficients is preferred. Thus, the sparse representation is the solution of either

$$(P_0) \qquad \min_x \|x\|_0 \quad \text{subject to } y = Dx$$

or

$$(P_{0,\epsilon}) \qquad \min_x \|x\|_0 \quad \text{subject to } \|y - Dx\|_2 \le \epsilon,$$

where $\|x\|_0$ counts the nonzero entries in the vector $x$.
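The pursuit problems above are combinatorial in general, so greedy approximations are used in practice. The following is a minimal NumPy sketch of orthogonal matching pursuit, the kind of pursuit used later by K-SVD; the function name `omp`, the fixed-sparsity stopping rule, and the toy dictionary are illustrative choices, not a reference implementation.

```python
import numpy as np

def omp(D, y, T0):
    """Greedy orthogonal matching pursuit: approximately solve
    min_x ||x||_0 subject to y ~ D x, selecting at most T0 atoms.
    Assumes the columns of D are normalized to unit length."""
    K = D.shape[1]
    x = np.zeros(K)
    residual = y.astype(float).copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(T0):
        # pick the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # re-fit the coefficients on the chosen support by least squares
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x

# Toy example: overcomplete dictionary (n = 20 < K = 50) and a 2-sparse signal.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)        # unit-norm atoms
x_true = np.zeros(50)
x_true[[3, 17]] = [1.5, -2.0]
y = D @ x_true
x_hat = omp(D, y, T0=2)
print(np.nonzero(x_hat)[0])           # typically recovers the support {3, 17}
```

The relevant design point for what follows is that the number of nonzero entries is fixed in advance at $T_0$, which is exactly the property K-SVD requires of its pursuit stage.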
K-SVD algorithm
K-SVD is a kind of generalization of K-means, as follows. The K-means clustering can also be regarded as a method of sparse representation. That is, finding the best possible codebook to represent the data samples $\{y_i\}_{i=1}^M$ by nearest neighbor is done by solving

$$\min_{D,X} \{ \|Y - DX\|_F^2 \} \quad \text{subject to } \forall i,\ x_i = e_k \text{ for some } k,$$

which is equivalent to

$$\min_{D,X} \{ \|Y - DX\|_F^2 \} \quad \text{subject to } \forall i,\ \|x_i\|_0 = 1.$$

The letter $F$ denotes the Frobenius norm. The sparse representation term $x_i = e_k$ forces the K-means algorithm to use only one atom (column) of the dictionary $D$. To relax this constraint, the target of the K-SVD algorithm is to represent each signal as a linear combination of atoms in $D$.

The K-SVD algorithm follows the construction flow of the K-means algorithm. However, in contrast to K-means, in order to achieve a linear combination of atoms in $D$, the sparsity term of the constraint is relaxed so that the number of nonzero entries of each column $x_i$ can be more than 1, but less than a number $T_0$. So, the objective function becomes

$$\min_{D,X} \{ \|Y - DX\|_F^2 \} \quad \text{subject to } \forall i,\ \|x_i\|_0 \le T_0,$$

or, in another objective form,

$$\min_{D,X} \sum_i \|x_i\|_0 \quad \text{subject to } \|Y - DX\|_F^2 \le \epsilon.$$

In the K-SVD algorithm, $D$ is first fixed and the best coefficient matrix $X$ is found. As finding the truly optimal $X$ is impossible, an approximation pursuit method is used instead. Any algorithm such as orthogonal matching pursuit (OMP) can be used for the calculation of the coefficients, as long as it can supply a solution with a fixed and predetermined number of nonzero entries $T_0$.

After the sparse-coding task, the next step is to search for a better dictionary $D$. However, finding the whole dictionary all at once is impossible, so the process updates only one column of $D$ at a time, while $X$ is fixed. The update of the $k$-th column is done by rewriting the penalty term as

$$\|Y - DX\|_F^2 = \left\| Y - \sum_{j=1}^K d_j x_T^j \right\|_F^2 = \left\| \left( Y - \sum_{j \ne k} d_j x_T^j \right) - d_k x_T^k \right\|_F^2 = \left\| E_k - d_k x_T^k \right\|_F^2,$$

where $x_T^k$ denotes the $k$-th row of $X$. By decomposing the multiplication $DX$ into a sum of $K$ rank-1 matrices, the other $K - 1$ terms can be assumed fixed, and only the $k$-th remains unknown. After this step, the minimization problem can be solved by approximating the term $E_k$ with a rank-1 matrix obtained via singular value decomposition (SVD), then updating $d_k$ with it. However, the new solution for the vector $x_T^k$ is very likely to be filled with nonzeros, because the sparsity constraint is not enforced.

To cure this problem, define $\omega_k$ as

$$\omega_k = \{ i \mid 1 \le i \le N,\ x_T^k(i) \ne 0 \},$$

which points to the examples $\{y_i\}$ that use atom $d_k$. Then, define $\Omega_k$ as a matrix of size $N \times |\omega_k|$, with ones on the $(\omega_k(i), i)$-th entries and zeros otherwise. The multiplication $\tilde{x}_T^k = x_T^k \Omega_k$ shrinks the row vector by discarding the zero entries. Similarly, the multiplication $\tilde{Y}_k = Y \Omega_k$ is the subset of the examples that currently use the atom $d_k$, and the same restriction applies to $\tilde{E}_k = E_k \Omega_k$.

So the minimization problem as mentioned before becomes

$$\min_{d_k,\, \tilde{x}_T^k} \left\| \tilde{E}_k - d_k \tilde{x}_T^k \right\|_F^2$$

and can be solved by directly using SVD. SVD decomposes $\tilde{E}_k$ into $U \Delta V^T$. The solution for $d_k$ is the first column of $U$, and the coefficient vector $\tilde{x}_T^k$ is the first column of $V$ multiplied by $\Delta(1,1)$. After updating the whole dictionary, the process alternates between iteratively solving for $X$ and iteratively solving for $D$.
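To make the atom-by-atom update concrete, here is a minimal NumPy sketch of the dictionary-update sweep, assuming the sparse codes $X$ have already been computed by a pursuit such as the OMP sketch above. The function name and the choice to simply skip unused atoms are illustrative assumptions (implementations often instead re-seed an unused atom with a poorly represented signal).

```python
import numpy as np

def ksvd_dictionary_update(Y, D, X):
    """One K-SVD dictionary-update sweep.

    For each atom d_k, restrict the error E_k to the signals in omega_k
    (the columns where row k of X is nonzero) and replace (d_k, x_T^k)
    with the rank-1 SVD approximation of that restricted error.

    Y: (n, N) data, D: (n, K) dictionary, X: (K, N) sparse codes.
    D and X are modified in place and returned."""
    K = D.shape[1]
    for k in range(K):
        omega = np.nonzero(X[k, :])[0]    # omega_k: signals using atom d_k
        if omega.size == 0:
            continue                      # unused atom: left as-is in this sketch
        # Restricted error E_k * Omega_k: remove every atom's contribution,
        # then add atom k's own contribution back in.
        E_k = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
        U, S, Vt = np.linalg.svd(E_k, full_matrices=False)
        D[:, k] = U[:, 0]                 # new atom: first left singular vector
        X[k, omega] = S[0] * Vt[0, :]     # new coefficients: Delta(1,1) * first right s.v.
    return D, X

# Example wiring, using the omp() sketch above for the coding stage:
#   X = np.column_stack([omp(D, Y[:, i], T0) for i in range(Y.shape[1])])
#   D, X = ksvd_dictionary_update(Y, D, X)
```

Restricting the SVD to the columns in $\omega_k$ is what preserves the sparsity pattern of $X$: only coefficients that were already nonzero are updated. The update sweep itself can only decrease $\|Y - DX\|_F^2$; the pursuit stage is approximate, so the overall alternation is a heuristic rather than an exact minimization.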
Limitations
Choosing an appropriate "dictionary" for a dataset is a non-convex problem, and K-SVD operates by iterative updates that are not guaranteed to find the global optimum. However, this limitation is shared by other dictionary-learning algorithms, and K-SVD works fairly well in practice.