New Algorithms for Learning Incoherent and Overcomplete Dictionaries

A matrix $A$ is said to be $\mu$-incoherent if each pair of columns has inner product at most $\mu/\sqrt{n}$. Starting with the pioneering work of Donoho and Huo, such matrices (often called {\em dictionaries}) have played a central role in signal processing, statistics and machine learning. They allow {\em sparse recovery}: there are efficient algorithms for representing a given vector as a sparse linear combination of the columns of $A$ (if such a combination exists). However, in many applications ranging from {\em sparse coding} in machine learning to image denoising, the dictionary is unknown and has to be learned from random examples of the form $Y = AX$ where $X$ is drawn from an appropriate distribution --- this is the {\em dictionary learning} problem. Existing proposed solutions such as the Method of Optimal Directions (MOD) or K-SVD do not provide any guarantees on their performance, nor do they necessarily learn a dictionary for which one can solve sparse recovery problems. The only exception is the recent work of Spielman, Wang and Wright, which gives a polynomial time algorithm for dictionary learning when $A$ has {\em full column rank} (in particular $m$ is at most $n$). However, in most settings of interest, dictionaries need to be {\em overcomplete} (i.e., $m$ is larger than $n$). Here we give the first polynomial time algorithm for dictionary learning when $A$ is overcomplete. It succeeds under natural conditions on how $X$ is generated, provided that $X$ has at most $k \leq c \min(\sqrt{n}/(\mu \log n), m^{1/2 - \epsilon})$ non-zero entries (for any $\epsilon > 0$). In other words, it can handle almost as many non-zeros as the best sparse recovery algorithms could tolerate {\em even if one knew the dictionary exactly}.
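As a minimal sketch of the generative model described above: the snippet below draws a random dictionary with unit-norm columns (Gaussian columns are incoherent with high probability), measures its incoherence $\mu$, and produces examples $Y = AX$ from $k$-sparse codes. The specific choices here (a Gaussian dictionary, $\pm 1$ nonzero values, and the parameter values) are illustrative assumptions, not the paper's construction or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p, k = 64, 128, 2000, 5  # dimension, dictionary size, #samples, sparsity

# Random dictionary with unit-norm columns; Gaussian columns are
# mu-incoherent with mu = O(sqrt(log m)) with high probability.
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)

# Measure mu: the largest inner product between distinct columns is mu / sqrt(n).
G = np.abs(A.T @ A)
np.fill_diagonal(G, 0.0)
mu = G.max() * np.sqrt(n)
print(f"mu = {mu:.2f}, sparsity scale sqrt(n)/(mu log n) = "
      f"{np.sqrt(n) / (mu * np.log(n)):.2f}")

# Each code x is k-sparse: uniformly random support, +/-1 nonzero values
# (a stand-in for the paper's "appropriate distribution" on X).
X = np.zeros((m, p))
for j in range(p):
    support = rng.choice(m, size=k, replace=False)
    X[support, j] = rng.choice([-1.0, 1.0], size=k)

Y = A @ X  # observed examples; the learner sees only Y, never A or X
```

In this setup the dictionary learning task is to recover $A$ (up to column permutation and sign) from $Y$ alone, after which standard sparse recovery routines can reconstruct the codes $X$.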