New Algorithms for Learning Incoherent and Overcomplete Dictionaries

A matrix $A$ is said to be $\mu$-incoherent if each pair of columns has inner product at most $\mu/\sqrt{n}$. Starting with the pioneering work of Donoho and Huo, such matrices (often called {\em dictionaries}) have played a central role in signal processing, statistics and machine learning. They allow {\em sparse recovery}: there are efficient algorithms for representing a given vector as a sparse linear combination of the columns of $A$ (if such a combination exists). However, in many applications ranging from {\em sparse coding} in machine learning to image denoising, the dictionary is unknown and has to be learned from random examples of the form $Y = AX$ where $X$ is drawn from an appropriate distribution --- this is the {\em dictionary learning} problem. Existing proposed solutions such as the Method of Optimal Directions (MOD) or K-SVD do not provide any guarantees on their performance, nor do they necessarily learn a dictionary for which one can solve sparse recovery problems. The only exception is the recent work of Spielman, Wang and Wright, which gives a polynomial time algorithm for dictionary learning when $A$ has {\em full column rank} (in particular $m$ is at most $n$). However, in most settings of interest, dictionaries need to be {\em overcomplete} (i.e., $m$ is larger than $n$). Here we give the first polynomial time algorithm for dictionary learning when $A$ is overcomplete. It succeeds under natural conditions on how $X$ is generated, provided that $X$ has at most $k \leq c \min(\sqrt{n}/(\mu \log n), m^{1/2 - \epsilon})$ non-zero entries (for any $\epsilon > 0$). In other words, it can handle almost as many non-zeros as the best sparse recovery algorithms could tolerate {\em even if one knew the dictionary exactly}.
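As a minimal sketch of the generative model described above: the snippet below draws a random dictionary with unit-norm columns (Gaussian columns are incoherent with high probability), measures its incoherence $\mu$, and produces examples $Y = AX$ from $k$-sparse codes. The specific choices here (a Gaussian dictionary, $\pm 1$ nonzero values, and the parameter values) are illustrative assumptions, not the paper's construction or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p, k = 64, 128, 2000, 5  # dimension, dictionary size, #samples, sparsity

# Random dictionary with unit-norm columns; Gaussian columns are
# mu-incoherent with mu = O(sqrt(log m)) with high probability.
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)

# Measure mu: the largest inner product between distinct columns is mu / sqrt(n).
G = np.abs(A.T @ A)
np.fill_diagonal(G, 0.0)
mu = G.max() * np.sqrt(n)
print(f"mu = {mu:.2f}, sparsity scale sqrt(n)/(mu log n) = "
      f"{np.sqrt(n) / (mu * np.log(n)):.2f}")

# Each code x is k-sparse: uniformly random support, +/-1 nonzero values
# (a stand-in for the paper's "appropriate distribution" on X).
X = np.zeros((m, p))
for j in range(p):
    support = rng.choice(m, size=k, replace=False)
    X[support, j] = rng.choice([-1.0, 1.0], size=k)

Y = A @ X  # observed examples; the learner sees only Y, never A or X
```

In this setup the dictionary learning task is to recover $A$ (up to column permutation and sign) from $Y$ alone, after which standard sparse recovery routines can reconstruct the codes $X$.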