An improved analysis of the ER-SpUD dictionary learning algorithm

In "dictionary learning" we observe for some , , and . The matrix is observed, and are unknown. Here is "noise" of small norm, and is column-wise sparse. The matrix is referred to as a {\em dictionary}, and its columns as {\em atoms}. Then, given some small number of samples, i.e.\ columns of , the goal is to learn the dictionary up to small error, as well as . The motivation is that in many applications data is expected to sparse when represented by atoms in the "right" dictionary (e.g.\ images in the Haar wavelet basis), and the goal is to learn from the data to then use it for other applications. Recently, [SWW12] proposed the dictionary learning algorithm ER-SpUD with provable guarantees when and . They showed if has independent entries with an expected non-zeroes per column for , and with non-zero entries being subgaussian, then for with high probability ER-SpUD outputs matrices which equal up to permuting and scaling columns (resp.\ rows) of (resp.\ ). They conjectured suffices, which they showed was information theoretically necessary for {\em any} algorithm to succeed when . Significant progress was later obtained in [LV15]. We show that for a slight variant of ER-SpUD, samples suffice for successful recovery with probability . We also show that for the unmodified ER-SpUD, samples are required even to learn with polynomially small success probability. This resolves the main conjecture of [SWW12], and contradicts the main result of [LV15], which claimed that guarantees success whp.
View on arXiv