
An improved analysis of the ER-SpUD dictionary learning algorithm

Abstract

In "dictionary learning" we observe Y=AX+EY = AX + E for some YRn×pY\in\mathbb{R}^{n\times p}, ARm×nA \in\mathbb{R}^{m\times n}, and XRm×pX\in\mathbb{R}^{m\times p}. The matrix YY is observed, and A,X,EA, X, E are unknown. Here EE is "noise" of small norm, and XX is column-wise sparse. The matrix AA is referred to as a {\em dictionary}, and its columns as {\em atoms}. Then, given some small number pp of samples, i.e.\ columns of YY, the goal is to learn the dictionary AA up to small error, as well as XX. The motivation is that in many applications data is expected to sparse when represented by atoms in the "right" dictionary AA (e.g.\ images in the Haar wavelet basis), and the goal is to learn AA from the data to then use it for other applications. Recently, [SWW12] proposed the dictionary learning algorithm ER-SpUD with provable guarantees when E=0E = 0 and m=nm = n. They showed if XX has independent entries with an expected ss non-zeroes per column for 1sn1 \lesssim s \lesssim \sqrt{n}, and with non-zero entries being subgaussian, then for pn2log2np\gtrsim n^2\log^2 n with high probability ER-SpUD outputs matrices A,XA', X' which equal A,XA, X up to permuting and scaling columns (resp.\ rows) of AA (resp.\ XX). They conjectured pnlognp\gtrsim n\log n suffices, which they showed was information theoretically necessary for {\em any} algorithm to succeed when s1s \simeq 1. Significant progress was later obtained in [LV15]. We show that for a slight variant of ER-SpUD, pnlog(n/δ)p\gtrsim n\log(n/\delta) samples suffice for successful recovery with probability 1δ1-\delta. We also show that for the unmodified ER-SpUD, pn1.99p\gtrsim n^{1.99} samples are required even to learn A,XA, X with polynomially small success probability. This resolves the main conjecture of [SWW12], and contradicts the main result of [LV15], which claimed that pnlog4np\gtrsim n\log^4 n guarantees success whp.
