
Optimal prediction in the linearly transformed spiked model

Abstract

We consider the linearly transformed spiked model, where observations $Y_i$ are noisy linear transforms of unobserved signals of interest $X_i$:
\begin{align*}
Y_i = A_i X_i + \varepsilon_i,
\end{align*}
for $i = 1, \ldots, n$. The transform matrices $A_i$ are also observed. We model $X_i$ as random vectors lying on an unknown low-dimensional space. How should we predict the unobserved signals (regression coefficients) $X_i$? The naive approach of performing regression for each observation separately is inaccurate due to the large noise. Instead, we develop optimal linear empirical Bayes methods for predicting $X_i$ by "borrowing strength" across the different samples. Our methods are applicable to large datasets and rely on weak moment assumptions. The analysis is based on random matrix theory. We discuss applications to signal processing, deconvolution, cryo-electron microscopy, and missing data in the high-noise regime. For missing data, we show in simulations that our methods are faster, more robust to noise and to unequal sampling than well-known matrix completion methods.
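The following is a minimal simulation sketch of the model $Y_i = A_i X_i + \varepsilon_i$, specialized to the missing-data case where each $A_i$ is a coordinate-selection mask. The rank, dimensions, noise level, and the final SVD-based shrinkage step are illustrative assumptions for this sketch, not the paper's optimal linear empirical Bayes predictor; they only illustrate the contrast between per-observation estimation and "borrowing strength" across samples.

```python
# Sketch of the linearly transformed spiked model Y_i = A_i X_i + eps_i,
# assuming rank-1 signals and coordinate-masking transforms A_i
# (the missing-data special case). The shrinkage step is a generic
# stand-in for "borrowing strength", not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 50                # samples and dimension (illustrative)
sigma = 1.0                    # noise level (illustrative)

# Rank-1 signals X_i = z_i * u lying on a one-dimensional subspace
u = rng.standard_normal(p)
u /= np.linalg.norm(u)
z = rng.standard_normal(n)
X = np.outer(z, u)             # n x p matrix whose rows are the true X_i

# Coordinate-masking transforms A_i: keep ~60% of entries per observation
mask = rng.random((n, p)) < 0.6
Y = mask * X + sigma * rng.standard_normal((n, p)) * mask

# Naive per-observation regression: with a diagonal 0/1 mask A_i, the
# least-squares estimate of X_i equals the observed entries (zero elsewhere)
X_naive = Y.copy()

# Borrowing strength across samples: estimate the signal direction from the
# top right singular vector of Y, then shrink the projected scores
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
v1 = Vt[0]
scores = Y @ v1
noise_var = sigma ** 2 * mask.mean()                 # rough noise contribution
signal_var = max(np.var(scores) - noise_var, 0.0)    # rough signal variance
shrink = signal_var / max(np.var(scores), 1e-12)
X_shrunk = np.outer(shrink * scores, v1)

print("naive MSE:   ", np.mean((X_naive - X) ** 2))
print("shrunk MSE:  ", np.mean((X_shrunk - X) ** 2))
```

In the high-noise regime the naive estimate pays for the noise at observed entries and the full signal at missing ones, while the pooled estimate exploits the shared low-dimensional structure across all $n$ observations.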
