Multiple Output Regression with Latent Noise

Abstract

In high-dimensional data, structured noise caused by observed and unobserved factors affecting multiple target variables simultaneously poses a serious challenge for modeling by masking the often weak signal. Therefore, (1) explaining away the structured noise in multiple-output regression is of paramount importance. Additionally, (2) assumptions about the correlation structure of the regression weights are needed. We note that both can be formulated in a natural way in a latent variable model in which the signal of interest and the noise are mediated through the same latent factors. The signal model then borrows strength from the noise model by encouraging similar effects on correlated targets. We introduce a hyperparameter for the latent signal-to-noise ratio, which turns out to be important for modeling weak signals, and an ordered infinite-dimensional shrinkage prior that resolves the rotational nonidentifiability in reduced-rank regression models. The model outperforms alternatives in predicting multivariate gene expression and metabolomics responses from genotype data.
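
As a rough illustration of what "signal and noise mediated through the same latent factors" could mean, the sketch below simulates data from one plausible reading of the abstract: a reduced-rank model in which the covariate effects and the structured noise enter the same low-dimensional latent space and share one loading matrix. The symbols `Psi`, `Omega`, `Gamma`, the function name, the residual scale of 0.1, and the way the latent signal-to-noise ratio is fixed by rescaling are all assumptions made for illustration. They are not taken from the abstract, and the sketch does not include the shrinkage prior or any inference.

```python
import numpy as np


def simulate_latent_noise_regression(n=200, p=30, k=10, rank=2, snr=0.1, seed=0):
    """Simulate Y = (X @ Psi + Omega) @ Gamma + E (hypothetical notation).

    X     : (n, p) covariates, e.g. genotypes
    Psi   : (p, rank) maps covariates into the latent space (signal)
    Omega : (n, rank) latent noise factors (structured noise)
    Gamma : (rank, k) loadings shared by signal and noise
    E     : (n, k) independent residual noise
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    Psi = rng.standard_normal((p, rank))
    Omega = rng.standard_normal((n, rank))
    Gamma = rng.standard_normal((rank, k))

    # Rescale Psi so that var(X @ Psi) / var(Omega) equals the requested
    # latent signal-to-noise ratio (an illustrative choice, not the paper's).
    Psi *= np.sqrt(snr) * Omega.std() / (X @ Psi).std()
    latent_signal = X @ Psi

    # Signal and structured noise pass through the same loadings Gamma,
    # so correlated targets receive similar covariate effects.
    E = 0.1 * rng.standard_normal((n, k))
    Y = (latent_signal + Omega) @ Gamma + E
    return X, Y, Psi, Omega, Gamma


X, Y, Psi, Omega, Gamma = simulate_latent_noise_regression()
# The implied regression weight matrix Psi @ Gamma is p x k but has low rank.
print(X.shape, Y.shape, np.linalg.matrix_rank(Psi @ Gamma))
```

With a small `snr`, most of the variation in `Y` that is shared across targets comes from `Omega` rather than from `X`, which is the weak-signal setting the abstract describes.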
