Sharp Information-Theoretic Thresholds for Shuffled Linear Regression

Abstract

This paper studies the problem of shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we consider the model $y = \Pi_* X \beta_* + w$, where $X$ is an $n \times d$ standard Gaussian design matrix, $w$ is Gaussian noise with entrywise variance $\sigma^2$, $\Pi_*$ is an unknown $n \times n$ permutation matrix, and $\beta_*$ is the regression coefficient, also unknown. Previous work has shown that, in the large-$n$ limit, the minimal signal-to-noise ratio ($\mathsf{SNR}$), $\lVert \beta_* \rVert^2/\sigma^2$, for recovering the unknown permutation exactly with high probability is between $n^2$ and $n^C$ for some absolute constant $C$, and the sharp threshold is unknown even for $d=1$. We show that this threshold is precisely $\mathsf{SNR} = n^4$ for exact recovery throughout the sublinear regime $d=o(n)$. As a by-product of our analysis, we also determine the sharp threshold of almost exact recovery to be $\mathsf{SNR} = n^2$, where all but a vanishing fraction of the permutation is reconstructed.
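To make the model concrete, the following is a minimal simulation sketch of $y = \Pi_* X \beta_* + w$ at a prescribed $\mathsf{SNR} = \lVert \beta_* \rVert^2/\sigma^2$. The function name `sample_shuffled_regression`, its parameters, and the choice of a uniformly random direction for $\beta_*$ are assumptions made for illustration; they are not taken from the paper, and the sketch only generates data rather than implementing any recovery procedure.

```python
import numpy as np


def sample_shuffled_regression(n, d, snr, sigma=1.0, seed=0):
    """Draw one instance of y = Pi X beta + w (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Standard Gaussian design matrix, n rows and d columns.
    X = rng.standard_normal((n, d))

    # Assumption: beta has a random direction, rescaled so that
    # ||beta||^2 / sigma^2 equals the requested SNR.
    beta = rng.standard_normal(d)
    beta *= sigma * np.sqrt(snr) / np.linalg.norm(beta)

    # Latent permutation: row i of (Pi X) is row perm[i] of X.
    perm = rng.permutation(n)

    # Gaussian noise with entrywise variance sigma^2.
    w = sigma * rng.standard_normal(n)

    y = X[perm] @ beta + w
    return X, y, beta, perm


if __name__ == "__main__":
    n, d = 200, 1
    # SNR set at the n^4 scale stated in the abstract for exact recovery.
    X, y, beta, perm = sample_shuffled_regression(n, d, snr=float(n) ** 4)
    print(X.shape, y.shape, np.dot(beta, beta))
```

An observer sees only `X` and `y`; `beta` and `perm` are returned solely so that a candidate estimator can be scored against the ground truth.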
