Doubly robust nearest neighbors in factor models

We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the -th entry, when observed, is given by its mean plus mean-zero noise for an unknown function and latent factors and . Prior NN strategies, like unit-unit NN, for estimating the mean relies on existence of other rows with . Similarly, time-time NN strategy relies on existence of columns with . These strategies provide poor performance respectively when similar rows or similar columns are not available. Our estimate is doubly robust to this deficit in two ways: (1) As long as there exist either good row or good column neighbors, our estimate provides a consistent estimate. (2) Furthermore, if both good row and good column neighbors exist, it provides a (near-)quadratic improvement in the non-asymptotic error and admits a significantly narrower asymptotic confidence interval when compared to both unit-unit or time-time NN.
View on arXiv