Efficient Statistics for Sparse Graphical Models from Truncated Samples

Abstract

In this paper, we study high-dimensional estimation from truncated samples. We focus on two fundamental and classical problems: (i) inference of sparse Gaussian graphical models and (ii) support recovery of sparse linear models. (i) For Gaussian graphical models, suppose $d$-dimensional samples $\mathbf{x}$ are generated from a Gaussian $\mathcal{N}(\mu, \Sigma)$ and observed only if they belong to a subset $S \subseteq \mathbb{R}^d$. We show that $\mu$ and $\Sigma$ can be estimated with error $\epsilon$ in the Frobenius norm, using $\tilde{O}\left(\frac{\mathrm{nz}(\Sigma^{-1})}{\epsilon^2}\right)$ samples from a truncated $\mathcal{N}(\mu, \Sigma)$ and having access to a membership oracle for $S$, where $\mathrm{nz}(\Sigma^{-1})$ denotes the number of non-zero entries of the precision matrix. The set $S$ is assumed to have non-trivial measure under the unknown distribution but is otherwise arbitrary. (ii) For sparse linear regression, suppose samples $(\mathbf{x}, y)$ are generated where $y = \mathbf{x}^\top \Omega^* + \mathcal{N}(0, 1)$ and $(\mathbf{x}, y)$ is seen only if $y$ belongs to a truncation set $S \subseteq \mathbb{R}$. We consider the case that $\Omega^*$ is sparse with a support set of size $k$. Our main result establishes precise conditions on the problem dimension $d$, the support size $k$, the number of observations $n$, and properties of the samples and the truncation that are sufficient to recover the support of $\Omega^*$. Specifically, we show that under some mild assumptions, only $O(k^2 \log d)$ samples are needed to estimate $\Omega^*$ in the $\ell_\infty$-norm up to a bounded error. For both problems, our estimator minimizes the sum of the finite population negative log-likelihood function and an $\ell_1$-regularization term.
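The style of estimator described above — minimizing an $\ell_1$-regularized truncated negative log-likelihood — can be illustrated for the regression setting (ii). The following is a minimal synthetic sketch, not the authors' implementation: the truncation set $S = [0, \infty)$, the problem sizes, the step size, the regularization weight, and the rejection-sampling budget are all assumptions chosen for the demo. The gradient of the truncated likelihood's log-partition term is estimated by rejection sampling, and the $\ell_1$ penalty is handled with a proximal (soft-thresholding) step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demo sizes (not from the paper): d features, support size k.
d, k, n = 50, 3, 1000
omega_true = np.zeros(d)
omega_true[:k] = [2.0, -1.5, 1.0]

def in_S(y):
    # Assumed truncation set S = [0, inf): a sample is observed only if y >= 0.
    return y >= 0.0

# Generate truncated samples by rejection: draw (x, y), keep it only if y is in S.
X_rows, Y_vals = [], []
while len(Y_vals) < n:
    x = rng.normal(size=d)
    y = x @ omega_true + rng.normal()
    if in_S(y):
        X_rows.append(x)
        Y_vals.append(y)
X, Y = np.array(X_rows), np.array(Y_vals)

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Proximal gradient descent on the l1-regularized truncated negative
# log-likelihood (noise variance fixed at 1).  Per sample, the NLL is
#   (y - x.w)^2 / 2 + log P_{t ~ N(x.w, 1)}(t in S),
# whose gradient in w is  (-(y - x.w) + E[t | t ~ N(x.w, 1), t in S] - x.w) * x.
omega = np.zeros(d)
lam, step, m = 0.1, 0.5, 32
for _ in range(200):
    mean = X @ omega
    # Monte Carlo estimate of the conditional mean E[t | t in S] per sample
    # (falls back to 0 when no draw lands in S, a small-bias shortcut).
    t = mean[:, None] + rng.normal(size=(n, m))
    mask = in_S(t)
    cond_mean = np.where(mask, t, 0.0).sum(axis=1) / np.maximum(mask.sum(axis=1), 1)
    grad = ((-(Y - mean) + (cond_mean - mean)) @ X) / n
    omega = soft_threshold(omega - step * grad, step * lam)

# Recovered support: coordinates whose estimate clears a small threshold.
support = np.flatnonzero(np.abs(omega) > 0.3)
```

The correction term `cond_mean - mean` is what distinguishes this from plain Lasso: at the true parameter it cancels the bias that truncation induces in the residuals, while the $\ell_1$ weight `lam` trades false positives in the support against shrinkage bias on the true coefficients.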
