
Optimal Robust Linear Regression in Nearly Linear Time

Abstract

We study the problem of high-dimensional robust linear regression, where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$ (with $X \in \mathbb{R}^d$ and $\epsilon$ independent), in which an $\eta$ fraction of the samples have been adversarially corrupted. We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian. In both settings, our estimators (a) achieve optimal sample complexities and recovery guarantees up to log factors and (b) run in near-linear time ($\tilde{O}(nd/\eta^6)$). Prior to our work, polynomial-time algorithms achieving near-optimal sample complexities were known only in the setting where $X$ is Gaussian with identity covariance and $\epsilon$ is Gaussian, and no linear-time estimators were known for robust linear regression in any setting. Our estimators and their analysis leverage recent developments in the construction of faster algorithms for robust mean estimation to improve runtimes, and refined concentration-of-measure arguments alongside Gaussian rounding techniques to improve statistical sample complexities.
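To make the corruption model concrete, here is a minimal simulation sketch of the generative model above. The distributional choices ($X \sim \mathcal{N}(0, I_d)$, Gaussian noise, label-only corruption at a fixed value) and the residual-trimming baseline are illustrative assumptions for this snippet, not the paper's algorithm or its adversary, which may corrupt both covariates and labels arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_corrupted(n, d, eta, w_star, noise_std=1.0, rng=rng):
    """Draw n samples from Y = <X, w*> + eps, then corrupt an eta fraction.

    Assumptions for illustration: X ~ N(0, I_d), eps ~ N(0, noise_std^2);
    the "adversary" simply overwrites eta*n labels with a large constant.
    """
    X = rng.standard_normal((n, d))
    eps = noise_std * rng.standard_normal(n)
    y = X @ w_star + eps
    m = int(eta * n)
    idx = rng.choice(n, size=m, replace=False)
    y[idx] = 100.0  # crude label corruption, for illustration only
    return X, y

d, n, eta = 5, 2000, 0.1
w_star = rng.standard_normal(d)
X, y = sample_corrupted(n, d, eta, w_star)

# Ordinary least squares is biased by the corrupted samples...
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# ...while even a naive robust baseline (iteratively refitting on the
# (1 - eta) fraction of samples with smallest residuals) recovers w*
# far more accurately. This heuristic is NOT the paper's estimator.
w = w_ols.copy()
for _ in range(20):
    r = np.abs(y - X @ w)
    keep = r <= np.quantile(r, 1 - eta)
    w = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]

print("OLS error:    ", np.linalg.norm(w_ols - w_star))
print("trimmed error:", np.linalg.norm(w - w_star))
```

Under this label-only corruption the trimmed fit approaches the clean-data error rate, while plain least squares retains a bias on the order of $\eta$; handling adversaries that also corrupt $X$ is what requires the machinery developed in the paper.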
