In this short note, we provide a sample complexity lower bound for learning linear predictors with respect to the squared loss. Our focus is on an agnostic setting, where no assumptions are made on the data distribution. This contrasts with standard results in the literature, which either make distributional assumptions, refer to specific parameter settings, or use other performance measures.
View on arXiv