Minimax Bounds for Distributed Logistic Regression

Abstract

We consider a distributed logistic regression problem where labeled data pairs $(X_i, Y_i) \in \mathbb{R}^d \times \{-1, 1\}$ for $i = 1, \ldots, n$ are distributed across multiple machines in a network and must be communicated to a centralized estimator using at most $k$ bits per labeled pair. We assume that the data $X_i$ come independently from some distribution $P_X$, and that the distribution of $Y_i$ conditioned on $X_i$ follows a logistic model with some parameter $\theta \in \mathbb{R}^d$. By using a Fisher information argument, we give minimax lower bounds for estimating $\theta$ under different assumptions on the tail of the distribution $P_X$. We consider both $\ell^2$ and logistic losses, and show that for the logistic loss our sub-Gaussian lower bound is order-optimal and cannot be improved.
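As a minimal illustration of the setup described above, the sketch below simulates data from the logistic model and fits $\theta$ after a naive per-coordinate quantization of each $X_i$. The Gaussian choice of $P_X$, the parameter values, the bit budget accounting, and the uniform quantizer are all illustrative assumptions, not the paper's scheme or estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 5000                      # dimension and sample size (illustrative)
bits_per_coord = 4                  # toy budget: 1 bit for Y_i plus 4 bits per coordinate of X_i
theta = np.array([1.0, -0.5])       # hypothetical true parameter

# X_i drawn i.i.d. from a sub-Gaussian P_X (standard normal: an assumed choice)
X = rng.normal(size=(n, d))
# Logistic model: P(Y_i = 1 | X_i) = 1 / (1 + exp(-<theta, X_i>))
p = 1.0 / (1.0 + np.exp(-X @ theta))
Y = np.where(rng.random(n) < p, 1.0, -1.0)

def quantize(v, bits, lo=-4.0, hi=4.0):
    """Uniform scalar quantizer: clip to [lo, hi], round to 2**bits levels."""
    step = (hi - lo) / (2 ** bits - 1)
    return lo + np.round((np.clip(v, lo, hi) - lo) / step) * step

# Each pair's message is (Y_i, quantized X_i): k = 1 + d * bits_per_coord bits
Xq = quantize(X, bits_per_coord)

def fit_logistic(X, Y, steps=500, lr=0.5):
    """Gradient descent on the logistic loss mean(log(1 + exp(-Y_i <X_i, th>)))."""
    th = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = Y * (X @ th)
        grad = -(X * (Y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        th -= lr * grad
    return th

# Centralized estimate from the quantized messages; close to theta up to
# quantization bias and statistical error of order sqrt(d / n)
theta_hat = fit_logistic(Xq, Y)
print(theta_hat)
```

Shrinking `bits_per_coord` (i.e. the per-pair budget $k$) visibly degrades the estimate, which is the tension the paper's lower bounds quantify.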
