Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes

We study gradient descent (GD) for logistic regression on linearly separable data with stepsizes that adapt to the current risk, scaled by a constant hyperparameter $\eta$. We show that after at most $1/\gamma^2$ burn-in steps, GD achieves a risk upper bounded by $\exp(-\Theta(\eta))$, where $\gamma$ is the margin of the dataset. As $\eta$ can be arbitrarily large, GD attains an arbitrarily small risk immediately after the burn-in steps, though the risk evolution may be non-monotonic. We further construct hard datasets with margin $\gamma$, where any batch (or online) first-order method requires $\Omega(1/\gamma^2)$ steps to find a linear separator. Thus, GD with large, adaptive stepsizes is minimax optimal among first-order batch methods. Notably, the classical Perceptron (Novikoff, 1962), a first-order online method, also achieves a step complexity of $1/\gamma^2$, matching GD even in constants. Finally, our GD analysis extends to a broad class of loss functions and certain two-layer networks.
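A minimal NumPy sketch of the two methods compared above, assuming the adaptive rule takes the form $w \leftarrow w - (\eta / R(w))\, \nabla R(w)$ with $R$ the empirical logistic risk; the exact normalization, the toy data construction, and the hyperparameter values are illustrative assumptions, not the paper's setup.

import numpy as np

def logistic_risk(w, X, y):
    # Empirical logistic risk R(w) = mean_i log(1 + exp(-y_i <w, x_i>)).
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def logistic_risk_grad(w, X, y):
    # grad R(w) = -mean_i sigmoid(-y_i <w, x_i>) y_i x_i, with an overflow-free sigmoid.
    m = y * (X @ w)
    s = 0.5 * (1.0 - np.tanh(0.5 * m))      # equals sigmoid(-m)
    return -(X.T @ (s * y)) / X.shape[0]

def gd_adaptive(X, y, eta=10.0, n_steps=50):
    # GD with risk-adaptive stepsizes: w <- w - (eta / R(w)) * grad R(w).
    # The eta / R(w) form is an assumed reading of "stepsizes that adapt
    # to the current risk, scaled by a constant hyperparameter eta".
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        risk = max(logistic_risk(w, X, y), 1e-12)   # guard against underflow
        w = w - (eta / risk) * logistic_risk_grad(w, X, y)
    return w

def perceptron(X, y, max_steps=10_000):
    # Classical Perceptron (Novikoff, 1962): on unit-norm data with margin
    # gamma, it makes at most 1/gamma^2 updates before separating the data.
    w = np.zeros(X.shape[1])
    for _ in range(max_steps):
        wrong = np.flatnonzero(y * (X @ w) <= 0)
        if wrong.size == 0:
            break
        w = w + y[wrong[0]] * X[wrong[0]]
    return w

# Toy linearly separable data (labels in {-1, +1}), rows normalized to unit norm.
rng = np.random.default_rng(0)
n, d = 200, 5
w_star = np.eye(d)[0]
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)
X = X + y[:, None] * w_star                 # push points away from the boundary
X = X / np.linalg.norm(X, axis=1, keepdims=True)

w_gd = gd_adaptive(X, y, eta=10.0, n_steps=50)
w_p = perceptron(X, y)
print("GD risk:", logistic_risk(w_gd, X, y),
      "| GD separates:", np.all(y * (X @ w_gd) > 0),
      "| Perceptron separates:", np.all(y * (X @ w_p) > 0))

With unit-norm data the gradient norm is bounded by the risk, so each adaptive update has norm at most eta even when R(w) is tiny; this is why the large eta / R(w) stepsize remains well behaved in the sketch.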
@article{zhang2025_2504.04105,
  title   = {Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes},
  author  = {Ruiqi Zhang and Jingfeng Wu and Licong Lin and Peter L. Bartlett},
  journal = {arXiv preprint arXiv:2504.04105},
  year    = {2025}
}