Byzantine-Robust Distributed SGD: A Unified Analysis and Tight Error Bounds

11 April 2026

Boyuan Ruan

Xiaoyu Wang

Ya-Feng Liu

FedML


Main:9 Pages

4 Figures

Bibliography:3 Pages

1 Table

Appendix:23 Pages

Abstract

Byzantine-robust distributed optimization relies on robust aggregation rules to mitigate the influence of malicious Byzantine workers. Despite the proliferation of such rules, a unified convergence analysis framework that accommodates general data heterogeneity is lacking. In this work, we provide a thorough convergence theory of Byzantine-robust distributed stochastic gradient descent (SGD), analyzing variants both with and without local momentum. We establish the convergence rates for nonconvex smooth objectives and those satisfying the Polyak-Łojasiewicz condition under a general data heterogeneity assumption. Our analysis reveals that while stochasticity and data heterogeneity introduce unavoidable error floors, local momentum provably reduces the error component induced by stochasticity. Furthermore, we derive matching lower bounds to demonstrate that the upper bounds obtained in our analysis are tight and characterize the fundamental limits of Byzantine resilience under stochasticity and data heterogeneity. Empirical results support our theoretical findings.
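
To make the setting in the abstract concrete, the following is a minimal illustrative sketch of distributed SGD with local momentum and a robust aggregation rule. It is not taken from the paper: the coordinate-wise median is used only as one familiar example of a robust aggregator, the per-worker quadratic objectives, the constant Byzantine message, and all function and parameter names are assumptions chosen for illustration, and the momentum update shown (an exponential moving average of local stochastic gradients) may differ from the variant the authors analyze.

```python
import random
from statistics import median


def coordinate_wise_median(vectors):
    """Aggregate equal-length vectors by taking the median of each coordinate."""
    return [median(coords) for coords in zip(*vectors)]


def robust_momentum_sgd(honest_targets, num_byzantine, steps=300,
                        lr=0.1, beta=0.9, noise=0.1, seed=0):
    """Toy run of robust distributed SGD with local momentum.

    Assumption: honest worker i holds the objective 0.5 * ||x - t_i||^2,
    so differing targets t_i model data heterogeneity, and Gaussian
    noise on the gradient models stochasticity.
    """
    rng = random.Random(seed)
    dim = len(honest_targets[0])
    x = [0.0] * dim
    # One local momentum buffer per honest worker.
    momenta = [[0.0] * dim for _ in honest_targets]

    for _ in range(steps):
        messages = []
        for i, target in enumerate(honest_targets):
            # Noisy local gradient of the quadratic objective.
            grad = [xj - tj + rng.gauss(0.0, noise)
                    for xj, tj in zip(x, target)]
            # Local momentum: exponential moving average of gradients.
            momenta[i] = [beta * m + (1.0 - beta) * g
                          for m, g in zip(momenta[i], grad)]
            messages.append(momenta[i])

        # Byzantine workers may send anything; here, a huge constant vector.
        messages += [[1e6] * dim for _ in range(num_byzantine)]

        # The server aggregates robustly and takes a descent step.
        aggregate = coordinate_wise_median(messages)
        x = [xj - lr * aj for xj, aj in zip(x, aggregate)]
    return x


if __name__ == "__main__":
    targets = [[1.0, -2.0], [1.1, -1.9], [0.9, -2.1],
               [1.0, -2.0], [1.05, -2.05]]
    print(robust_momentum_sgd(targets, num_byzantine=2))
```

In this toy run the iterate stays close to the honest workers' targets even though two of the seven messages are arbitrary, but it does not settle exactly on the honest average: the residual offset is a small-scale picture of the error floor that the abstract attributes to stochasticity and data heterogeneity.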
