Taming Preconditioner Drift: Unlocking the Potential of Second-Order Optimizers for Federated Learning on Non-IID Data
Second-order optimizers can significantly accelerate large-scale training, yet their naive federated variants are often unstable or even diverge on non-IID data. We show that a key culprit is \emph{preconditioner drift}: client-side second-order training induces heterogeneous \emph{curvature-defined geometries} (i.e., preconditioner coordinate systems), and server-side model averaging mixes updates computed under incompatible metrics, corrupting the global descent direction. To address this geometric mismatch, we propose \texttt{FedPAC}, a \emph{preconditioner alignment and correction} framework for reliable federated second-order optimization. \texttt{FedPAC} explicitly decouples parameter aggregation from geometry synchronization via (i) \textbf{alignment}, which aggregates local preconditioners into a global reference and warm-starts clients from the global preconditioner, and (ii) \textbf{correction}, which steers local preconditioned updates with a globally preconditioned direction to suppress long-term drift. We provide drift-coupled non-convex convergence guarantees with linear speedup under partial participation. Empirically, \texttt{FedPAC} consistently improves stability and accuracy across vision and language tasks, achieving substantial absolute accuracy gains on CIFAR-100 with ViTs. Code is available at this https URL.
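The alignment and correction mechanisms might be sketched, in a highly simplified diagonal-preconditioner setting, as follows. This is an illustrative assumption, not the paper's actual algorithm: the function names (`local_step`, `server_round`), the Adam-style second-moment curvature estimate, and the `beta` blending rule between local and global directions are all hypothetical stand-ins for the method the abstract describes.

```python
import numpy as np

def local_step(w, grad, p_local, p_global, lr=0.1, beta=0.5, eps=1e-8):
    """One corrected local step (hypothetical): blend the locally
    preconditioned direction with the globally preconditioned one,
    so heterogeneous local geometries drift less from the global metric."""
    local_dir = grad / (np.sqrt(p_local) + eps)
    global_dir = grad / (np.sqrt(p_global) + eps)
    return w - lr * ((1 - beta) * local_dir + beta * global_dir)

def server_round(w_global, p_global, client_grads, lr=0.1, local_steps=5):
    """One federated round (hypothetical sketch): warm-start each client
    from the global preconditioner (alignment), run corrected local steps,
    then aggregate parameters and preconditioners separately."""
    new_ws, new_ps = [], []
    for grad_fn in client_grads:
        w, p = w_global.copy(), p_global.copy()  # warm start from global geometry
        for _ in range(local_steps):
            g = grad_fn(w)
            p = 0.9 * p + 0.1 * g * g            # Adam-style curvature estimate
            w = local_step(w, g, p, p_global, lr)
        new_ws.append(w)
        new_ps.append(p)
    # decoupled aggregation: average parameters and preconditioners separately
    return np.mean(new_ws, axis=0), np.mean(new_ps, axis=0)
```

The key design point the sketch tries to capture is the decoupling: the server maintains a global preconditioner as a shared geometric reference alongside the global weights, rather than letting each client's curvature estimate evolve independently between rounds.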