Convergence Rate Analysis of LION

12 November 2024
Yiming Dong
Huan Li
Zhouchen Lin
Abstract

The LION (evoLved sIgn mOmeNtum) optimizer for deep neural network training was discovered by Google via program search; despite its simple sign update, it shows impressive performance in training large-scale networks. Although previous studies have investigated its convergence properties, a comprehensive analysis, especially of the convergence rate, is still desirable. Recognizing that LION can be regarded as solving a specific constrained problem, this paper focuses on demonstrating its convergence to a Karush-Kuhn-Tucker (KKT) point at the rate of $\mathcal{O}(\sqrt{d}\,K^{-1/4})$ measured by the gradient $\ell_1$ norm, where $d$ is the problem dimension and $K$ is the number of iteration steps. Going a step further, we remove the constraint and establish that LION converges to a critical point of the general unconstrained problem at the same rate. This rate not only delivers the currently optimal dependence on the problem dimension $d$ but also tightly matches the theoretical lower bound for nonconvex stochastic optimization algorithms, which is typically measured using the gradient $\ell_2$ norm, with respect to the number of iterations $K$. Through extensive experiments, we not only demonstrate that LION achieves lower loss and higher performance compared to standard SGD, but also empirically confirm that the gradient $\ell_1/\ell_2$ norm ratio aligns with $\Theta(\sqrt{d})$, thus showing that our convergence rate matches the theoretical lower bound with respect to $d$ in the empirical sense.
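Since the analysis hinges on LION's sign-momentum update and on the gradient $\ell_1/\ell_2$ norm ratio, the following minimal NumPy sketch may help orient readers. It is not taken from the paper: the update is written as LION is commonly stated (double momentum interpolation, sign step, decoupled weight decay), the hyperparameter values are illustrative placeholders, and the $\sqrt{d}$ scaling of the norm ratio is checked on a synthetic i.i.d. Gaussian "gradient" rather than on real network gradients.

```python
import numpy as np

def lion_step(x, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One LION update as commonly stated (hyperparameters here are illustrative)."""
    # Update direction: sign of a short-horizon interpolation between momentum and gradient.
    c = beta1 * m + (1.0 - beta1) * grad
    x = x - lr * (np.sign(c) + weight_decay * x)
    # Momentum buffer is tracked with a second, slower coefficient.
    m = beta2 * m + (1.0 - beta2) * grad
    return x, m

# Illustration of the l1/l2 norm ratio scaling like sqrt(d),
# using a synthetic Gaussian vector as a stand-in for a gradient (an assumption).
for d in (10**2, 10**4, 10**6):
    g = np.random.randn(d)
    ratio = np.abs(g).sum() / np.linalg.norm(g)
    print(f"d={d:>8}: ||g||_1 / ||g||_2 = {ratio:10.1f},  sqrt(d) = {d**0.5:10.1f}")
```

For i.i.d. Gaussian coordinates the ratio concentrates around $\sqrt{2d/\pi}$, i.e. $\Theta(\sqrt{d})$, which is the regime in which an $\ell_1$-norm rate of $\mathcal{O}(\sqrt{d}\,K^{-1/4})$ lines up with the $\ell_2$-norm lower bound discussed in the abstract.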
