
arXiv:2009.14820
Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation

30 September 2020
Tanner Fiez
Lillian J. Ratliff
Abstract

We study the role that a finite timescale separation parameter $\tau$ has on gradient descent-ascent in two-player non-convex, non-concave zero-sum games, where the learning rate of player 1 is denoted by $\gamma_1$ and the learning rate of player 2 is defined to be $\gamma_2 = \tau\gamma_1$. Existing work analyzing the role of timescale separation in gradient descent-ascent has primarily focused on the edge cases of players sharing a learning rate ($\tau = 1$) and the maximizing player approximately converging between each update of the minimizing player ($\tau \rightarrow \infty$). For the parameter choice of $\tau = 1$, it is known that the learning dynamics are not guaranteed to converge to a game-theoretically meaningful equilibrium in general. In contrast, Jin et al. (2020) showed that the stable critical points of gradient descent-ascent coincide with the set of strict local minmax equilibria as $\tau \rightarrow \infty$. In this work, we bridge the gap between these two regimes by showing that there exists a finite timescale separation parameter $\tau^{\ast}$ such that $x^{\ast}$ is a stable critical point of gradient descent-ascent for all $\tau \in (\tau^{\ast}, \infty)$ if and only if it is a strict local minmax equilibrium. Moreover, we provide an explicit construction for computing $\tau^{\ast}$, along with corresponding convergence rates and results under deterministic and stochastic gradient feedback. The convergence results we present are complemented by a non-convergence result: given a critical point $x^{\ast}$ that is not a strict local minmax equilibrium, there exists a finite timescale separation $\tau_0$ such that $x^{\ast}$ is unstable for all $\tau \in (\tau_0, \infty)$. Finally, we empirically demonstrate on the CIFAR-10 and CelebA datasets the significant impact timescale separation has on training performance.
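The update rule described in the abstract can be illustrated with a minimal sketch: simultaneous gradient descent-ascent where player 1 descends with learning rate $\gamma_1$ and player 2 ascends with $\gamma_2 = \tau\gamma_1$. The toy quadratic game below is an assumed example chosen for illustration, not one from the paper; its origin is a strict local minmax point, yet GDA only converges to it once $\tau$ exceeds a finite threshold, matching the qualitative behavior the abstract describes.

```python
import math

# Toy zero-sum game f(x, y) = -x**2/2 + 2*x*y - y**2/2 (an assumed example).
# Player 1 minimizes over x; player 2 maximizes over y. The origin satisfies
# d2f/dy2 = -1 < 0 and the Schur complement -1 - 2*(-1)**(-1)*2 = 3 > 0,
# so (0, 0) is a strict local minmax equilibrium.
def grad(x, y):
    df_dx = -x + 2 * y   # partial derivative of f with respect to x
    df_dy = 2 * x - y    # partial derivative of f with respect to y
    return df_dx, df_dy

def run_gda(tau, gamma1=0.01, steps=2000, x=1.0, y=1.0):
    """Simultaneous GDA with timescale separation tau; gamma2 = tau * gamma1."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - gamma1 * gx, y + tau * gamma1 * gy
    return math.hypot(x, y)  # distance from the equilibrium at the origin

# For this game the linearized flow has trace 1 - tau at the origin, so the
# critical separation is tau* = 1: below it the iterates spiral outward,
# above it they converge to the strict local minmax point.
print(run_gda(tau=0.5))  # small tau: distance from the origin grows
print(run_gda(tau=4.0))  # large tau: distance shrinks toward zero
```

The same simulation run at values of $\tau$ just above and below 1 shows the sharp stability transition; the explicit construction of $\tau^{\ast}$ for general games is the paper's contribution, not something this sketch computes.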
