arXiv:1911.04221

Convergence to minima for the continuous version of Backtracking Gradient Descent

11 November 2019
T. Truong
Abstract

The main result of this paper is the following.

Theorem. Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^1$ function such that $\nabla f$ is locally Lipschitz continuous. Assume moreover that $f$ is $C^2$ near its generalised saddle points. Fix real numbers $\delta_0>0$ and $0<\alpha<1$. Then there is a smooth function $h:\mathbb{R}^k\rightarrow (0,\delta_0]$ so that the map $H:\mathbb{R}^k\rightarrow \mathbb{R}^k$ defined by $H(x)=x-h(x)\nabla f(x)$ has the following properties:

(i) For all $x\in \mathbb{R}^k$, we have $f(H(x))-f(x)\leq -\alpha h(x)\|\nabla f(x)\|^2$.

(ii) For every $x_0\in \mathbb{R}^k$, the sequence $x_{n+1}=H(x_n)$ satisfies either $\lim_{n\rightarrow\infty}\|x_{n+1}-x_n\|=0$ or $\lim_{n\rightarrow\infty}\|x_n\|=\infty$. Each cluster point of $\{x_n\}$ is a critical point of $f$. If moreover $f$ has at most countably many critical points, then $\{x_n\}$ either converges to a critical point of $f$ or satisfies $\lim_{n\rightarrow\infty}\|x_n\|=\infty$.

(iii) There is a set $\mathcal{E}_1\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_1$, the sequence $x_{n+1}=H(x_n)$, if it converges, cannot converge to a generalised saddle point.

(iv) There is a set $\mathcal{E}_2\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_2$, any cluster point of the sequence $x_{n+1}=H(x_n)$ is not a saddle point, and more generally cannot be an isolated generalised saddle point.

Some other results are proven.
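
To make inequality (i) concrete, here is a minimal sketch of the classical discrete backtracking rule (Armijo's condition). It is not the smooth step-size function $h$ constructed in the paper: it only illustrates how a step size in $(0,\delta_0]$ satisfying the same descent inequality can be found by repeated shrinking. The shrink factor `beta`, the cap `max_halvings`, the function names, and the test function are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def backtracking_step_size(
    f: Callable[[Sequence[float]], float],
    grad_f: Callable[[Sequence[float]], Vector],
    x: Sequence[float],
    delta0: float = 1.0,    # upper bound on the step size (delta_0 in the theorem)
    alpha: float = 0.5,     # descent parameter, 0 < alpha < 1
    beta: float = 0.5,      # shrink factor (assumption, not from the abstract)
    max_halvings: int = 60, # safety cap on shrinking (assumption)
) -> float:
    """Return delta in (0, delta0] with
    f(x - delta*grad) - f(x) <= -alpha * delta * ||grad||^2,
    or the last delta tried if the cap is reached."""
    g = grad_f(x)
    g_norm_sq = sum(gi * gi for gi in g)
    fx = f(x)
    delta = delta0
    for _ in range(max_halvings):
        trial = [xi - delta * gi for xi, gi in zip(x, g)]
        if f(trial) - fx <= -alpha * delta * g_norm_sq:
            return delta
        delta *= beta
    return delta


def backtracking_gd(
    f: Callable[[Sequence[float]], float],
    grad_f: Callable[[Sequence[float]], Vector],
    x0: Sequence[float],
    steps: int = 200,
    **kwargs: float,
) -> Vector:
    """Iterate x_{n+1} = x_n - delta_n * grad f(x_n) with backtracked delta_n."""
    x = list(x0)
    for _ in range(steps):
        delta = backtracking_step_size(f, grad_f, x, **kwargs)
        g = grad_f(x)
        x = [xi - delta * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical test function: a saddle at (0, 0) and minima at (0, 1), (0, -1).
def example_f(p: Sequence[float]) -> float:
    x, y = p
    return x * x + (y * y - 1.0) ** 2


def example_grad(p: Sequence[float]) -> Vector:
    x, y = p
    return [2.0 * x, 4.0 * y * (y * y - 1.0)]
```

Calling `backtracking_gd(example_f, example_grad, [1.0, 0.3])` runs the iteration from a point near the saddle's basin boundary. Because the step size here jumps discontinuously between powers of `beta`, the map $x \mapsto x - \delta(x)\nabla f(x)$ is not smooth, which is the gap the paper's continuous construction of $h$ addresses.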
