We prove \emph{almost sure convergence rates} of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for \L ojasiewicz functions. The SZGD algorithm iterates as
\begin{align*}
x_{t+1} = x_t - \eta_t \widehat{\nabla} f (x_t), \qquad t = 0,1,2,3,\cdots ,
\end{align*}
where $f$ is the objective function that satisfies the \L ojasiewicz inequality with \L ojasiewicz exponent $\theta$, $\eta_t$ is the step size (learning rate), and $\widehat{\nabla} f (x_t)$ is the approximate gradient estimated using zeroth-order information. We show that, for \emph{smooth} \L ojasiewicz functions, the sequence $\{ x_t \}_{t \in \mathbb{N}}$ generated by SZGD converges to a bounded point $x_\infty$ almost surely, and $x_\infty$ is a critical point of $f$. If $\theta \in (0, \frac{1}{2}]$, then $f(x_t) - f(x_\infty)$, $\sum_{s \ge t} \| x_{s+1} - x_s \|^2$, and $\| x_t - x_\infty \|$ ($\| \cdot \|$ is the Euclidean norm) converge to zero \emph{linearly almost surely}. If $\theta \in (\frac{1}{2}, 1)$, then $f(x_t) - f(x_\infty)$ (and $\sum_{s \ge t} \| x_{s+1} - x_s \|^2$) converges to zero at rate $O \big( t^{\frac{1}{1 - 2\theta}} \big)$ almost surely; $\| x_t - x_\infty \|$ converges to zero at rate $O \big( t^{\frac{1 - \theta}{1 - 2\theta}} \big)$ almost surely. To the best of our knowledge, this paper provides the first \emph{almost sure convergence rate} guarantee for stochastic zeroth-order algorithms for \L ojasiewicz functions.
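For concreteness, the following is a minimal sketch of the SZGD iteration described above. The abstract does not specify which zeroth-order gradient estimator is used, so the two-point random-direction finite-difference scheme here, together with the names \texttt{szgd}, \texttt{two\_point\_gradient\_estimate}, and the parameters \texttt{delta} and \texttt{step\_sizes}, are illustrative assumptions rather than the paper's construction.

```python
import numpy as np


def two_point_gradient_estimate(f, x, delta, rng):
    """Illustrative zeroth-order gradient estimator (two-point, random direction).

    This is only one possible choice of estimator; the paper's estimator may differ.
    """
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    # Directional finite difference along u, rescaled to estimate the full gradient.
    return x.size * (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u


def szgd(f, x0, step_sizes, delta=1e-4, num_iters=1000, seed=0):
    """Sketch of the iteration x_{t+1} = x_t - eta_t * grad_hat f(x_t)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for t in range(num_iters):
        g_hat = two_point_gradient_estimate(f, x, delta, rng)
        x = x - step_sizes(t) * g_hat
    return x


if __name__ == "__main__":
    # Example: a smooth quadratic, which satisfies the Lojasiewicz inequality
    # with exponent 1/2, run with a constant step size.
    f = lambda x: 0.5 * np.dot(x, x)
    x_final = szgd(f, x0=np.ones(5), step_sizes=lambda t: 0.1)
    print(x_final)
```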