We prove \emph{almost sure convergence rates} of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for \L ojasiewicz functions. The SZGD algorithm iterates as
\begin{align*}
x_{t+1} = x_t - \eta_t \widehat{\nabla} f (x_t), \qquad t = 0,1,2,3,\cdots ,
\end{align*}
where $f$ is the objective function that satisfies the \L ojasiewicz inequality with \L ojasiewicz exponent $\theta$, $\eta_t$ is the step size (learning rate), and $\widehat{\nabla} f (x_t)$ is the approximate gradient estimated using zeroth-order information. We show that, for \emph{smooth} \L ojasiewicz functions, the sequence $\{ x_t \}_{t \in \mathbb{N}}$ generated by SZGD converges to a bounded point $x_\infty$ almost surely, and $x_\infty$ is a critical point of $f$. If $\theta \in (0, \frac{1}{2}]$, then $f(x_t) - f(x_\infty)$, $\sum_{s \ge t} \| x_{s+1} - x_s \|^2$, and $\| x_t - x_\infty \|$ ($\| \cdot \|$ is the Euclidean norm) converge to zero \emph{linearly almost surely}. If $\theta \in (\frac{1}{2}, 1)$, then $f(x_t) - f(x_\infty)$ (and $\sum_{s \ge t} \| x_{s+1} - x_s \|^2$) converges to zero at rate $O \big( t^{\frac{1}{1 - 2\theta}} \big)$ almost surely; $\| x_t - x_\infty \|$ converges to zero at rate $O \big( t^{\frac{1 - \theta}{1 - 2\theta}} \big)$ almost surely. To the best of our knowledge, this paper provides the first \emph{almost sure convergence rate} guarantee for stochastic zeroth-order algorithms for \L ojasiewicz functions.
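For concreteness, the following is a minimal sketch of the SZGD iteration described above. The abstract does not specify which zeroth-order gradient estimator is used, so the two-point random-direction finite-difference scheme here, together with the names \texttt{szgd}, \texttt{two\_point\_gradient\_estimate}, and the parameters \texttt{delta} and \texttt{step\_sizes}, are illustrative assumptions rather than the paper's construction.

```python
import numpy as np


def two_point_gradient_estimate(f, x, delta, rng):
    """Illustrative zeroth-order gradient estimator (two-point, random direction).

    This is only one possible choice of estimator; the paper's estimator may differ.
    """
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    # Directional finite difference along u, rescaled to estimate the full gradient.
    return x.size * (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u


def szgd(f, x0, step_sizes, delta=1e-4, num_iters=1000, seed=0):
    """Sketch of the iteration x_{t+1} = x_t - eta_t * grad_hat f(x_t)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for t in range(num_iters):
        g_hat = two_point_gradient_estimate(f, x, delta, rng)
        x = x - step_sizes(t) * g_hat
    return x


if __name__ == "__main__":
    # Example: a smooth quadratic, which satisfies the Lojasiewicz inequality
    # with exponent 1/2, run with a constant step size.
    f = lambda x: 0.5 * np.dot(x, x)
    x_final = szgd(f, x0=np.ones(5), step_sizes=lambda t: 0.1)
    print(x_final)
```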