In deterministic optimization problems, line search routines are a standard tool ensuring stability and efficiency. In the stochastic setting, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic version of the line search paradigm by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our algorithm maintains a Gaussian process surrogate of the univariate optimization objective and uses a probabilistic belief over the classic Wolfe conditions to monitor the descent. Care is taken to keep all steps computationally cheap, so that the resulting method stabilizes stochastic gradient descent at only minor overhead. The algorithm has no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent.
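For orientation, a sketch of the descent criterion referred to above: for a candidate step size t along a search direction d at the current iterate x, the classical Wolfe conditions read (the notation and the constants 0 < c_1 < c_2 < 1 are standard choices, not values fixed by this abstract)

\[
f(x + t\,d) \;\le\; f(x) + c_1\, t\, \nabla f(x)^{\top} d
\qquad\text{and}\qquad
\nabla f(x + t\,d)^{\top} d \;\ge\; c_2\, \nabla f(x)^{\top} d .
\]

With only noisy evaluations of f and its gradient available, this binary check can no longer be evaluated exactly; roughly, the probabilistic belief mentioned above replaces it by the posterior probability, under the Gaussian process surrogate, that both inequalities hold, and a step is accepted once that probability is sufficiently high.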