
Universality in halting time and its applications in optimization

Abstract

The authors present empirically observed universal distributions for the halting time (measured by the number of iterations needed to reach a given accuracy) of optimization algorithms applied to at least two systems: spin glasses and deep learning. Given an algorithm, the fluctuations of the halting time (after centering and scaling) follow a universal distribution when the system is well tuned. Our computations demonstrate this universality: the distribution is independent of the input data and of the dimension. When the system is not well tuned, this type of universality is destroyed. What makes this observation practically relevant is that when the halting time fluctuations do not follow the universal distribution, either time is wasted for no gain in accuracy or the algorithm stops prematurely and gives inaccurate results. This is consistent with the observations of Deift et al. (2015) in their analysis of the conjugate gradient algorithm.
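The abstract does not spell out how a halting-time distribution is measured. The following is a minimal sketch, assuming conjugate gradient applied to random linear systems W x = b with W = X X^T / m, in the spirit of Deift et al. (2015); it is not the authors' spin-glass or deep-learning setup. The halting time is taken to be the number of iterations until the relative residual drops below eps, and the fluctuations are the centered and scaled halting times (tau - mean) / std. The entry distributions (Gaussian and ±1), the sizes n and m, the tolerance eps, and all function names are hypothetical choices for illustration.

```python
import random
import statistics
from typing import Callable, List, Tuple

# Hypothetical setup (assumption, not from the paper): CG on W = X X^T / m.
Sampler = Callable[[random.Random], float]


def gaussian(rng: random.Random) -> float:
    # Standard normal matrix entries.
    return rng.gauss(0.0, 1.0)


def bernoulli(rng: random.Random) -> float:
    # Symmetric +/-1 entries: same mean and variance as the Gaussian case.
    return 1.0 if rng.random() < 0.5 else -1.0


def random_system(n: int, m: int, sample: Sampler,
                  rng: random.Random) -> Tuple[List[List[float]], List[float]]:
    """Build W = X X^T / m (n x n, symmetric) and a random right-hand side b."""
    x = [[sample(rng) for _ in range(m)] for _ in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            w[i][j] = w[j][i] = sum(a * c for a, c in zip(x[i], x[j])) / m
    b = [sample(rng) for _ in range(n)]
    return w, b


def cg_halting_time(w: List[List[float]], b: List[float], eps: float,
                    max_iter: int = 1000) -> int:
    """Count conjugate-gradient iterations until ||r_k|| < eps * ||b|| (x_0 = 0)."""
    r = list(b)  # residual r_0 = b - W x_0 = b
    p = list(r)  # first search direction
    rr = sum(v * v for v in r)
    target = eps * eps * rr  # compare squared norms to avoid square roots
    for k in range(1, max_iter + 1):
        ap = [sum(wij * pj for wij, pj in zip(row, p)) for row in w]
        alpha = rr / sum(pi * api for pi, api in zip(p, ap))
        # The iterate x is not needed to count iterations, so only r is updated.
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = sum(v * v for v in r)
        if rr_new < target:
            return k
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return max_iter


def halting_times(n: int, m: int, eps: float, sample: Sampler,
                  trials: int, seed: int = 0) -> List[int]:
    """Halting times over independent random systems from one ensemble."""
    rng = random.Random(seed)
    return [cg_halting_time(*random_system(n, m, sample, rng), eps)
            for _ in range(trials)]


def normalized_fluctuations(times: List[int]) -> List[float]:
    """Center and scale halting times: (tau - mean) / std."""
    mu = statistics.mean(times)
    sigma = statistics.pstdev(times)
    if sigma == 0:
        # Every halting time is identical: there are no fluctuations to rescale.
        return [0.0] * len(times)
    return [(t - mu) / sigma for t in times]


if __name__ == "__main__":
    # Illustrative parameters (assumptions, not the paper's settings).
    n, m, eps, trials = 30, 120, 1e-4, 100
    for name, sampler in (("gaussian", gaussian), ("bernoulli", bernoulli)):
        z = sorted(normalized_fluctuations(
            halting_times(n, m, eps, sampler, trials, seed=0)))
        # A few empirical quantiles of the normalized fluctuations.
        q = [z[int(f * (len(z) - 1))] for f in (0.1, 0.5, 0.9)]
        print(name, [round(v, 2) for v in q])
```

Running the script prints a few quantiles of the normalized fluctuations for each ensemble. Under these assumptions, universality in the sense of the abstract would show up as similar quantiles for the Gaussian and ±1 ensembles; the sketch itself makes no claim about what the numbers will be.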
