Implicit bias of any algorithm: bounding bias via margin

Abstract

Consider $n$ points $x_1,\ldots,x_n$ in finite-dimensional Euclidean space, each having one of two colors. Suppose there exists a separating hyperplane (identified with its unit normal vector $w$) for the points, i.e., a hyperplane such that points of the same color lie on the same side of the hyperplane. We measure the quality of such a hyperplane by its margin $\gamma(w)$, defined as the minimum distance between any of the points $x_i$ and the hyperplane. In this paper, we prove that the margin function $\gamma$ satisfies a nonsmooth Kurdyka-Lojasiewicz inequality with exponent $1/2$. This result has far-reaching consequences. For example, let $\gamma^{opt}$ be the maximum possible margin for the problem and let $w^{opt}$ be the parameter of the hyperplane attaining this value. Given any other separating hyperplane with parameter $w$, let $d(w):=\|w-w^{opt}\|$ be the Euclidean distance between $w$ and $w^{opt}$, also called the bias of $w$. From the preceding KL inequality, we deduce that $(\gamma^{opt}-\gamma(w))/R \le d(w) \le 2\sqrt{(\gamma^{opt}-\gamma(w))/\gamma^{opt}}$, where $R:=\max_i \|x_i\|$ is the maximum distance of the points $x_i$ from the origin. Consequently, for any optimization algorithm (gradient descent or not), the bias of the iterates converges at least as fast as the square root of the rate of convergence of their margin. Thus, our work provides a generic tool for analyzing the implicit bias of any algorithm in terms of its margin, in situations where a specialized analysis might not be available: it suffices to establish a good rate of convergence for the margin, a task which is usually much easier.
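
As an illustration (not taken from the paper), the following minimal Python sketch checks the stated two-sided bound numerically on a toy dataset where the maximum-margin hyperplane through the origin is known by symmetry. It assumes the convention $\gamma(w)=\min_i y_i\langle w,x_i\rangle$ for a unit normal $w$ and labels $y_i\in\{\pm 1\}$, which matches the minimum-distance definition whenever $w$ separates the data; the dataset and variable names are hypothetical choices for this sketch only.

```python
# Numerical check of the bound
#   (gamma_opt - gamma(w)) / R  <=  ||w - w_opt||  <=  2 * sqrt((gamma_opt - gamma(w)) / gamma_opt)
# on a toy 2D dataset that is symmetric about the y-axis, so the max-margin
# hyperplane through the origin has unit normal w_opt = (1, 0) and gamma_opt = 1.

import numpy as np

X = np.array([[1.0, 2.0], [1.0, -2.0], [-1.0, 2.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

R = np.max(np.linalg.norm(X, axis=1))   # max distance of a point from the origin
w_opt = np.array([1.0, 0.0])            # max-margin unit normal (by symmetry)
gamma_opt = np.min(y * (X @ w_opt))     # equals 1 on this dataset

def margin(w):
    """Margin of the hyperplane with unit normal w (assumed convention)."""
    return np.min(y * (X @ w))

rng = np.random.default_rng(0)
for _ in range(10_000):
    theta = rng.uniform(-np.pi / 2, np.pi / 2)
    w = np.array([np.cos(theta), np.sin(theta)])  # random unit normal
    g = margin(w)
    if g <= 0:
        continue                                  # keep only separating hyperplanes
    d = np.linalg.norm(w - w_opt)                 # bias of w
    lower = (gamma_opt - g) / R
    upper = 2.0 * np.sqrt((gamma_opt - g) / gamma_opt)
    assert lower - 1e-9 <= d <= upper + 1e-9, (lower, d, upper)

print("Two-sided bound held for all sampled separating hyperplanes.")
```

The lower bound in this check also follows directly from $\gamma$ being $R$-Lipschitz (each $y_i\langle w,x_i\rangle$ is $R$-Lipschitz in $w$, and a minimum of $R$-Lipschitz functions is $R$-Lipschitz), while the upper bound is the paper's KL-type consequence being verified empirically here.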
