
Locality defeats the curse of dimensionality in convolutional teacher-student scenarios

Abstract

Convolutional neural networks perform a local and translationally-invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher-student framework for kernel regression, using 'convolutional' kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using heuristic methods from physics, we find in the ridgeless case that locality is key in determining the learning curve exponent $\beta$ (which relates the test error $\epsilon_t \sim P^{-\beta}$ to the size of the training set $P$), whereas translational invariance is not. In particular, if the filter size $t$ of the teacher is smaller than the filter size $s$ of the student, $\beta$ is a function of $s$ only and does not depend on the input dimension. We confirm our predictions on $\beta$ empirically. We conclude by proving, using a natural universality assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to learning curve exponents similar to those we obtain in the ridgeless case.
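The teacher-student setup in the abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not the paper's exact construction: the hypothetical helper `convolutional_kernel` averages a Laplace kernel over periodic patches of a 1-D input, inputs are standard Gaussian, the teacher draws labels from a Gaussian random field whose covariance is that kernel with filter size `t`, and the student performs ridgeless kernel regression with filter size `s`. The hypothetical `fit_beta` then estimates the exponent $\beta$ from the slope of $\log \epsilon_t$ against $\log P$.

```python
import numpy as np


def convolutional_kernel(X, Y, s):
    """Hypothetical 'convolutional' kernel with filter size s.

    Assumptions (illustrative only): inputs are 1-D signals of length d with
    periodic boundaries, and each patch of s adjacent pixels is compared with
    a Laplace kernel exp(-||x_i - y_i||). Averaging over all d patch positions
    makes the kernel translation-invariant as well as local.
    """
    d = X.shape[1]
    K = np.zeros((X.shape[0], Y.shape[0]))
    for i in range(d):
        idx = [(i + j) % d for j in range(s)]  # periodic patch starting at pixel i
        Xp, Yp = X[:, idx], Y[:, idx]           # shapes (n, s) and (m, s)
        diff = Xp[:, None, :] - Yp[None, :, :]  # shape (n, m, s)
        K += np.exp(-np.linalg.norm(diff, axis=-1))
    return K / d


def teacher_student_error(P, d, t, s, n_test=200, seed=0):
    """Mean squared test error of ridgeless kernel regression with P training points.

    Teacher: Gaussian random field with covariance given by the kernel with
    filter size t. Student: kernel interpolation with filter size s.
    Gaussian inputs are an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    N = P + n_test
    X = rng.standard_normal((N, d))
    # Sample teacher labels y ~ N(0, K_t) via a Cholesky factor (small jitter for stability).
    L = np.linalg.cholesky(convolutional_kernel(X, X, t) + 1e-8 * np.eye(N))
    y = L @ rng.standard_normal(N)
    Xtr, ytr, Xte, yte = X[:P], y[:P], X[P:], y[P:]
    # Ridgeless student: solve K_s(train, train) alpha = y_train exactly.
    alpha = np.linalg.solve(convolutional_kernel(Xtr, Xtr, s), ytr)
    pred = convolutional_kernel(Xte, Xtr, s) @ alpha
    return float(np.mean((pred - yte) ** 2))


def fit_beta(Ps, errors):
    """Estimate beta in eps_t ~ P^(-beta) from the slope of a log-log linear fit."""
    slope, _ = np.polyfit(np.log(Ps), np.log(errors), 1)
    return -slope
```

For example, one could compute `errs = [teacher_student_error(P, d=8, t=2, s=3) for P in [64, 128, 256, 512]]` and pass the training-set sizes and `errs` to `fit_beta` to get an empirical estimate of $\beta$. At such small $P$ the estimate can be noticeably affected by finite-size effects and by the random seed, so it should be read as a rough probe of the scaling rather than a reproduction of the reported exponents.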
