Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks

Abstract
Let $f^{\star}$ be a function on $\mathbb{R}^d$ satisfying a spectral norm condition. For various noise settings, we show that the mean squared risk $\mathbb{E}\,\|\hat{f} - f^{\star}\|^2$ is of order $((\log d)/n)^{1/4}$, up to factors depending on the spectral norm of $f^{\star}$, where $n$ is the sample size and $\hat{f}$ is either a penalized least squares estimator or a greedily obtained version of such an estimator using linear combinations of ramp, sinusoidal, sigmoidal, or other bounded Lipschitz ridge functions. Our risk bound is effective even when the dimension $d$ is much larger than the available sample size. For settings where the dimension exceeds the square root of the sample size, this quantity improves upon the more familiar risk bound of order $(d/n)^{1/2}$ (up to logarithmic factors), also investigated here.
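To make the estimator class concrete, here is a minimal illustrative sketch of greedily fitting a linear combination of ramp (ReLU) ridge functions to data. This is not the authors' procedure: the function name `greedy_ridge_fit`, the random-candidate direction search, and the omission of the paper's penalty term are all simplifications introduced here for illustration.

```python
import numpy as np

def ramp(z):
    """Ramp (ReLU) profile: one example of a bounded-Lipschitz ridge activation."""
    return np.maximum(z, 0.0)

def greedy_ridge_fit(X, y, n_terms=10, n_candidates=200, seed=0):
    """Greedily build f(x) = sum_k beta_k * ramp(a_k . x + b_k).

    Illustrative relaxed greedy scheme (not the paper's estimator): at each
    step, draw random candidate unit directions, keep the ridge function most
    correlated with the current residual, then refit all coefficients jointly
    by ordinary least squares.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    feats, params = [], []
    resid = y.copy()
    for _ in range(n_terms):
        # random unit directions and offsets as candidates
        A = rng.standard_normal((n_candidates, d))
        A /= np.linalg.norm(A, axis=1, keepdims=True)
        b = rng.uniform(-1.0, 1.0, n_candidates)
        Z = ramp(X @ A.T + b)                      # (n, n_candidates)
        # score candidates by normalized correlation with the residual
        norms = np.linalg.norm(Z, axis=0) + 1e-12
        k = int(np.argmax(np.abs(Z.T @ resid) / norms))
        feats.append(Z[:, k])
        params.append((A[k], b[k]))
        # joint least squares refit of all coefficients so far
        F = np.column_stack(feats)
        beta, *_ = np.linalg.lstsq(F, y, rcond=None)
        resid = y - F @ beta
    return params, beta

# usage: noisy observations of a single ridge function in d = 20 dimensions
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 20))
a = np.zeros(20); a[0] = 1.0
y = ramp(X @ a) + 0.1 * rng.standard_normal(500)
params, beta = greedy_ridge_fit(X, y, n_terms=5)
F = np.column_stack([ramp(X @ a_k + b_k) for a_k, b_k in params])
mse = np.mean((y - F @ beta) ** 2)
```

A penalized least squares version would instead minimize the squared error plus a penalty on the $\ell_1$ norm of the outer coefficients, which is what controls the spectral-norm quantity appearing in the risk bound.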