
Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks

Abstract

Let $f^{\star}$ be a function on $\mathbb{R}^d$ satisfying a spectral norm condition. For various noise settings, we show that $\mathbb{E}\|\hat{f} - f^{\star}\|^2 \leq v_{f^{\star}}\left(\frac{\log d}{n}\right)^{1/4}$, where $n$ is the sample size and $\hat{f}$ is either a penalized least squares estimator or a greedily obtained version of such, using linear combinations of ramp, sinusoidal, sigmoidal, or other bounded Lipschitz ridge functions. Our risk bound is effective even when the dimension $d$ is much larger than the available sample size. For settings where the dimension is larger than the square root of the sample size, this quantity is seen to improve upon the more familiar risk bound of $v_{f^{\star}}\left(\frac{d\log(n/d)}{n}\right)^{1/2}$, also investigated here.
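To illustrate when the $(\log d/n)^{1/4}$ rate dominates the classical $(d\log(n/d)/n)^{1/2}$ rate, the following sketch compares the two rates numerically, dropping the constant $v_{f^{\star}}$; the function names are my own, not from the paper.

```python
import math

def bound_dimension_free(n: int, d: int) -> float:
    """Rate (log d / n)^{1/4} from the paper's main bound (constant v_{f*} dropped)."""
    return (math.log(d) / n) ** 0.25

def bound_classical(n: int, d: int) -> float:
    """Classical rate (d log(n/d) / n)^{1/2}, meaningful for d < n (constant dropped)."""
    return (d * math.log(n / d) / n) ** 0.5

n = 10**6
for d in (10, 10**4):  # one dimension below and one above sqrt(n) = 1000
    print(d, bound_dimension_free(n, d), bound_classical(n, d))
```

With $n = 10^6$, the classical rate is smaller at $d = 10$, but at $d = 10^4$ (above $\sqrt{n}$) the $(\log d/n)^{1/4}$ rate is smaller, consistent with the crossover described in the abstract.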
