Estimation of functions of $d$ variables is considered using ridge combinations of the form $\sum_{k=1}^{m} c_{1,k}\,\phi\bigl(\sum_{j=1}^{d} c_{0,j,k} x_j - b_k\bigr)$, where the activation function $\phi$ is a function with bounded value and derivative. These include single-hidden layer neural networks, polynomials, and sinusoidal models. From a sample of size $n$ of possibly noisy values at random sites $X \in B = [-1,1]^d$, the minimax mean square error is examined for functions in the closure of the $\ell_1$ hull of ridge functions with activation $\phi$. It is shown to be of order $d/n$ to a fractional power (when $d$ is of smaller order than $n$), and to be of order $(\log d)/n$ to a fractional power (when $d$ is of larger order than $n$). Dependence on constraints $v_0$ and $v_1$ on the $\ell_1$ norms of inner parameter $c_0$ and outer parameter $c_1$, respectively, is also examined. Also, lower and upper bounds on the fractional power are given. The heart of the analysis is the development of information-theoretic packing numbers for these classes of functions.
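As an illustration only, the ridge combinations described above can be sketched as a weighted sum of one activation function applied to shifted linear projections of the input. The function and parameter names (`ridge_combination`, `inner`, `shifts`, `outer`, `l1_norm`) are hypothetical and chosen for this sketch, and `tanh` is assumed as one example of an activation with bounded value and derivative; none of this is taken from the paper itself.

```python
import math


def ridge_combination(x, inner, shifts, outer, phi=math.tanh):
    """Evaluate sum_k outer[k] * phi(<inner[k], x> - shifts[k]).

    x      : list of d input coordinates
    inner  : list of m inner weight vectors, each of length d (assumed name)
    shifts : list of m scalar shifts (assumed name)
    outer  : list of m outer weights (assumed name)
    phi    : activation; tanh is an assumption, bounded by 1 in absolute value
    """
    total = 0.0
    for c0_k, b_k, c1_k in zip(inner, shifts, outer):
        # Linear projection of the input onto the k-th inner weight vector.
        projection = sum(w * xj for w, xj in zip(c0_k, x))
        total += c1_k * phi(projection - b_k)
    return total


def l1_norm(values):
    """Sum of absolute values, i.e. the l1 norm of a flat list."""
    return sum(abs(v) for v in values)


# Toy example with d = 3 input variables and m = 2 ridge terms.
x = [0.5, -1.0, 0.25]
inner = [[1.0, 0.0, -2.0], [0.5, 0.5, 0.0]]
shifts = [0.0, 0.1]
outer = [0.7, -0.3]

value = ridge_combination(x, inner, shifts, outer)
```

Because `tanh` takes values in $[-1, 1]$, the output of this sketch is bounded in absolute value by the $\ell_1$ norm of the outer weights, which is why a constraint on that norm controls the size of the function class.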