
Minimum width for universal approximation using ReLU networks on compact domain

Abstract

It has been shown that deep neural networks of a large enough width are universal approximators but they are not if the width is too small. There were several attempts to characterize the minimum width $w_{\min}$ enabling the universal approximation property; however, only a few of them found the exact values. In this work, we show that the minimum width for $L^p$ approximation of $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb{R}^{d_y}$ is exactly $\max\{d_x, d_y, 2\}$ if an activation function is ReLU-Like (e.g., ReLU, GELU, Softplus). Compared to the known result for ReLU networks, $w_{\min} = \max\{d_x+1, d_y\}$ when the domain is $\mathbb{R}^{d_x}$, our result first shows that approximation on a compact domain requires smaller width than on $\mathbb{R}^{d_x}$. We next prove a lower bound on $w_{\min}$ for uniform approximation using general activation functions including ReLU: $w_{\min} \ge d_y + 1$ if $d_x < d_y \le 2d_x$. Together with our first result, this shows a dichotomy between $L^p$ and uniform approximations for general activation functions and input/output dimensions.
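To make the comparison concrete, the following is a minimal sketch (not from the paper; the helper names are hypothetical) that evaluates the two width formulas stated above for a few input/output dimensions. It only restates the abstract's arithmetic: the compact-domain bound $\max\{d_x, d_y, 2\}$ is never larger than the $\mathbb{R}^{d_x}$ bound $\max\{d_x+1, d_y\}$, and is strictly smaller exactly when $d_x \ge \max\{d_y, 2\}$.

```python
# Illustrative comparison of the two minimum-width formulas from the abstract.
# (Hypothetical helper names; a sketch, not code accompanying the paper.)

def min_width_compact(d_x: int, d_y: int) -> int:
    """Minimum width for L^p approximation on [0,1]^{d_x} with a ReLU-Like
    activation, as stated in this paper: max{d_x, d_y, 2}."""
    return max(d_x, d_y, 2)

def min_width_unbounded(d_x: int, d_y: int) -> int:
    """Known minimum width for ReLU networks on R^{d_x}: max{d_x + 1, d_y}."""
    return max(d_x + 1, d_y)

if __name__ == "__main__":
    for d_x, d_y in [(1, 1), (2, 1), (3, 2), (2, 3), (4, 4)]:
        c = min_width_compact(d_x, d_y)
        u = min_width_unbounded(d_x, d_y)
        print(f"d_x={d_x}, d_y={d_y}: compact domain -> {c}, R^{{d_x}} -> {u}")
    # e.g. d_x=3, d_y=2: width 3 suffices on [0,1]^3, while width 4 is needed
    # on R^3, so restricting to a compact domain saves one neuron of width.
    # For d_x < d_y <= 2*d_x (e.g. d_x=2, d_y=3), the L^p minimum width equals
    # d_y, while the paper's lower bound shows uniform approximation needs at
    # least d_y + 1 -- the dichotomy mentioned in the abstract.
```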
