Deep Network Approximation: Beyond ReLU to Diverse Activation Functions

13 July 2023
Shijun Zhang
Jianfeng Lu
Hongkai Zhao
Abstract

This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{CELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho \in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $3N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, albeit with slightly increased constants. Significantly, we establish that the (width, depth) scaling factors can be further reduced from $(3,2)$ to $(1,1)$ if $\varrho$ falls within a specific subset of $\mathscr{A}$. This subset includes activation functions such as $\mathtt{ELU}$, $\mathtt{CELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, and $\mathtt{Mish}$.
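As a quick illustration of the flavor of such results (a minimal numerical sketch, not the paper's actual construction), the snippet below checks a well-known fact consistent with the $(1,1)$ scaling claim for $\mathtt{Softplus}$: a single rescaled Softplus unit, $\mathtt{Softplus}(Rx)/R$, converges uniformly to $\mathtt{ReLU}(x)$ on a bounded interval as the scale parameter $R$ grows. The interval $[-5, 5]$ and the values of $R$ are arbitrary choices for the demonstration.

import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x))
    return np.logaddexp(0.0, x)

def relu(x):
    return np.maximum(x, 0.0)

x = np.linspace(-5.0, 5.0, 10_001)  # a bounded set, here [-5, 5]
for R in (1.0, 10.0, 100.0, 1000.0):
    # softplus(R*x)/R approaches ReLU(x) uniformly on bounded sets as R grows
    err = np.max(np.abs(softplus(R * x) / R - relu(x)))
    print(f"R = {R:7.1f}   max |softplus(Rx)/R - ReLU(x)| = {err:.2e}")

The worst-case error is attained at $x=0$ and equals $\ln(2)/R$, so it vanishes in the limit and is already below $10^{-3}$ for $R = 1000$; this gives a concrete sense of how a smooth activation can emulate a $\mathtt{ReLU}$ unit without any increase in width or depth.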
