In this paper, we study the binary classification problem on $[0,1]^d$ under the Tsybakov noise condition (with exponent $s$) and the compositional assumption. This assumption requires the conditional class probability function of the data distribution to be the composition of $q+1$ vector-valued multivariate functions, where each component function is either a maximum value function or a Hölder-$\beta$ smooth function that depends only on $d_*$ of its input variables. Notably, $d_*$ can be significantly smaller than the input dimension $d$. We prove that, under these conditions, the optimal convergence rate for the excess 0-1 risk of classifiers is
$$\left( \frac{1}{n} \right)^{\frac{\beta\cdot(1\wedge\beta)^q}{\frac{d_*}{s+1}+\left(1+\frac{1}{s+1}\right)\cdot\beta\cdot(1\wedge\beta)^q}},$$
which is independent of the input dimension $d$. Additionally, we demonstrate that ReLU deep neural networks (DNNs) trained with hinge loss can achieve this optimal convergence rate up to a logarithmic factor. This result provides theoretical justification for the excellent performance of ReLU DNNs in practical classification tasks, particularly in high-dimensional settings. The technique used to establish these results extends the oracle inequality presented in our previous work. The generalized approach is of independent interest.
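To make the statement concrete, the following is a minimal sketch (assuming a PyTorch-style setup; the architecture width, depth, and the parameter values $\beta$, $q$, $d_*$, $s$ below are illustrative choices, not the specific networks or constants analyzed in the paper). It evaluates the rate exponent from the displayed formula for example parameter values and shows a plain ReLU network scored with the hinge loss on labels in $\{-1,+1\}$.

```python
import torch
import torch.nn as nn

def rate_exponent(beta: float, q: int, d_star: int, s: float) -> float:
    """Exponent of (1/n) in the optimal excess 0-1 risk rate quoted above."""
    num = beta * min(1.0, beta) ** q
    den = d_star / (s + 1.0) + (1.0 + 1.0 / (s + 1.0)) * num
    return num / den

# Illustrative values (not from the paper): beta = 2, q = 1, d_star = 3, s = 1
print(rate_exponent(beta=2.0, q=1, d_star=3, s=1.0))  # ~0.44, i.e. rate n^{-0.44}

class ReLUNet(nn.Module):
    """A plain fully connected ReLU network with a single real-valued output."""
    def __init__(self, d_in: int, width: int = 128, depth: int = 3):
        super().__init__()
        layers, d = [], d_in
        for _ in range(depth):
            layers += [nn.Linear(d, width), nn.ReLU()]
            d = width
        layers.append(nn.Linear(d, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x).squeeze(-1)

def hinge_loss(margin_scores: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Hinge loss: mean of max(0, 1 - y * f(x)); the induced classifier is sign(f(x)).
    return torch.clamp(1.0 - y * margin_scores, min=0.0).mean()

# Minimal training step (illustrative):
# model = ReLUNet(d_in=20); opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = hinge_loss(model(x_batch), y_batch); opt.zero_grad(); loss.backward(); opt.step()
```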
@article{zhang2025_2506.14899,
  title   = {Optimal Convergence Rates of Deep Neural Network Classifiers},
  author  = {Zihan Zhang and Lei Shi and Ding-Xuan Zhou},
  journal = {arXiv preprint arXiv:2506.14899},
  year    = {2025}
}