In this paper, we study the binary classification problem on $[0,1]^d$ under the Tsybakov noise condition (with exponent $s$) and the compositional assumption. This assumption requires the conditional class probability function of the data distribution to be the composition of $q+1$ vector-valued multivariate functions, where each component function is either a maximum value function or a Hölder-$\beta$ smooth function that depends only on $d_*$ of its input variables. Notably, $d_*$ can be significantly smaller than the input dimension $d$. We prove that, under these conditions, the optimal convergence rate for the excess 0-1 risk of classifiers is
$$\left( \frac{1}{n} \right)^{\frac{\beta\cdot(1\wedge\beta)^q}{\frac{d_*}{s+1}+\left(1+\frac{1}{s+1}\right)\cdot\beta\cdot(1\wedge\beta)^q}},$$
which is independent of the input dimension $d$. Additionally, we demonstrate that ReLU deep neural networks (DNNs) trained with hinge loss can achieve this optimal convergence rate up to a logarithmic factor. This result provides theoretical justification for the excellent performance of ReLU DNNs in practical classification tasks, particularly in high-dimensional settings. The technique used to establish these results extends the oracle inequality presented in our previous work. The generalized approach is of independent interest.
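To make the statement concrete, the following is a minimal sketch (assuming a PyTorch-style setup; the architecture width, depth, and the parameter values $\beta$, $q$, $d_*$, $s$ below are illustrative choices, not the specific networks or constants analyzed in the paper). It evaluates the rate exponent from the displayed formula for example parameter values and shows a plain ReLU network scored with the hinge loss on labels in $\{-1,+1\}$.

```python
import torch
import torch.nn as nn

def rate_exponent(beta: float, q: int, d_star: int, s: float) -> float:
    """Exponent of (1/n) in the optimal excess 0-1 risk rate quoted above."""
    num = beta * min(1.0, beta) ** q
    den = d_star / (s + 1.0) + (1.0 + 1.0 / (s + 1.0)) * num
    return num / den

# Illustrative values (not from the paper): beta = 2, q = 1, d_star = 3, s = 1
print(rate_exponent(beta=2.0, q=1, d_star=3, s=1.0))  # ~0.44, i.e. rate n^{-0.44}

class ReLUNet(nn.Module):
    """A plain fully connected ReLU network with a single real-valued output."""
    def __init__(self, d_in: int, width: int = 128, depth: int = 3):
        super().__init__()
        layers, d = [], d_in
        for _ in range(depth):
            layers += [nn.Linear(d, width), nn.ReLU()]
            d = width
        layers.append(nn.Linear(d, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x).squeeze(-1)

def hinge_loss(margin_scores: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Hinge loss: mean of max(0, 1 - y * f(x)); the induced classifier is sign(f(x)).
    return torch.clamp(1.0 - y * margin_scores, min=0.0).mean()

# Minimal training step (illustrative):
# model = ReLUNet(d_in=20); opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = hinge_loss(model(x_batch), y_batch); opt.zero_grad(); loss.backward(); opt.step()
```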
@article{zhang2025_2506.14899,
  title   = {Optimal Convergence Rates of Deep Neural Network Classifiers},
  author  = {Zihan Zhang and Lei Shi and Ding-Xuan Zhou},
  journal = {arXiv preprint arXiv:2506.14899},
  year    = {2025}
}