The Effective Depth Paradox: Evaluating the Relationship between Architectural Topology and Trainability in Deep CNNs
This paper investigates the relationship between convolutional neural network (CNN) depth and image recognition performance through a comparative study of the VGG, ResNet, and GoogLeNet architectural families. By evaluating these models under a unified experimental framework on upscaled CIFAR-10 data, we isolate the effects of depth from confounding implementation variables. We introduce a formal distinction between nominal depth ($D_{\text{nom}}$), the total count of weight-bearing layers, and effective depth ($D_{\text{eff}}$), an operational metric representing the expected number of sequential transformations encountered along all feasible forward paths. As derived in Section 3, $D_{\text{eff}}$ is computed through topology-specific proxies: the total sequential layer count for plain networks, the arithmetic mean of the minimum and maximum path lengths for residual structures, and the sum of average branch depths for multi-branch modules. Our empirical results demonstrate that while sequential architectures such as VGG suffer from diminishing returns and severe gradient attenuation as $D_{\text{nom}}$ increases, architectures with identity shortcuts or branching modules maintain optimization stability. This stability is achieved by decoupling $D_{\text{eff}}$ from $D_{\text{nom}}$, thus ensuring a manageable functional depth for gradient propagation. We conclude that effective depth serves as a superior predictor of a network's scaling potential and practical trainability compared to traditional layer counts, providing a principled framework for future architectural innovation.
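The three topology-specific proxies summarized above can be sketched as simple functions. This is a minimal illustration of the stated formulas, not the paper's implementation; the function names, the assumption of two layers per residual block, and the per-module branch-depth lists are all hypothetical.

```python
def effective_depth_plain(n_layers: int) -> float:
    """Plain sequential network (e.g. VGG-style): every forward path
    traverses all layers, so effective depth equals nominal depth."""
    return float(n_layers)


def effective_depth_residual(n_blocks: int, layers_per_block: int = 2) -> float:
    """Residual network: arithmetic mean of the minimum and maximum
    forward-path lengths. The shortest path takes every identity
    shortcut (0 transforming layers across the residual stack); the
    longest traverses every block body."""
    min_path = 0  # all shortcuts taken (assumed; stacks may differ in practice)
    max_path = n_blocks * layers_per_block
    return (min_path + max_path) / 2


def effective_depth_multibranch(branch_depths: list[list[int]]) -> float:
    """Multi-branch (Inception-style) network: sum over modules of the
    average branch depth within each module. Each inner list holds the
    depths of one module's parallel branches."""
    return sum(sum(depths) / len(depths) for depths in branch_depths)
```

For instance, a 16-layer plain network yields an effective depth of 16, a stack of 8 two-layer residual blocks yields (0 + 16) / 2 = 8, and two Inception-style modules whose branches have depths [1, 2, 3, 2] each contribute an average of 2, for a total of 4.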