The Effective Depth Paradox: Evaluating the Relationship between Architectural Topology and Trainability in Deep CNNs
This paper investigates the relationship between convolutional neural network (CNN) depth and image recognition performance through a comparative study of the VGG, ResNet, and GoogLeNet architectural families. By evaluating these models under a unified experimental framework on upscaled CIFAR-10 data, we isolate the effects of depth from confounding implementation variables. We introduce a formal distinction between nominal depth ($D_{\text{nom}}$), the total count of weight-bearing layers, and effective depth ($D_{\text{eff}}$), an operational metric representing the expected number of sequential transformations encountered along all feasible forward paths. As derived in Section 3, $D_{\text{eff}}$ is computed through topology-specific proxies: the total sequential layer count for plain networks, the arithmetic mean of the minimum and maximum path lengths for residual structures, and the sum of average branch depths for multi-branch modules. Our empirical results demonstrate that while sequential architectures such as VGG suffer from diminishing returns and severe gradient attenuation as $D_{\text{nom}}$ increases, architectures with identity shortcuts or branching modules maintain optimization stability. This stability is achieved by decoupling $D_{\text{eff}}$ from $D_{\text{nom}}$, thus ensuring a manageable functional depth for gradient propagation. We conclude that effective depth serves as a superior predictor of a network's scaling potential and practical trainability compared to traditional layer counts, providing a principled framework for future architectural innovation.
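The three topology-specific proxies summarized above can be sketched as simple functions. This is a minimal illustration of the stated formulas, not the paper's implementation; the function names, the assumption of two layers per residual block, and the per-module branch-depth lists are all hypothetical.

```python
def effective_depth_plain(n_layers: int) -> float:
    """Plain sequential network (e.g. VGG-style): every forward path
    traverses all layers, so effective depth equals nominal depth."""
    return float(n_layers)


def effective_depth_residual(n_blocks: int, layers_per_block: int = 2) -> float:
    """Residual network: arithmetic mean of the minimum and maximum
    forward-path lengths. The shortest path takes every identity
    shortcut (0 transforming layers across the residual stack); the
    longest traverses every block body."""
    min_path = 0  # all shortcuts taken (assumed; stacks may differ in practice)
    max_path = n_blocks * layers_per_block
    return (min_path + max_path) / 2


def effective_depth_multibranch(branch_depths: list[list[int]]) -> float:
    """Multi-branch (Inception-style) network: sum over modules of the
    average branch depth within each module. Each inner list holds the
    depths of one module's parallel branches."""
    return sum(sum(depths) / len(depths) for depths in branch_depths)
```

For instance, a 16-layer plain network yields an effective depth of 16, a stack of 8 two-layer residual blocks yields (0 + 16) / 2 = 8, and two Inception-style modules whose branches have depths [1, 2, 3, 2] each contribute an average of 2, for a total of 4.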