
Effective and Efficient Dropout for Deep Convolutional Neural Networks

Abstract

Machine-learning-based data-driven applications have become ubiquitous, e.g., in health-care analysis and database system optimization. Big training data and large (deep) models are crucial for good performance. Dropout has been widely used as an efficient regularization technique to prevent large models from overfitting. However, many recent works show that dropout brings little performance improvement for deep convolutional neural networks (CNNs), a popular deep learning model for data-driven applications. In this paper, we revisit this problem and investigate the cause of the failure. We attribute it to a conflict between conventional dropout and the batch normalization operation that follows it. We propose to adjust the order of these operations to resolve the conflict; further, we examine and introduce other, structurally better-suited dropout variants for more effective and efficient regularization of CNNs. These dropout variants can be easily integrated into the building blocks of CNNs implemented by existing deep learning libraries, e.g., Apache Singa, to provide effective regularization for CNNs. Extensive experiments on the benchmark datasets CIFAR, SVHN and ImageNet compare the existing building blocks against the proposed building blocks with the proposed customizable dropout methods. The results confirm the superiority of our building blocks, owing to the regularization and implicit model-ensemble effects of dropout. In particular, we improve over state-of-the-art CNNs, achieving error rates of 3.17%, 16.15%, 1.44% and 21.68% on CIFAR-10, CIFAR-100, SVHN and ImageNet, respectively.
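The ordering conflict the abstract refers to can be illustrated with a minimal NumPy sketch (the `dropout` and `batch_norm` functions below are simplified stand-ins written for this illustration, not the paper's implementation): when dropout precedes batch normalization, the normalization layer sees dropout-inflated variance at training time but the clean variance at test time, so its learned statistics no longer match.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, train):
    """Inverted dropout: zero units with prob. p, scale survivors by 1/(1-p)."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def batch_norm(x, eps=1e-5):
    """Batch normalization over the batch axis (no learned affine, for brevity)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

x = rng.standard_normal((256, 64))  # a batch of 256 feature vectors

# Conventional order (dropout -> BN): the variance BN observes differs
# between training and test mode -- the mismatch behind dropout's failure.
var_train = dropout(x, 0.5, train=True).var(axis=0).mean()
var_test = dropout(x, 0.5, train=False).var(axis=0).mean()
print(var_train / var_test)  # ratio is roughly 2 for p = 0.5

# Adjusted order (BN -> dropout): BN sees the same statistics in both modes,
# and inverted dropout keeps the expected activation unchanged.
y_train = dropout(batch_norm(x), 0.5, train=True)
y_test = batch_norm(x)
```

With unit-variance inputs and p = 0.5, inverted dropout roughly doubles the variance fed into the following layer during training, which is why moving dropout after the normalization step removes the train/test discrepancy.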
