Variational Dropout Sparsifies Deep Neural Networks

Abstract

We explore the recently proposed variational dropout technique, which provides an elegant Bayesian interpretation of dropout. We extend variational dropout to the case when the dropout rates are unknown and show that they can be found by optimizing the variational lower bound on the evidence. We show that it is possible to assign and learn an individual dropout rate for each connection in a DNN. Interestingly, such an assignment leads to extremely sparse solutions in both fully-connected and convolutional layers. This effect is similar to the automatic relevance determination (ARD) effect in empirical Bayes, but it has a number of advantages. We report up to 128-fold compression of popular architectures without a large loss of accuracy, providing additional evidence that modern deep architectures are highly redundant.
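As a rough illustration of the idea of per-connection dropout rates, the sketch below shows one way such a layer could be written in PyTorch. This is not the authors' implementation: the class name SparseVDLinear, the initialization constants, and the pruning threshold are assumptions made for the example; the KL approximation constants are the ones reported in the paper.

```python
# Minimal sketch of a linear layer with an individual dropout rate per weight
# (sparse variational dropout), assuming PyTorch. Hypothetical implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseVDLinear(nn.Module):
    """Linear layer whose weights carry learnable multiplicative Gaussian noise."""

    def __init__(self, in_features, out_features, threshold=3.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # log sigma^2 parameterization; the dropout rate is alpha = sigma^2 / weight^2
        self.log_sigma2 = nn.Parameter(torch.full((out_features, in_features), -10.0))
        self.threshold = threshold  # prune weights whose log_alpha exceeds this

    def log_alpha(self):
        return self.log_sigma2 - 2.0 * torch.log(torch.abs(self.weight) + 1e-8)

    def forward(self, x):
        if self.training:
            # Local reparameterization: sample the pre-activations directly.
            mean = F.linear(x, self.weight, self.bias)
            var = F.linear(x * x, torch.exp(self.log_sigma2))
            return mean + torch.sqrt(var + 1e-8) * torch.randn_like(mean)
        # At test time, drop connections with effectively infinite dropout rate.
        mask = (self.log_alpha() < self.threshold).float()
        return F.linear(x, self.weight * mask, self.bias)

    def kl(self):
        # Approximation of the KL term between the posterior and the log-uniform
        # prior, with the constants reported by Molchanov et al. (2017).
        k1, k2, k3 = 0.63576, 1.87320, 1.48695
        la = self.log_alpha()
        neg_kl = k1 * torch.sigmoid(k2 + k3 * la) - 0.5 * F.softplus(-la) - k1
        return -neg_kl.sum()
```

In this sketch, training would minimize the usual data-fit loss (e.g. cross-entropy) plus the summed kl() terms of all such layers, i.e. the negative variational lower bound; connections whose learned dropout rate grows large are then removed, which is the source of the sparsity described above.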
