How to Center Binary Restricted Boltzmann Machines

6 November 2013
Jan Melchior
Asja Fischer
Laurenz Wiskott
Abstract

It has recently been shown that subtracting the mean from the visible as well as the hidden variables of deep Boltzmann machines leads to better conditioned optimization problems and improves some aspects of model performance. In this work we analyze binary restricted Boltzmann machines, where centering is done by subtracting offset values from visible and hidden variables. We show analytically that (i) the expected performance of centered binary restricted Boltzmann machines is invariant under a simultaneous flip of data and offsets, for any offset value in the range of zero to one, and (ii) using the 'enhanced gradient' is equivalent to setting the offset values to the average over model and data mean. Our results also generalize to deep Boltzmann machines. Numerical simulations suggest that (i) optimal generative performance is achieved by subtracting mean values from visible as well as hidden variables, (ii) the enhanced gradient suffers from divergence more often than other centering variants, and (iii) learning is stabilized if a sliding average over the batch means is used for the offset values instead of the current batch mean; this also prevents the enhanced gradient from diverging.
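To make the centering idea concrete, below is a minimal sketch of one contrastive-divergence (CD-1) update for a centered binary RBM, where offsets are subtracted from the visible and hidden variables and updated with a sliding average over the batch means. This is an illustrative reconstruction based on the abstract, not the authors' reference implementation; all dimensions, hyperparameters, and variable names (mu, lam, sliding, etc.) are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and parameters (illustrative assumptions, not from the paper).
n_vis, n_hid, batch_size = 784, 64, 32
W = rng.normal(0.0, 0.01, size=(n_vis, n_hid))
b_vis = np.zeros(n_vis)          # visible biases
b_hid = np.zeros(n_hid)          # hidden biases
mu = np.full(n_vis, 0.5)         # visible offsets
lam = np.full(n_hid, 0.5)        # hidden offsets
lr, sliding = 0.01, 0.01         # learning rate, sliding-average factor

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_centered_update(v_data):
    """One CD-1 step for a centered binary RBM (sketch)."""
    global W, b_vis, b_hid, mu, lam

    # Positive phase: hidden probabilities given the (centered) data.
    h_data = sigmoid((v_data - mu) @ W + b_hid)

    # One Gibbs step for the negative phase.
    h_sample = (rng.random(h_data.shape) < h_data).astype(float)
    v_model = sigmoid((h_sample - lam) @ W.T + b_vis)
    h_model = sigmoid((v_model - mu) @ W + b_hid)

    # Sliding average over batch means for the offsets (reported to stabilize learning).
    mu_new = (1 - sliding) * mu + sliding * v_data.mean(axis=0)
    lam_new = (1 - sliding) * lam + sliding * h_data.mean(axis=0)
    # Reparameterize the biases so the modeled distribution is unchanged by the offset shift.
    b_vis += W @ (lam_new - lam)
    b_hid += W.T @ (mu_new - mu)
    mu, lam = mu_new, lam_new

    # Centered gradient: data statistics minus model statistics.
    grad_W = ((v_data - mu).T @ (h_data - lam)
              - (v_model - mu).T @ (h_model - lam)) / len(v_data)
    grad_bv = (v_data - v_model).mean(axis=0)
    grad_bh = (h_data - h_model).mean(axis=0)

    W += lr * grad_W
    b_vis += lr * grad_bv
    b_hid += lr * grad_bh

# Example call on a random binary batch.
cd1_centered_update((rng.random((batch_size, n_vis)) < 0.5).astype(float))
```

Setting both offsets to the average of the data mean and the model mean would, per the analysis above, recover the enhanced gradient; the sliding-average variant shown here is the choice the simulations favor for stability.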
