Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning

10 February 2017
Stefan Elfwing
E. Uchibe
Kenji Doya
Abstract

In recent years, neural networks have enjoyed a renaissance as function approximators in reinforcement learning. Two decades after Tesauro's TD-Gammon achieved near top-level human performance in backgammon, the deep reinforcement learning algorithm DQN (combining Q-learning with a deep neural network, experience replay, and a separate target network) achieved human-level performance in many Atari 2600 games. The purpose of this study is twofold. First, based on the expected energy restricted Boltzmann machine (EE-RBM), we propose two activation functions for neural network function approximation in reinforcement learning: the sigmoid-weighted linear (SiL) unit and its derivative function (SiLd1). The activation of the SiL unit is computed by the sigmoid function multiplied by its input, which is equal to the contribution to the output from one hidden unit in an EE-RBM. Second, we suggest that the more traditional approach of using on-policy learning with eligibility traces, instead of experience replay, and softmax action selection can be competitive with DQN, without the need for a separate target network. We validate our proposed approach by, first, achieving new state-of-the-art results in both stochastic SZ-Tetris and Tetris with a small 10×10 board, using TD(λ) learning and shallow SiLd1 network agents, and, then, outperforming DQN in the Atari 2600 domain by using a deep Sarsa(λ) agent with SiL and SiLd1 hidden units.
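
As a rough illustration of the two activations described in the abstract, the following minimal NumPy sketch computes the SiL unit (the input multiplied by its sigmoid) and its derivative SiLd1, here written out in closed form; the function names and the derivative expression are assumptions for illustration, not code taken from the paper.

    import numpy as np

    def sigmoid(x):
        # Logistic sigmoid, used as the gating factor in both units.
        return 1.0 / (1.0 + np.exp(-x))

    def sil(x):
        # SiL unit: activation is the input times its sigmoid, x * sigmoid(x).
        return x * sigmoid(x)

    def sild1(x):
        # SiLd1 unit: derivative of SiL with respect to its input,
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x))).
        s = sigmoid(x)
        return s * (1.0 + x * (1.0 - s))

    # Example: evaluate both activations on a few inputs.
    z = np.linspace(-6.0, 6.0, 5)
    print(sil(z))
    print(sild1(z))
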
