Goal-Directed Planning by Reinforcement Learning and Active Inference

What is the difference between goal-directed and habitual behavior? We propose a novel computational framework for decision making based on Bayesian inference, in which everything is integrated into a single neural network model. The model learns to predict environmental state transitions through self-exploration, generating motor actions by sampling stochastic internal states. Habitual behavior, obtained from the prior distribution of these internal states, is acquired by reinforcement learning. Goal-directed behavior is determined from their posterior distribution by planning, using active inference to minimize the free energy for the goal observation. We demonstrate the effectiveness of the proposed framework in experiments on a sensorimotor navigation task with camera observations and continuous motor actions.
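The prior/posterior distinction above can be illustrated with a toy sketch. This is not the paper's model: all quantities (the forward model `W`, the habitual prior, the goal and noise variances) are assumed scalar stand-ins. Habitual behavior just reads the prior mean over the internal state; goal-directed planning descends the gradient of a free energy that trades off goal-prediction error against divergence from that prior.

```python
import numpy as np

# Toy sketch (assumed, not the paper's architecture): a 1-D world where a
# stochastic internal state z drives the predicted observation o = W * z.
W = 2.0                          # learned forward model (fixed here)
prior_mu, prior_var = 0.0, 1.0   # habitual prior over z (e.g. from RL)
goal = 3.0                       # desired (goal) observation
obs_var = 0.5                    # observation noise variance

def free_energy(q_mu, q_var):
    """Expected goal-prediction error plus KL(q || prior), both Gaussian."""
    pred_err = ((goal - W * q_mu) ** 2 + W ** 2 * q_var) / (2 * obs_var)
    kl = 0.5 * (np.log(prior_var / q_var)
                + (q_var + (q_mu - prior_mu) ** 2) / prior_var - 1)
    return pred_err + kl

# Habitual behavior: act from the prior (here, its mean).
habit_z = prior_mu
fe_before = free_energy(prior_mu, prior_var)

# Goal-directed planning: gradient descent on free energy over q's mean.
q_mu, q_var = prior_mu, prior_var
for _ in range(200):
    grad = -W * (goal - W * q_mu) / obs_var + (q_mu - prior_mu) / prior_var
    q_mu -= 0.05 * grad

fe_after = free_energy(q_mu, q_var)
# The posterior mean shifts so the predicted observation W*z approaches
# the goal, while the KL term keeps it anchored near the habitual prior.
print(habit_z, q_mu, W * q_mu, fe_before, fe_after)
```

At the fixed point the gradient balances the two pulls: the planned state lands between the habitual prior mean and the value that would exactly realize the goal, which is the qualitative behavior the abstract describes.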