
A Minimum Relative Entropy Principle for Learning and Acting

Abstract

Recently, it has been suggested that the problem of artificial intelligence is closely related to the compression problem in information theory. Model-based compressors use Bayesian inference to encode sequences from unknown sources, just as agents use likelihood models to do inference over observation sequences in unknown environments. Bayes' rule implicitly minimizes the expected relative entropy between the posterior and the true distribution and thus leads to the smallest possible change in the agent's belief representation or, equivalently, to maximum compression. Here, we extend the minimum relative entropy principle not only to compress sequences of observations but also to generate actions. To this end, we pair each likelihood model with an appropriate intervention model, thus defining a set of sensorimotor primitives or 'operation modes'. Minimizing the relative entropy between the posterior over actions and the true operation mode then leads to a mixture probability distribution over actions analogous to Bayes' rule. The probabilistic treatment of actions follows Pearl's intervention calculus. The resulting stochastic policies naturally deal with the exploration-exploitation trade-off and converge to the true (possibly deterministic) policy in the limit. We show applications of this approach in several examples, including the n-armed bandit problem and Markov decision problems with unknown transition probabilities.
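
To make the action-sampling idea concrete, the following is a minimal sketch specialized to a Bernoulli n-armed bandit. Each 'operation mode' pairs a reward model (an assumed mean payoff per arm) with the policy that is optimal under that model, and an action is drawn from the posterior mixture over modes. With independent Beta priors this mixture policy reduces to posterior sampling; all names and parameter choices below are illustrative assumptions, not part of the paper.

```python
# Posterior-mixture action selection for a Bernoulli n-armed bandit (sketch).
import numpy as np

rng = np.random.default_rng(0)

n_arms = 5
true_means = rng.uniform(size=n_arms)   # unknown environment, used only to simulate rewards
alpha = np.ones(n_arms)                 # Beta posterior parameters per arm
beta = np.ones(n_arms)

total_reward = 0.0
steps = 10_000
for t in range(steps):
    # Sample one operation mode from the current posterior over modes ...
    sampled_means = rng.beta(alpha, beta)
    # ... and act with the policy that is optimal under the sampled mode.
    arm = int(np.argmax(sampled_means))
    reward = float(rng.random() < true_means[arm])
    # Bayesian update of the belief over operation modes.
    alpha[arm] += reward
    beta[arm] += 1.0 - reward
    total_reward += reward

print("average reward:", total_reward / steps, "| best arm mean:", true_means.max())
```

Because the sampled mode concentrates on the true operation mode as evidence accumulates, the induced stochastic policy explores early on and converges toward always pulling the best arm, illustrating how the mixture policy handles the exploration-exploitation trade-off described above.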
