Probabilistic Transformers

Abstract

We show that Transformers are maximum a posteriori (MAP) estimators for Gaussian mixture models. This gives Transformers a probabilistic interpretation and suggests extensions to other probabilistic settings.
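
As an informal illustration of why attention and Gaussian mixtures are related (a sketch under stated assumptions, not the paper's derivation): if each key is treated as the mean of a mixture component, with uniform priors, a shared isotropic covariance, and keys of equal norm, then the posterior probability of each component given a query equals the softmax attention weight for that key. The function names, the example vectors, and the choice of variance sqrt(d) are assumptions made for this example only.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys, scale):
    # Scaled dot-product attention weights: softmax(q . k_j / scale).
    return softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])

def gmm_responsibilities(query, means, variance):
    # Posterior p(component j | query) for a Gaussian mixture with
    # uniform priors and shared isotropic covariance variance * I.
    # Terms that are identical across components cancel in the softmax.
    log_liks = [
        -sum((q - m) ** 2 for q, m in zip(query, mean)) / (2 * variance)
        for mean in means
    ]
    return softmax(log_liks)

# Assumption: all keys have the same norm (here, unit norm), so the
# ||k_j||^2 term is constant across components and drops out.
keys = [[1.0, 0.0], [0.0, 1.0], [-0.6, 0.8]]
query = [0.3, -1.2]
scale = math.sqrt(len(query))  # the usual sqrt(d) attention scaling

attn = attention_weights(query, keys, scale)
resp = gmm_responsibilities(query, keys, variance=scale)

for a, r in zip(attn, resp):
    print(f"attention={a:.6f}  posterior={r:.6f}")
```

Under these assumptions the two columns agree up to floating-point error, because the squared distance expands to a query-only term, a key-norm term that is constant across components, and the dot product that attention uses.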
