Attention-based clustering

19 May 2025

Abstract

Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to automatically extract structure from data in an unsupervised setting. In particular, we demonstrate their suitability for clustering when the input data is generated from a Gaussian mixture model. To this end, we study a simplified two-head attention layer and define a population risk whose minimization with unlabeled data drives the head parameters to align with the true mixture centroids.
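As a rough illustration of the setting described above, the following sketch samples unlabeled data from a symmetric two-component Gaussian mixture and evaluates an empirical reconstruction risk for a toy two-head softmax layer. The layer form, the squared-error risk, the function names (`sample_mixture`, `two_head_output`, `empirical_risk`), and all numerical values are assumptions made for illustration only; they are not taken from the paper, whose exact layer and risk may differ. The sketch only shows that, under these assumptions, head parameters aligned with the true centroids achieve a lower risk than uninformative ones.

```python
import numpy as np

def sample_mixture(n, centroid, sigma, rng):
    """Draw n points from 0.5*N(+centroid, sigma^2 I) + 0.5*N(-centroid, sigma^2 I)."""
    signs = rng.choice([-1.0, 1.0], size=n)
    noise = sigma * rng.standard_normal((n, centroid.shape[0]))
    return signs[:, None] * centroid + noise

def two_head_output(x, heads, temperature=1.0):
    """Toy two-head layer (assumed form): softmax over the scores <x, mu_k>,
    output is the softmax-weighted average of the head parameters mu_k."""
    scores = temperature * x @ heads.T              # shape (n, 2)
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ heads                          # shape (n, d)

def empirical_risk(x, heads):
    """Mean squared reconstruction error, an assumed stand-in for the population risk."""
    return float(np.mean(np.sum((x - two_head_output(x, heads)) ** 2, axis=1)))

rng = np.random.default_rng(0)
true_centroid = np.array([3.0, 0.0])                # hypothetical centroid
x = sample_mixture(5000, true_centroid, sigma=1.0, rng=rng)

aligned_heads = np.stack([true_centroid, -true_centroid])  # heads at the true centroids
zero_heads = np.zeros((2, 2))                               # uninformative heads

risk_aligned = empirical_risk(x, aligned_heads)
risk_zero = empirical_risk(x, zero_heads)
print(f"risk with aligned heads: {risk_aligned:.3f}")
print(f"risk with zero heads:    {risk_zero:.3f}")
```

No labels are used at any point: the risk depends only on the samples and the head parameters, which is what makes the objective unsupervised.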


@article{maulen-soto2025_2505.13112,
  title={Attention-based clustering},
  author={Rodrigo Maulen-Soto and Claire Boyer and Pierre Marion},
  journal={arXiv preprint arXiv:2505.13112},
  year={2025}
}