Disentanglement by means of action-induced representations

Gorka Muñoz-Gil
Hendrik Poulsen Nautrup
Arunava Majumder
Paulin de Schoulepnikoff
Florian Fürrutter
Marius Krumm
Hans J. Briegel
Main: 8 pages · 7 figures · 5 tables · Bibliography: 2 pages · Appendix: 9 pages
Abstract

Learning interpretable representations with variational autoencoders (VAEs) is a major goal of representation learning. The main challenge lies in obtaining disentangled representations, where each latent dimension corresponds to a distinct generative factor. This difficulty is fundamentally tied to the impossibility of performing nonlinear independent component analysis in general. Here, we introduce the framework of action-induced representations (AIRs), which models representations of physical systems given experiments (or actions) that can be performed on them. We show that, in this framework, we can provably disentangle degrees of freedom with respect to their action dependence. We further introduce a variational AIR architecture (VAIR) that can extract AIRs and therefore achieve provable disentanglement where standard VAEs fail. Beyond state representation, VAIR also captures the action dependence of the underlying generative factors, directly linking experiments to the degrees of freedom they influence.
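The VAIR architecture itself is detailed in the paper; as background for the abstract's discussion of VAEs, the sketch below shows two standard ingredients of any variational autoencoder: the reparameterization trick and the KL divergence of a diagonal Gaussian posterior from the standard-normal prior. This is a generic, NumPy-only illustration under assumed shapes, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # Sample z = mu + sigma * eps with eps ~ N(0, I); writing the sample this
    # way keeps it differentiable w.r.t. the encoder outputs (mu, log_var).
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ),
    # summed over the latent dimensions.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# Toy batch: 4 samples, 3 latent dimensions, posterior equal to the prior.
mu = np.zeros((4, 3))
log_var = np.zeros((4, 3))
z = reparameterize(mu, log_var)
kl = kl_to_standard_normal(mu, log_var)   # zero when posterior == prior
```

In a full VAE, this KL term is added to a reconstruction loss; disentanglement methods such as beta-VAE reweight it, whereas the paper's approach instead conditions the representation on actions.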
