SAINT: Attention-Based Modeling of Sub-Action Dependencies in Multi-Action Policies

The combinatorial structure of many real-world action spaces leads to exponential growth in the number of possible actions, limiting the effectiveness of conventional reinforcement learning algorithms. Recent approaches for combinatorial action spaces impose factorized or sequential structures over sub-actions, failing to capture complex joint behavior. We introduce the Sub-Action Interaction Network using Transformers (SAINT), a novel policy architecture that represents multi-component actions as unordered sets and models their dependencies via self-attention conditioned on the global state. SAINT is permutation-invariant, sample-efficient, and compatible with standard policy optimization algorithms. In 15 distinct combinatorial environments across three task domains, including environments with nearly 17 million joint actions, SAINT consistently outperforms strong baselines.
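The set-based attention idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the single attention head, the random weights, the additive state conditioning, and every name below (`make_params`, `sub_action_distributions`, `Ws`, `Wo`) are assumptions made for illustration. It shows that self-attention without positional encodings gives each sub-action a distribution that depends on all other sub-actions and on the global state, and that these per-sub-action outputs simply follow any reordering of the inputs.

```python
import math
import random

def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def make_params(d_model, state_dim, n_choices, seed=0):
    # Hypothetical parameter set; a real policy would learn these weights.
    rng = random.Random(seed)
    return {
        "Wq": rand_matrix(d_model, d_model, rng),
        "Wk": rand_matrix(d_model, d_model, rng),
        "Wv": rand_matrix(d_model, d_model, rng),
        "Ws": rand_matrix(d_model, state_dim, rng),  # projects the global state
        "Wo": rand_matrix(n_choices, d_model, rng),  # shared per-token output head
    }

def sub_action_distributions(tokens, state, p):
    """Return one categorical distribution per sub-action token."""
    d = len(tokens[0])
    s = matvec(p["Ws"], state)
    # Assumed conditioning scheme: add the projected state to every token.
    xs = [[t + si for t, si in zip(tok, s)] for tok in tokens]
    qs = [matvec(p["Wq"], x) for x in xs]
    ks = [matvec(p["Wk"], x) for x in xs]
    vs = [matvec(p["Wv"], x) for x in xs]
    out = []
    for q in qs:
        # Scaled dot-product attention of this token over all tokens.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in ks]
        attn = softmax(scores)
        mixed = [sum(a * v[j] for a, v in zip(attn, vs)) for j in range(d)]
        out.append(softmax(matvec(p["Wo"], mixed)))
    return out

# Three sub-action tokens of width 4, a 2-dimensional global state,
# and 5 choices per sub-action (all sizes are arbitrary).
rng = random.Random(1)
tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
state = [0.2, -0.1]
params = make_params(d_model=4, state_dim=2, n_choices=5)

dists = sub_action_distributions(tokens, state, params)

# Reorder the sub-actions and run the same network again.
perm = [2, 0, 1]
permuted = sub_action_distributions([tokens[i] for i in perm], state, params)
```

Up to floating-point rounding, `permuted[j]` matches `dists[perm[j]]`: no positional information is injected, so the ordering of the sub-action set carries no meaning for the network.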
@article{landers2025_2505.12109,
  title={SAINT: Attention-Based Modeling of Sub-Action Dependencies in Multi-Action Policies},
  author={Matthew Landers and Taylor W. Killian and Thomas Hartvigsen and Afsaneh Doryab},
  journal={arXiv preprint arXiv:2505.12109},
  year={2025}
}