BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces

Abstract
Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects. We propose Branch Value Estimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a linear number of joint actions while preserving dependency structure. BraVE outperforms prior offline RL methods in environments with over four million joint actions.
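To make the scaling concrete, the sketch below (not the authors' implementation) contrasts the size of a combinatorial joint action space with the number of evaluations a tree-structured, one-sub-action-at-a-time traversal performs. The scoring function `q` is a hypothetical placeholder standing in for a learned Q-network, and the dimension counts are illustrative.

```python
# Illustrative sketch (assumptions, not BraVE itself): greedy
# tree-structured traversal over a combinatorial discrete action space.

N_SUB_ACTIONS = 11   # number of sub-action dimensions (assumed)
N_CHOICES = 4        # choices per sub-action; 4**11 > 4 million joint actions

def q(state, partial_action):
    # Placeholder score; a real method would query a learned value network
    # that can capture dependencies among the sub-actions chosen so far.
    return sum((i + 1) * a for i, a in enumerate(partial_action)) - state

def greedy_traversal(state):
    """Fix one sub-action per tree level, scoring only N_CHOICES
    candidates at each level: O(N_SUB_ACTIONS * N_CHOICES) evaluations
    instead of enumerating all N_CHOICES ** N_SUB_ACTIONS joint actions."""
    action = []
    for _ in range(N_SUB_ACTIONS):
        best = max(range(N_CHOICES), key=lambda a: q(state, action + [a]))
        action.append(best)
    return tuple(action)

print(N_CHOICES ** N_SUB_ACTIONS)          # 4194304 joint actions
print(N_SUB_ACTIONS * N_CHOICES)           # 44 evaluations in the traversal
print(greedy_traversal(state=0))
```

Because each level conditions its score on the sub-actions already fixed, this style of traversal can represent dependencies between sub-actions, unlike a fully factorized Q-value.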
@article{landers2025_2410.21151,
  title={BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces},
  author={Matthew Landers and Taylor W. Killian and Hugo Barnes and Thomas Hartvigsen and Afsaneh Doryab},
  journal={arXiv preprint arXiv:2410.21151},
  year={2025}
}