Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs

Heterogeneity poses a fundamental challenge for many real-world large-scale decision-making problems but remains largely understudied. In this paper, we study the fully heterogeneous setting of a prominent class of such problems, known as weakly-coupled Markov decision processes (WCMDPs). Each WCMDP consists of N arms (or subproblems), which have distinct model parameters in the fully heterogeneous setting, leading to the curse of dimensionality when N is large. We show that, under mild assumptions, an efficiently computable policy achieves a vanishing optimality gap in the long-run average reward per arm for fully heterogeneous WCMDPs as N becomes large. This is the first asymptotic optimality result for fully heterogeneous average-reward WCMDPs. Our main technical innovation is the construction of projection-based Lyapunov functions that certify the convergence of rewards and costs to an optimal region, even under full heterogeneity.
@article{zhang2025_2502.06072,
  title={Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs},
  author={Xiangcheng Zhang and Yige Hong and Weina Wang},
  journal={arXiv preprint arXiv:2502.06072},
  year={2025}
}