Reward Biased Maximum Likelihood Estimation for Reinforcement Learning

16 November 2020

Papers citing "Reward Biased Maximum Likelihood Estimation for Reinforcement Learning"


Title
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai
OffRL; 72 24 0; 20 Feb 2025

Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Tong Yang, Bo Dai, Lin Xiao, Yuejie Chi
OffRL; 71 2 0; 13 Feb 2025

When Is Partially Observable Reinforcement Learning Not Scary?
Qinghua Liu, Alan Chung, Csaba Szepesvári, Chi Jin
22 94 0; 19 Apr 2022

Learning Augmented Index Policy for Optimal Service Placement at the Network Edge
Guojun Xiong, Rahul Singh, Jian Li
27 9 0; 10 Jan 2021

Whittle index based Q-learning for restless bandits with average reward
Konstantin Avrachenkov, Vivek Borkar
6 69 0; 29 Apr 2020

Learning in Markov Decision Processes under Constraints
Rahul Singh, Abhishek Gupta, Ness B. Shroff
51 27 0; 27 Feb 2020