Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

29 October 2016

A. Krishnamurthy

Robert Schapire

Papers citing "Contextual Decision Processes with Low Bellman Rank are PAC-Learnable"

Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective Jiawei Huang Bingcong Li Christoph Dann Niao He OffRL 151 1 0 26 Feb 2025
Decision Making in Hybrid Environments: A Model Aggregation Approach Haolin Liu Chen-Yu Wei Julian Zimmert 134 0 0 09 Feb 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning Chen-Yu Wei Christoph Dann Julian Zimmert 99 44 0 31 Dec 2024
Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory Alexander Levine Peter Stone Amy Zhang OffRL 53 0 0 03 Oct 2024
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond Xutong Liu Siwei Wang Jinhang Zuo Han Zhong Xuchuang Wang Zhiyong Wang Shuai Li Mohammad Hajiesmaili J. C. Lui Wei Chen 122 3 0 03 Jun 2024
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes Chen Ye Wei Xiong Quanquan Gu Tong Zhang 78 30 0 12 Dec 2022
Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles Aditya Modi Nan Jiang Ambuj Tewari Satinder Singh 49 131 0 23 Oct 2019
Unifying Count-Based Exploration and Intrinsic Motivation Marc G. Bellemare S. Srinivasan Georg Ostrovski Tom Schaul D. Saxton Rémi Munos 156 1,465 0 06 Jun 2016
Reinforcement Learning of POMDPs using Spectral Methods Kamyar Azizzadenesheli A. Lazaric Anima Anandkumar 22 127 0 25 Feb 2016
Dueling Network Architectures for Deep Reinforcement Learning Ziyu Wang Tom Schaul Matteo Hessel H. V. Hasselt Marc Lanctot Nando de Freitas OffRL 56 3,742 0 20 Nov 2015
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning Christoph Dann Emma Brunskill 34 249 0 29 Oct 2015
Contextual Markov Decision Processes Assaf Hallak Dotan Di Castro Shie Mannor 59 243 0 08 Feb 2015
Model-based Reinforcement Learning and the Eluder Dimension Ian Osband Benjamin Van Roy 54 188 0 07 Jun 2014
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits Alekh Agarwal Daniel J. Hsu Satyen Kale John Langford Lihong Li Robert Schapire OffRL 161 504 0 04 Feb 2014
PEGASUS: A Policy Search Method for Large MDPs and POMDPs A. Ng Michael I. Jordan 47 496 0 16 Jan 2013
Predictive State Representations: A New Theory for Modeling Dynamical Systems Satinder Singh Michael R. James Matthew R. Rudary AI4TS AI4CE 50 288 0 11 Jul 2012
Contextual Bandit Learning with Predictable Rewards Alekh Agarwal Miroslav Dudík Satyen Kale John Langford Robert Schapire OffRL 165 86 0 07 Feb 2012
Efficient Optimal Learning for Contextual Bandits Miroslav Dudík Daniel J. Hsu Satyen Kale Nikos Karampatziakis John Langford L. Reyzin Tong Zhang 100 300 0 13 Jun 2011
Closing the Learning-Planning Loop with Predictive State Representations Byron Boots S. Siddiqi Geoffrey J. Gordon 184 264 0 12 Dec 2009