ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Contextual Decision Processes with Low Bellman Rank are PAC-Learnable


29 October 2016
Nan Jiang
A. Krishnamurthy
Alekh Agarwal
John Langford
Robert Schapire
arXiv: 1610.09512

Papers citing "Contextual Decision Processes with Low Bellman Rank are PAC-Learnable"

19 / 19 papers shown
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
151
1
0
26 Feb 2025
Decision Making in Hybrid Environments: A Model Aggregation Approach
Haolin Liu
Chen-Yu Wei
Julian Zimmert
134
0
0
09 Feb 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning
Chen-Yu Wei
Christoph Dann
Julian Zimmert
99
44
0
31 Dec 2024
Learning a Fast Mixing Exogenous Block MDP using a Single Trajectory
Alexander Levine
Peter Stone
Amy Zhang
OffRL
53
0
0
03 Oct 2024
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
Xutong Liu
Siwei Wang
Jinhang Zuo
Han Zhong
Xuchuang Wang
Zhiyong Wang
Shuai Li
Mohammad Hajiesmaili
J. C. Lui
Wei Chen
122
3
0
03 Jun 2024
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
Chen Ye
Wei Xiong
Quanquan Gu
Tong Zhang
78
30
0
12 Dec 2022
Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles
Aditya Modi
Nan Jiang
Ambuj Tewari
Satinder Singh
49
131
0
23 Oct 2019
Unifying Count-Based Exploration and Intrinsic Motivation
Marc G. Bellemare
S. Srinivasan
Georg Ostrovski
Tom Schaul
D. Saxton
Rémi Munos
156
1,465
0
06 Jun 2016
Reinforcement Learning of POMDPs using Spectral Methods
Kamyar Azizzadenesheli
A. Lazaric
Anima Anandkumar
22
127
0
25 Feb 2016
Dueling Network Architectures for Deep Reinforcement Learning
Ziyu Wang
Tom Schaul
Matteo Hessel
H. V. Hasselt
Marc Lanctot
Nando de Freitas
OffRL
56
3,742
0
20 Nov 2015
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
Christoph Dann
Emma Brunskill
34
249
0
29 Oct 2015
Contextual Markov Decision Processes
Assaf Hallak
Dotan Di Castro
Shie Mannor
59
243
0
08 Feb 2015
Model-based Reinforcement Learning and the Eluder Dimension
Ian Osband
Benjamin Van Roy
54
188
0
07 Jun 2014
Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
Alekh Agarwal
Daniel J. Hsu
Satyen Kale
John Langford
Lihong Li
Robert Schapire
OffRL
161
504
0
04 Feb 2014
PEGASUS: A Policy Search Method for Large MDPs and POMDPs
A. Ng
Michael I. Jordan
47
496
0
16 Jan 2013
Predictive State Representations: A New Theory for Modeling Dynamical Systems
Satinder Singh
Michael R. James
Matthew R. Rudary
AI4TS
AI4CE
50
288
0
11 Jul 2012
Contextual Bandit Learning with Predictable Rewards
Alekh Agarwal
Miroslav Dudík
Satyen Kale
John Langford
Robert Schapire
OffRL
165
86
0
07 Feb 2012
Efficient Optimal Learning for Contextual Bandits
Miroslav Dudík
Daniel J. Hsu
Satyen Kale
Nikos Karampatziakis
John Langford
L. Reyzin
Tong Zhang
100
300
0
13 Jun 2011
Closing the Learning-Planning Loop with Predictive State Representations
Byron Boots
S. Siddiqi
Geoffrey J. Gordon
184
264
0
12 Dec 2009