ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.05704
  4. Cited By
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
v1v2 (latest)

Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization

8 July 2024
D. Tiapkin
Evgenii Chzhen
Gilles Stoltz
ArXiv (abs)PDFHTML

Papers citing "Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization"

27 / 27 papers shown
Title
Warm-up Free Policy Optimization: Improved Regret in Linear Markov
  Decision Processes
Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
Asaf B. Cassel
Aviv A. Rosenberg
78
1
0
03 Jul 2024
Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback
Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback
Haolin Liu
Chen-Yu Wei
Julian Zimmert
54
6
0
17 Oct 2023
Rate-Optimal Policy Optimization for Linear Markov Decision Processes
Rate-Optimal Policy Optimization for Linear Markov Decision Processes
Uri Sherman
Alon Cohen
Tomer Koren
Yishay Mansour
72
7
0
28 Aug 2023
Settling the Sample Complexity of Online Reinforcement Learning
Settling the Sample Complexity of Online Reinforcement Learning
Zihan Zhang
Yuxin Chen
Jason D. Lee
S. Du
OffRL
194
25
0
25 Jul 2023
A Theoretical Analysis of Optimistic Proximal Policy Optimization in
  Linear Markov Decision Processes
A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes
Han Zhong
Tong Zhang
73
29
0
15 May 2023
Policy Optimization in Adversarial MDPs: Improved Exploration via
  Dilated Bonuses
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
97
45
0
18 Jul 2021
The best of both worlds: stochastic and adversarial episodic MDPs with
  unknown transition
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
Tiancheng Jin
Longbo Huang
Haipeng Luo
60
42
0
08 Jun 2021
Cautiously Optimistic Policy Optimization and Exploration with Linear
  Function Approximation
Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation
Andrea Zanette
Ching-An Cheng
Alekh Agarwal
91
53
0
24 Mar 2021
Near-optimal Policy Optimization Algorithms for Learning Adversarial
  Linear Mixture MDPs
Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs
Jiafan He
Dongruo Zhou
Quanquan Gu
122
24
0
17 Feb 2021
Learning Adversarial Markov Decision Processes with Delayed Feedback
Learning Adversarial Markov Decision Processes with Delayed Feedback
Tal Lancewicki
Aviv A. Rosenberg
Yishay Mansour
62
35
0
29 Dec 2020
Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds
  Revisited
Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited
O. D. Domingues
Pierre Ménard
E. Kaufmann
Michal Valko
57
98
0
07 Oct 2020
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal
  Algorithm Escaping the Curse of Horizon
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
Zihan Zhang
Xiangyang Ji
S. Du
OffRL
110
107
0
28 Sep 2020
Optimistic Policy Optimization with Bandit Feedback
Optimistic Policy Optimization with Bandit Feedback
Yonathan Efroni
Lior Shani
Aviv A. Rosenberg
Shie Mannor
59
90
0
19 Feb 2020
Provably Efficient Exploration in Policy Optimization
Provably Efficient Exploration in Policy Optimization
Qi Cai
Zhuoran Yang
Chi Jin
Zhaoran Wang
68
283
0
12 Dec 2019
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
Chi Jin
Tiancheng Jin
Haipeng Luo
S. Sra
Tiancheng Yu
75
104
0
03 Dec 2019
Worst-Case Regret Bounds for Exploration via Randomized Value Functions
Worst-Case Regret Bounds for Exploration via Randomized Value Functions
Daniel Russo
OffRL
48
88
0
07 Jun 2019
Online Convex Optimization in Adversarial Markov Decision Processes
Online Convex Optimization in Adversarial Markov Decision Processes
Aviv A. Rosenberg
Yishay Mansour
54
138
0
19 May 2019
Is Q-learning Provably Efficient?
Is Q-learning Provably Efficient?
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
78
812
0
10 Jul 2018
KL-UCB-switch: optimal regret bounds for stochastic bandits from both a
  distribution-dependent and a distribution-free viewpoints
KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints
Aurélien Garivier
Hédi Hadiji
Pierre Menard
Gilles Stoltz
55
33
0
14 May 2018
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
544
19,296
0
20 Jul 2017
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement
  Learning
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Christoph Dann
Tor Lattimore
Emma Brunskill
83
311
0
22 Mar 2017
Minimax Regret Bounds for Reinforcement Learning
Minimax Regret Bounds for Reinforcement Learning
M. G. Azar
Ian Osband
Rémi Munos
95
778
0
16 Mar 2017
Scale-Free Algorithms for Online Linear Optimization
Scale-Free Algorithms for Online Linear Optimization
Francesco Orabona
D. Pál
ODL
67
53
0
19 Feb 2015
Trust Region Policy Optimization
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
279
6,801
0
19 Feb 2015
A Second-order Bound with Excess Losses
A Second-order Bound with Excess Losses
Pierre Gaillard
Gilles Stoltz
T. Erven
76
154
0
10 Feb 2014
Follow the Leader If You Can, Hedge If You Must
Follow the Leader If You Can, Hedge If You Must
S. D. Rooij
T. Erven
Peter Grünwald
Wouter M. Koolen
202
181
0
03 Jan 2013
Adaptive Hedge
Adaptive Hedge
T. Erven
Peter Grünwald
Wouter M. Koolen
S. D. Rooij
93
50
0
28 Oct 2011
1