arXiv:1905.07773
Cited By
Online Convex Optimization in Adversarial Markov Decision Processes
19 May 2019
Aviv A. Rosenberg
Yishay Mansour
Papers citing "Online Convex Optimization in Adversarial Markov Decision Processes"
49 / 49 papers shown
Title
Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning
Pedro P. Santos
Alberto Sardinha
Francisco S. Melo
7
0
0
21 May 2025
Online Episodic Convex Reinforcement Learning
B. Moreno
Khaled Eldowa
Pierre Gaillard
Margaux Brégère
Nadia Oudjane
OffRL
34
0
0
12 May 2025
Steering No-Regret Agents in MFGs under Model Uncertainty
Leo Widmer
Jiawei Huang
Niao He
LLMSV
70
1
0
12 Mar 2025
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
D. Tiapkin
Evgenii Chzhen
Gilles Stoltz
74
1
0
08 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
69
1
0
11 Jun 2024
MetaCURL: Non-stationary Concave Utility Reinforcement Learning
B. Moreno
Margaux Brégère
Pierre Gaillard
Nadia Oudjane
OffRL
44
0
0
30 May 2024
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Asaf B. Cassel
Haipeng Luo
Aviv A. Rosenberg
Dmitry Sotnikov
OffRL
52
4
0
13 May 2024
Learning Adversarial MDPs with Stochastic Hard Constraints
Francesco Emanuele Stradi
Matteo Castiglioni
A. Marchesi
Nicola Gatti
44
5
0
06 Mar 2024
Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes
Bhargav Ganguly
Yang Xu
Vaneet Aggarwal
36
0
0
18 Oct 2023
A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs
Dirk van der Hoeven
Lukas Zierahn
Tal Lancewicki
Aviv A. Rosenberg
Nicolò Cesa-Bianchi
50
4
0
15 May 2023
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
Fang-yuan Kong
Xiangcheng Zhang
Baoxiang Wang
Shuai Li
36
12
0
14 Feb 2023
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
Uri Sherman
Tomer Koren
Yishay Mansour
55
12
0
30 Jan 2023
Refined Regret for Adversarial MDPs with Linear Function Approximation
Yan Dai
Haipeng Luo
Chen-Yu Wei
Julian Zimmert
69
12
0
30 Jan 2023
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
Chen Ye
Wei Xiong
Quanquan Gu
Tong Zhang
43
29
0
12 Dec 2022
Socially Fair Reinforcement Learning
Debmalya Mandal
Jiarui Gan
OffRL
32
13
0
26 Aug 2022
Dynamic Regret of Online Markov Decision Processes
Peng Zhao
Longfei Li
Zhi Zhou
OffRL
52
17
0
26 Aug 2022
Performative Reinforcement Learning
Debmalya Mandal
Stelios Triantafyllou
Goran Radanović
36
18
0
30 Jun 2022
Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits
Qinghua Liu
Yuanhao Wang
Chi Jin
AAML
37
15
0
14 Mar 2022
Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints
Liyu Chen
R. Jain
Haipeng Luo
72
25
0
31 Jan 2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin
Tal Lancewicki
Haipeng Luo
Yishay Mansour
Aviv A. Rosenberg
74
21
0
31 Jan 2022
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
Han Zhong
Zhuoran Yang
Zhaoran Wang
Csaba Szepesvári
66
21
0
18 Oct 2021
Concave Utility Reinforcement Learning with Zero-Constraint Violations
Mridul Agarwal
Qinbo Bai
Vaneet Aggarwal
38
12
0
12 Sep 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
55
44
0
18 Jul 2021
Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall
Tadashi Kozuno
Pierre Ménard
Rémi Munos
Michal Valko
35
18
0
11 Jun 2021
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
Tiancheng Jin
Longbo Huang
Haipeng Luo
32
40
0
08 Jun 2021
Online Selection of Diverse Committees
Virginie Do
Jamal Atif
J. Lang
Nicolas Usunier
45
9
0
19 May 2021
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
Jean Tarbouriech
Runlong Zhou
S. Du
Matteo Pirotta
M. Valko
A. Lazaric
76
35
0
22 Apr 2021
Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs
Jiafan He
Dongruo Zhou
Quanquan Gu
95
24
0
17 Feb 2021
Improved Corruption Robust Algorithms for Episodic Reinforcement Learning
Yifang Chen
S. Du
Kevin Jamieson
29
22
0
13 Feb 2021
Robust Policy Gradient against Strong Data Corruption
Xuezhou Zhang
Yiding Chen
Xiaojin Zhu
Wen Sun
AAML
57
38
0
11 Feb 2021
Learning Adversarial Markov Decision Processes with Delayed Feedback
Tal Lancewicki
Aviv A. Rosenberg
Yishay Mansour
43
32
0
29 Dec 2020
Policy Optimization as Online Learning with Mediator Feedback
Alberto Maria Metelli
Matteo Papini
P. D'Oro
Marcello Restelli
OffRL
32
10
0
15 Dec 2020
Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
Liyu Chen
Haipeng Luo
Chen-Yu Wei
34
32
0
07 Dec 2020
Online Learning in Unknown Markov Games
Yi Tian
Yuanhao Wang
Tiancheng Yu
S. Sra
OffRL
27
13
0
28 Oct 2020
A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
Qinghua Liu
Tiancheng Yu
Yu Bai
Chi Jin
41
121
0
04 Oct 2020
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
Zihan Zhang
Xiangyang Ji
S. Du
OffRL
57
104
0
28 Sep 2020
Online Boosting with Bandit Feedback
Nataly Brukhim
Elad Hazan
34
10
0
23 Jul 2020
A Unifying View of Optimism in Episodic Reinforcement Learning
Gergely Neu
Ciara Pike-Burke
22
66
0
03 Jul 2020
Dynamic Regret of Policy Optimization in Non-stationary Environments
Yingjie Fei
Zhuoran Yang
Zhaoran Wang
Qiaomin Xie
37
54
0
30 Jun 2020
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
Wang Chi Cheung
D. Simchi-Levi
Ruihao Zhu
OffRL
36
93
0
24 Jun 2020
Near-Optimal Reinforcement Learning with Self-Play
Yu Bai
Chi Jin
Tiancheng Yu
29
130
0
22 Jun 2020
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
Chung-Wei Lee
Haipeng Luo
Chen-Yu Wei
Mengxiao Zhang
37
52
0
14 Jun 2020
Active Model Estimation in Markov Decision Processes
Jean Tarbouriech
S. Shekhar
Matteo Pirotta
Mohammad Ghavamzadeh
A. Lazaric
26
24
0
06 Mar 2020
Exploration-Exploitation in Constrained MDPs
Yonathan Efroni
Shie Mannor
Matteo Pirotta
40
174
0
04 Mar 2020
Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Shuang Qiu
Xiaohan Wei
Zhuoran Yang
Jieping Ye
Zhaoran Wang
51
48
0
02 Mar 2020
Logarithmic Regret for Adversarial Online Control
Dylan J. Foster
Max Simchowitz
35
74
0
29 Feb 2020
Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
Qiaomin Xie
Yudong Chen
Zhaoran Wang
Zhuoran Yang
46
125
0
17 Feb 2020
Provable Self-Play Algorithms for Competitive Reinforcement Learning
Yu Bai
Chi Jin
SSL
32
149
0
10 Feb 2020
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
Chi Jin
Tiancheng Jin
Haipeng Luo
S. Sra
Tiancheng Yu
13
103
0
03 Dec 2019