ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1905.07773
  4. Cited By
Online Convex Optimization in Adversarial Markov Decision Processes

Online Convex Optimization in Adversarial Markov Decision Processes

19 May 2019
Aviv A. Rosenberg
Yishay Mansour
ArXivPDFHTML

Papers citing "Online Convex Optimization in Adversarial Markov Decision Processes"

49 / 49 papers shown
Title
Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning
Solving General-Utility Markov Decision Processes in the Single-Trial Regime with Online Planning
Pedro P. Santos
Alberto Sardinha
Francisco S. Melo
7
0
0
21 May 2025
Online Episodic Convex Reinforcement Learning
Online Episodic Convex Reinforcement Learning
B. Moreno
Khaled Eldowa
Pierre Gaillard
Margaux Brégère
Nadia Oudjane
OffRL
34
0
0
12 May 2025
Steering No-Regret Agents in MFGs under Model Uncertainty
Steering No-Regret Agents in MFGs under Model Uncertainty
Leo Widmer
Jiawei Huang
Niao He
LLMSV
70
1
0
12 Mar 2025
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
D. Tiapkin
Evgenii Chzhen
Gilles Stoltz
74
1
0
08 Jul 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
69
1
0
11 Jun 2024
MetaCURL: Non-stationary Concave Utility Reinforcement Learning
MetaCURL: Non-stationary Concave Utility Reinforcement Learning
B. Moreno
Margaux Brégère
Pierre Gaillard
Nadia Oudjane
OffRL
44
0
0
30 May 2024
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Asaf B. Cassel
Haipeng Luo
Aviv A. Rosenberg
Dmitry Sotnikov
OffRL
52
4
0
13 May 2024
Learning Adversarial MDPs with Stochastic Hard Constraints
Learning Adversarial MDPs with Stochastic Hard Constraints
Francesco Emanuele Stradi
Matteo Castiglioni
A. Marchesi
Nicola Gatti
44
5
0
06 Mar 2024
Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes
Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes
Bhargav Ganguly
Yang Xu
Vaneet Aggarwal
36
0
0
18 Oct 2023
A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial
  Semi-Bandits, Linear Bandits, and MDPs
A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs
Dirk van der Hoeven
Lukas Zierahn
Tal Lancewicki
Aviv A. Rosenberg
Nicolò Cesa-Bianchi
50
4
0
15 May 2023
Improved Regret Bounds for Linear Adversarial MDPs via Linear
  Optimization
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
Fang-yuan Kong
Xiangcheng Zhang
Baoxiang Wang
Shuai Li
36
12
0
14 Feb 2023
Improved Regret for Efficient Online Reinforcement Learning with Linear
  Function Approximation
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
Uri Sherman
Tomer Koren
Yishay Mansour
55
12
0
30 Jan 2023
Refined Regret for Adversarial MDPs with Linear Function Approximation
Refined Regret for Adversarial MDPs with Linear Function Approximation
Yan Dai
Haipeng Luo
Chen-Yu Wei
Julian Zimmert
69
12
0
30 Jan 2023
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear
  Contextual Bandits and Markov Decision Processes
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
Chen Ye
Wei Xiong
Quanquan Gu
Tong Zhang
43
29
0
12 Dec 2022
Socially Fair Reinforcement Learning
Socially Fair Reinforcement Learning
Debmalya Mandal
Jiarui Gan
OffRL
32
13
0
26 Aug 2022
Dynamic Regret of Online Markov Decision Processes
Dynamic Regret of Online Markov Decision Processes
Peng Zhao
Longfei Li
Zhi Zhou
OffRL
52
17
0
26 Aug 2022
Performative Reinforcement Learning
Performative Reinforcement Learning
Debmalya Mandal
Stelios Triantafyllou
Goran Radanović
36
18
0
30 Jun 2022
Learning Markov Games with Adversarial Opponents: Efficient Algorithms
  and Fundamental Limits
Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits
Qinghua Liu
Yuanhao Wang
Chi Jin
AAML
37
15
0
14 Mar 2022
Learning Infinite-Horizon Average-Reward Markov Decision Processes with
  Constraints
Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints
Liyu Chen
R. Jain
Haipeng Luo
72
25
0
31 Jan 2022
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
Tiancheng Jin
Tal Lancewicki
Haipeng Luo
Yishay Mansour
Aviv A. Rosenberg
74
21
0
31 Jan 2022
Optimistic Policy Optimization is Provably Efficient in Non-stationary
  MDPs
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
Han Zhong
Zhuoran Yang
Zhaoran Wang
Csaba Szepesvári
66
21
0
18 Oct 2021
Concave Utility Reinforcement Learning with Zero-Constraint Violations
Concave Utility Reinforcement Learning with Zero-Constraint Violations
Mridul Agarwal
Qinbo Bai
Vaneet Aggarwal
38
12
0
12 Sep 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via
  Dilated Bonuses
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
55
44
0
18 Jul 2021
Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov
  Games with Perfect Recall
Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall
Tadashi Kozuno
Pierre Ménard
Rémi Munos
Michal Valko
35
18
0
11 Jun 2021
The best of both worlds: stochastic and adversarial episodic MDPs with
  unknown transition
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
Tiancheng Jin
Longbo Huang
Haipeng Luo
32
40
0
08 Jun 2021
Online Selection of Diverse Committees
Online Selection of Diverse Committees
Virginie Do
Jamal Atif
J. Lang
Nicolas Usunier
45
9
0
19 May 2021
Stochastic Shortest Path: Minimax, Parameter-Free and Towards
  Horizon-Free Regret
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
Jean Tarbouriech
Runlong Zhou
S. Du
Matteo Pirotta
M. Valko
A. Lazaric
76
35
0
22 Apr 2021
Near-optimal Policy Optimization Algorithms for Learning Adversarial
  Linear Mixture MDPs
Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs
Jiafan He
Dongruo Zhou
Quanquan Gu
95
24
0
17 Feb 2021
Improved Corruption Robust Algorithms for Episodic Reinforcement
  Learning
Improved Corruption Robust Algorithms for Episodic Reinforcement Learning
Yifang Chen
S. Du
Kevin Jamieson
29
22
0
13 Feb 2021
Robust Policy Gradient against Strong Data Corruption
Robust Policy Gradient against Strong Data Corruption
Xuezhou Zhang
Yiding Chen
Xiaojin Zhu
Wen Sun
AAML
57
38
0
11 Feb 2021
Learning Adversarial Markov Decision Processes with Delayed Feedback
Learning Adversarial Markov Decision Processes with Delayed Feedback
Tal Lancewicki
Aviv A. Rosenberg
Yishay Mansour
43
32
0
29 Dec 2020
Policy Optimization as Online Learning with Mediator Feedback
Policy Optimization as Online Learning with Mediator Feedback
Alberto Maria Metelli
Matteo Papini
P. DÓro
Marcello Restelli
OffRL
32
10
0
15 Dec 2020
Minimax Regret for Stochastic Shortest Path with Adversarial Costs and
  Known Transition
Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
Liyu Chen
Haipeng Luo
Chen-Yu Wei
34
32
0
07 Dec 2020
Online Learning in Unknown Markov Games
Online Learning in Unknown Markov Games
Yi Tian
Yuanhao Wang
Tiancheng Yu
S. Sra
OffRL
27
13
0
28 Oct 2020
A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
Qinghua Liu
Tiancheng Yu
Yu Bai
Chi Jin
41
121
0
04 Oct 2020
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal
  Algorithm Escaping the Curse of Horizon
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
Zihan Zhang
Xiangyang Ji
S. Du
OffRL
57
104
0
28 Sep 2020
Online Boosting with Bandit Feedback
Online Boosting with Bandit Feedback
Nataly Brukhim
Elad Hazan
34
10
0
23 Jul 2020
A Unifying View of Optimism in Episodic Reinforcement Learning
A Unifying View of Optimism in Episodic Reinforcement Learning
Gergely Neu
Ciara Pike-Burke
22
66
0
03 Jul 2020
Dynamic Regret of Policy Optimization in Non-stationary Environments
Dynamic Regret of Policy Optimization in Non-stationary Environments
Yingjie Fei
Zhuoran Yang
Zhaoran Wang
Qiaomin Xie
37
54
0
30 Jun 2020
Reinforcement Learning for Non-Stationary Markov Decision Processes: The
  Blessing of (More) Optimism
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
Wang Chi Cheung
D. Simchi-Levi
Ruihao Zhu
OffRL
36
93
0
24 Jun 2020
Near-Optimal Reinforcement Learning with Self-Play
Near-Optimal Reinforcement Learning with Self-Play
Yunru Bai
Chi Jin
Tiancheng Yu
29
130
0
22 Jun 2020
Bias no more: high-probability data-dependent regret bounds for
  adversarial bandits and MDPs
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
Chung-Wei Lee
Haipeng Luo
Chen-Yu Wei
Mengxiao Zhang
37
52
0
14 Jun 2020
Active Model Estimation in Markov Decision Processes
Active Model Estimation in Markov Decision Processes
Jean Tarbouriech
S. Shekhar
Matteo Pirotta
Mohammad Ghavamzadeh
A. Lazaric
26
24
0
06 Mar 2020
Exploration-Exploitation in Constrained MDPs
Exploration-Exploitation in Constrained MDPs
Yonathan Efroni
Shie Mannor
Matteo Pirotta
40
174
0
04 Mar 2020
Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
  Adversarial Loss
Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Shuang Qiu
Xiaohan Wei
Zhuoran Yang
Jieping Ye
Zhaoran Wang
51
48
0
02 Mar 2020
Logarithmic Regret for Adversarial Online Control
Logarithmic Regret for Adversarial Online Control
Dylan J. Foster
Max Simchowitz
35
74
0
29 Feb 2020
Learning Zero-Sum Simultaneous-Move Markov Games Using Function
  Approximation and Correlated Equilibrium
Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
Qiaomin Xie
Yudong Chen
Zhaoran Wang
Zhuoran Yang
46
125
0
17 Feb 2020
Provable Self-Play Algorithms for Competitive Reinforcement Learning
Provable Self-Play Algorithms for Competitive Reinforcement Learning
Yu Bai
Chi Jin
SSL
32
149
0
10 Feb 2020
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
Chi Jin
Tiancheng Jin
Haipeng Luo
S. Sra
Tiancheng Yu
13
103
0
03 Dec 2019
1