ResearchTrend.AI

Towards Instance-Optimal Offline Reinforcement Learning with Pessimism

17 October 2021
Ming Yin
Yu-Xiang Wang
    OffRL
arXiv:2110.08695

Papers citing "Towards Instance-Optimal Offline Reinforcement Learning with Pessimism"

29 / 29 papers shown
Title
Improved Algorithms for Differentially Private Language Model Alignment
Keyu Chen
Hao Tang
Qinglin Liu
Yizhao Xu
33
0
0
13 May 2025
On The Statistical Complexity of Offline Decision-Making
Thanh Nguyen-Tang
R. Arora
OffRL
48
1
0
10 Jan 2025
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu
Miao Lu
Shenao Zhang
Boyi Liu
Hongyi Guo
Yingxiang Yang
Jose H. Blanchet
Zhaoran Wang
50
43
0
26 May 2024
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation
Yuheng Zhang
Nan Jiang
OffRL
29
4
0
22 Feb 2024
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices
Jiin Woo
Laixi Shi
Gauri Joshi
Yuejie Chi
OffRL
34
3
0
08 Feb 2024
MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
Mao Hong
Zhiyue Zhang
Yue Wu
Yan Xu
OffRL
50
0
0
21 Jan 2024
On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond
Thanh Nguyen-Tang
Raman Arora
OffRL
35
3
0
06 Jan 2024
Privately Aligning Language Models with Reinforcement Learning
Fan Wu
Huseyin A. Inan
A. Backurs
Varun Chandrasekaran
Janardhan Kulkarni
Robert Sim
35
6
0
25 Oct 2023
Offline Meta Reinforcement Learning with In-Distribution Online Adaptation
Jianhao Wang
Jin Zhang
Haozhe Jiang
Junyu Zhang
Liwei Wang
Chongjie Zhang
OffRL
26
9
0
31 May 2023
Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage
Jose H. Blanchet
Miao Lu
Tong Zhang
Han Zhong
OffRL
45
30
0
16 May 2023
Offline Learning in Markov Games with General Function Approximation
Yuheng Zhang
Yunru Bai
Nan Jiang
OffRL
21
8
0
06 Feb 2023
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
Masatoshi Uehara
Nathan Kallus
Jason D. Lee
Wen Sun
OffRL
50
5
0
05 Feb 2023
Selective Uncertainty Propagation in Offline RL
Sanath Kumar Krishnamurthy
Shrey Modi
Tanmay Gangwani
S. Katariya
B. Kveton
A. Rangi
OffRL
61
0
0
01 Feb 2023
Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
Hanlin Zhu
Paria Rashidinejad
Jiantao Jiao
OffRL
42
15
0
30 Jan 2023
Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality
Ying Jin
Zhimei Ren
Zhuoran Yang
Zhaoran Wang
OffRL
32
25
0
19 Dec 2022
On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
Thanh Nguyen-Tang
Ming Yin
Sunil R. Gupta
Svetha Venkatesh
R. Arora
OffRL
58
16
0
23 Nov 2022
Offline Estimation of Controlled Markov Chains: Minimaxity and Sample Complexity
Imon Banerjee
Harsha Honnappa
Vinayak A. Rao
OffRL
11
0
0
14 Nov 2022
A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP
Fan Chen
Junyu Zhang
Zaiwen Wen
OffRL
39
8
0
13 Jul 2022
Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality
Ming Yin
Wenjing Chen
Mengdi Wang
Yu-Xiang Wang
OffRL
30
4
0
10 Jun 2022
Incorporating Explicit Uncertainty Estimates into Deep Offline Reinforcement Learning
David Brandfonbrener
Rémi Tachet des Combes
Romain Laroche
OffRL
37
5
0
02 Jun 2022
Offline Reinforcement Learning with Differential Privacy
Dan Qiao
Yu-Xiang Wang
OffRL
41
23
0
02 Jun 2022
Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning
Andrea Zanette
Martin J. Wainwright
OOD
38
5
0
01 Jun 2022
Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes
Miao Lu
Yifei Min
Zhaoran Wang
Zhuoran Yang
OffRL
57
22
0
26 May 2022
The Efficacy of Pessimism in Asynchronous Q-Learning
Yuling Yan
Gen Li
Yuxin Chen
Jianqing Fan
OffRL
78
40
0
14 Mar 2022
Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism
Ming Yin
Yaqi Duan
Mengdi Wang
Yu-Xiang Wang
OffRL
34
66
0
11 Mar 2022
When is Offline Two-Player Zero-Sum Markov Game Solvable?
Qiwen Cui
S. Du
OffRL
39
29
0
10 Jan 2022
A Statistical Analysis of Polyak-Ruppert Averaged Q-learning
Xiang Li
Wenhao Yang
Jiadong Liang
Zhihua Zhang
Michael I. Jordan
40
15
0
29 Dec 2021
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
Ming Yin
Yu-Xiang Wang
OffRL
32
19
0
13 May 2021
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine
Aviral Kumar
George Tucker
Justin Fu
OffRL
GP
343
1,968
0
04 May 2020