2302.14372
Cited By
The In-Sample Softmax for Offline Reinforcement Learning
28 February 2023
Chenjun Xiao
Han Wang
Yangchen Pan
Adam White
Martha White
OffRL
Papers citing "The In-Sample Softmax for Offline Reinforcement Learning"
25 / 25 papers shown
Title
Fine-Tuning without Performance Degradation
Han Wang
Adam White
Martha White
OnRL
161
0
0
01 May 2025
Mitigating Preference Hacking in Policy Optimization with Pessimism
Dhawal Gupta
Adam Fisch
Christoph Dann
Alekh Agarwal
76
0
0
10 Mar 2025
Policy Constraint by Only Support Constraint for Offline Reinforcement Learning
Yunkai Gao
Jiaming Guo
Fan Wu
Rui Zhang
OffRL
56
0
0
07 Mar 2025
Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model
Jing Zhang
Linjiajie Fang
Kexin Shi
Wenjia Wang
Bing-Yi Jing
OffRL
36
0
0
27 Oct 2024
q-exponential family for policy optimization
Lingwei Zhu
Haseeb Shah
Han Wang
Yukie Nagai
Martha White
OffRL
78
0
0
14 Aug 2024
FOSP: Fine-tuning Offline Safe Policy through World Models
Chenyang Cao
Yucheng Xin
Silang Wu
Longxiang He
Zichen Yan
Junbo Tan
Xueqian Wang
OffRL
58
0
0
06 Jul 2024
Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation
Fengdi Che
Chenjun Xiao
Jincheng Mei
Bo Dai
Ramki Gummadi
Oscar A Ramirez
Christopher K Harris
A. R. Mahmood
Dale Schuurmans
32
5
0
31 May 2024
Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination
Zhiyao Luo
Yangchen Pan
Peter Watkinson
Tingting Zhu
OffRL
33
0
0
28 May 2024
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
OnRL
36
2
0
28 May 2024
Exclusively Penalized Q-learning for Offline Reinforcement Learning
Junghyuk Yeom
Yonghyeon Jo
Jungmo Kim
Sanghyeon Lee
Seungyul Han
OffRL
40
2
0
23 May 2024
A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization
Yudong Luo
Yangchen Pan
Han Wang
Philip H. S. Torr
Pascal Poupart
39
3
0
17 Mar 2024
A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective
Yunpeng Qing
Shunyu Liu
Jingyuan Cong
Kaixuan Chen
Yihe Zhou
Mingli Song
OffRL
34
1
0
12 Mar 2024
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
Yifei Zhou
Ayush Sekhari
Yuda Song
Wen Sun
OffRL
OnRL
30
8
0
14 Nov 2023
Rethinking Decision Transformer via Hierarchical Reinforcement Learning
Yi Ma
Chenjun Xiao
Hebin Liang
Jianye Hao
OffRL
19
6
0
01 Nov 2023
Boosting Continuous Control with Consistency Policy
Yuhui Chen
Haoran Li
Dongbin Zhao
OffRL
41
20
0
10 Oct 2023
Factual and Personalized Recommendations using Language Models and Reinforcement Learning
Jihwan Jeong
Yinlam Chow
Guy Tennenholtz
Chih-Wei Hsu
Azamat Tulepbergenov
Mohammad Ghavamzadeh
Craig Boutilier
24
4
0
09 Oct 2023
Improving Offline-to-Online Reinforcement Learning with Q Conditioned State Entropy Exploration
Ziqi Zhang
Xiao Xiong
Zifeng Zhuang
Jinxin Liu
Donglin Wang
OffRL
OnRL
42
0
0
07 Oct 2023
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
Jinyi Liu
Y. Ma
Jianye Hao
Yujing Hu
Yan Zheng
Tangjie Lv
Changjie Fan
OffRL
44
2
0
27 Jun 2023
Iteratively Refined Behavior Regularization for Offline Reinforcement Learning
Xiao Hu
Yi Ma
Chenjun Xiao
Yan Zheng
Zhaopeng Meng
OffRL
18
4
0
09 Jun 2023
PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning
Jianxiong Li
Xiao Hu
Haoran Xu
Jingjing Liu
Xianyuan Zhan
Ya-Qin Zhang
OffRL
OnRL
36
19
0
25 May 2023
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Mitsuhiko Nakamoto
Yuexiang Zhai
Anika Singh
Max Sobol Mark
Yi Ma
Chelsea Finn
Aviral Kumar
Sergey Levine
OffRL
OnRL
112
108
0
09 Mar 2023
Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning
Jing Zhang
Chi Zhang
Wenjia Wang
Bing-Yi Jing
OffRL
27
7
0
28 Jan 2023
Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov
Ashvin Nair
Sergey Levine
OffRL
214
843
0
12 Oct 2021
COMBO: Conservative Offline Model-Based Policy Optimization
Tianhe Yu
Aviral Kumar
Rafael Rafailov
Aravind Rajeswaran
Sergey Levine
Chelsea Finn
OffRL
219
413
0
16 Feb 2021
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Seyed Kamyar Seyed Ghasemipour
Dale Schuurmans
S. Gu
OffRL
209
119
0
21 Jul 2020