ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2302.14372
  4. Cited By
The In-Sample Softmax for Offline Reinforcement Learning

The In-Sample Softmax for Offline Reinforcement Learning

28 February 2023
Chenjun Xiao
Han Wang
Yangchen Pan
Adam White
Martha White
    OffRL
ArXivPDFHTML

Papers citing "The In-Sample Softmax for Offline Reinforcement Learning"

25 / 25 papers shown
Title
Fine-Tuning without Performance Degradation
Fine-Tuning without Performance Degradation
Han Wang
Adam White
Martha White
OnRL
161
0
0
01 May 2025
Mitigating Preference Hacking in Policy Optimization with Pessimism
Dhawal Gupta
Adam Fisch
Christoph Dann
Alekh Agarwal
76
0
0
10 Mar 2025
Policy Constraint by Only Support Constraint for Offline Reinforcement Learning
Yunkai Gao
Jiaming Guo
Fan Wu
Rui Zhang
OffRL
56
0
0
07 Mar 2025
Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model
Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model
Jing Zhang
Linjiajie Fang
Kexin Shi
Wenjia Wang
Bing-Yi Jing
OffRL
36
0
0
27 Oct 2024
q-exponential family for policy optimization
q-exponential family for policy optimization
Lingwei Zhu
Haseeb Shah
Han Wang
Yukie Nagai
Martha White
OffRL
78
0
0
14 Aug 2024
FOSP: Fine-tuning Offline Safe Policy through World Models
FOSP: Fine-tuning Offline Safe Policy through World Models
Chenyang Cao
Yucheng Xin
Silang Wu
Longxiang He
Zichen Yan
Junbo Tan
Xueqian Wang
OffRL
58
0
0
06 Jul 2024
Target Networks and Over-parameterization Stabilize Off-policy
  Bootstrapping with Function Approximation
Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation
Fengdi Che
Chenjun Xiao
Jincheng Mei
Bo Dai
Ramki Gummadi
Oscar A Ramirez
Christopher K Harris
A. R. Mahmood
Dale Schuurmans
32
5
0
31 May 2024
Reinforcement Learning in Dynamic Treatment Regimes Needs Critical
  Reexamination
Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination
Zhiyao Luo
Yangchen Pan
Peter Watkinson
Tingting Zhu
OffRL
33
0
0
28 May 2024
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical
  Behaviors in Deep Off-Policy RL
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
OnRL
36
2
0
28 May 2024
Exclusively Penalized Q-learning for Offline Reinforcement Learning
Exclusively Penalized Q-learning for Offline Reinforcement Learning
Junghyuk Yeom
Yonghyeon Jo
Jungmo Kim
Sanghyeon Lee
Seungyul Han
OffRL
40
2
0
23 May 2024
A Simple Mixture Policy Parameterization for Improving Sample Efficiency
  of CVaR Optimization
A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization
Yudong Luo
Yangchen Pan
Han Wang
Philip H. S. Torr
Pascal Poupart
39
3
0
17 Mar 2024
A2PO: Towards Effective Offline Reinforcement Learning from an
  Advantage-aware Perspective
A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective
Yunpeng Qing
Shunyu Liu
Jingyuan Cong
Kaixuan Chen
Yihe Zhou
Mingli Song
OffRL
34
1
0
12 Mar 2024
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
Yifei Zhou
Ayush Sekhari
Yuda Song
Wen Sun
OffRL
OnRL
30
8
0
14 Nov 2023
Rethinking Decision Transformer via Hierarchical Reinforcement Learning
Rethinking Decision Transformer via Hierarchical Reinforcement Learning
Yi Ma
Chenjun Xiao
Hebin Liang
Jianye Hao
OffRL
19
6
0
01 Nov 2023
Boosting Continuous Control with Consistency Policy
Boosting Continuous Control with Consistency Policy
Yuhui Chen
Haoran Li
Dongbin Zhao
OffRL
41
20
0
10 Oct 2023
Factual and Personalized Recommendations using Language Models and
  Reinforcement Learning
Factual and Personalized Recommendations using Language Models and Reinforcement Learning
Jihwan Jeong
Yinlam Chow
Guy Tennenholtz
Chih-Wei Hsu
Azamat Tulepbergenov
Mohammad Ghavamzadeh
Craig Boutilier
24
4
0
09 Oct 2023
Improving Offline-to-Online Reinforcement Learning with Q Conditioned
  State Entropy Exploration
Improving Offline-to-Online Reinforcement Learning with Q Conditioned State Entropy Exploration
Ziqi Zhang
Xiao Xiong
Zifeng Zhuang
Jinxin Liu
Donglin Wang
OffRL
OnRL
42
0
0
07 Oct 2023
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
Jinyi Liu
Y. Ma
Jianye Hao
Yujing Hu
Yan Zheng
Tangjie Lv
Changjie Fan
OffRL
44
2
0
27 Jun 2023
Iteratively Refined Behavior Regularization for Offline Reinforcement
  Learning
Iteratively Refined Behavior Regularization for Offline Reinforcement Learning
Xiao Hu
Yi Ma
Chenjun Xiao
Yan Zheng
Zhaopeng Meng
OffRL
18
4
0
09 Jun 2023
PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement
  Learning
PROTO: Iterative Policy Regularized Offline-to-Online Reinforcement Learning
Jianxiong Li
Xiao Hu
Haoran Xu
Jingjing Liu
Xianyuan Zhan
Ya-Qin Zhang
OffRL
OnRL
36
19
0
25 May 2023
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online
  Fine-Tuning
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Mitsuhiko Nakamoto
Yuexiang Zhai
Anika Singh
Max Sobol Mark
Yi Ma
Chelsea Finn
Aviral Kumar
Sergey Levine
OffRL
OnRL
112
108
0
09 Mar 2023
Constrained Policy Optimization with Explicit Behavior Density for
  Offline Reinforcement Learning
Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning
Jing Zhang
Chi Zhang
Wenjia Wang
Bing-Yi Jing
OffRL
27
7
0
28 Jan 2023
Offline Reinforcement Learning with Implicit Q-Learning
Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov
Ashvin Nair
Sergey Levine
OffRL
214
843
0
12 Oct 2021
COMBO: Conservative Offline Model-Based Policy Optimization
COMBO: Conservative Offline Model-Based Policy Optimization
Tianhe Yu
Aviral Kumar
Rafael Rafailov
Aravind Rajeswaran
Sergey Levine
Chelsea Finn
OffRL
219
413
0
16 Feb 2021
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline
  and Online RL
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Seyed Kamyar Seyed Ghasemipour
Dale Schuurmans
S. Gu
OffRL
209
119
0
21 Jul 2020
1