Equivalence Between Policy Gradients and Soft Q-Learning


21 April 2017
John Schulman
Xi Chen
Pieter Abbeel
    OffRL

Papers citing "Equivalence Between Policy Gradients and Soft Q-Learning"

50 / 86 papers shown
A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance
Axel Friedrich Wolter
Tobias Sutter
OffRL
37
0
0
07 May 2025
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Taiwei Shi
Yiyang Wu
Linxin Song
Dinesh Manocha
Jieyu Zhao
LRM
86
3
0
07 Apr 2025
Divergence-Augmented Policy Optimization
Qing Wang
Yingru Li
Jiechao Xiong
Tong Zhang
OffRL
47
16
0
28 Jan 2025
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
Value Improved Actor Critic Algorithms
Yaniv Oren
Moritz A. Zanger
Pascal R. van der Vaart
M. Spaan
Wendelin Bohmer
OffRL
33
0
0
03 Jun 2024
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond
Yunhao Tang
Daniel Guo
Daniele Calandriello
M. G. Azar
...
Gil Shamir
Rishabh Joshi
Tianqi Liu
Rémi Munos
Bilal Piot
OffRL
46
24
0
29 May 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
63
12
0
28 May 2024
Reinforcing Language Agents via Policy Optimization with Action Decomposition
Muning Wen
Bo Liu
Weinan Zhang
Jun Wang
Ying Wen
48
8
0
23 May 2024
A Dual Perspective of Reinforcement Learning for Imposing Policy Constraints
Bram De Cooman
Johan A. K. Suykens
38
0
0
25 Apr 2024
Imitation-regularized Optimal Transport on Networks: Provable Robustness and Application to Logistics Planning
Koshi Oishi
Yota Hashizume
Tomohiko Jimbo
Hirotaka Kaji
Kenji Kashima
OOD
48
2
0
28 Feb 2024
ACE : Off-Policy Actor-Critic with Causality-Aware Entropy Regularization
Tianying Ji
Yongyuan Liang
Yan Zeng
Yu-Juan Luo
Guowei Xu
Jiawei Guo
Ruijie Zheng
Furong Huang
Gang Hua
Huazhe Xu
CML
55
11
0
22 Feb 2024
Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond
Hao Sun
OffRL
34
21
0
09 Oct 2023
Fairness in Preference-based Reinforcement Learning
Umer Siddique
Abhinav Sinha
Yongcan Cao
19
4
0
16 Jun 2023
Policy Representation via Diffusion Probability Model for Reinforcement Learning
Long Yang
Zhixiong Huang
Fenghao Lei
Yucun Zhong
Yiming Yang
Cong Fang
Shiting Wen
Binbin Zhou
Zhouchen Lin
DiffM
41
40
0
22 May 2023
Efficient Quality-Diversity Optimization through Diverse Quality Species
Ryan Wickman
Bibek Poudel
Taylor Michael Villarreal
Xiaofei Zhang
Weizi Li
38
6
0
14 Apr 2023
Language Instructed Reinforcement Learning for Human-AI Coordination
Hengyuan Hu
Dorsa Sadigh
LM&Ro
35
60
0
13 Apr 2023
Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
François Ged
M. H. Veiga
33
0
0
22 Mar 2023
Fast Rates for Maximum Entropy Exploration
D. Tiapkin
Denis Belomestny
Daniele Calandriello
Eric Moulines
Rémi Munos
A. Naumov
Pierre Perrault
Yunhao Tang
Michal Valko
Pierre Menard
44
18
0
14 Mar 2023
Inference on Optimal Dynamic Policies via Softmax Approximation
Qizhao Chen
Morgane Austern
Vasilis Syrgkanis
OffRL
36
1
0
08 Mar 2023
Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing
Shuai Xiao
Le Guo
Zaifan Jiang
Lei Lv
Yuanbo Chen
Jun Zhu
Shuang Yang
30
21
0
02 Mar 2023
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
Masatoshi Uehara
Nathan Kallus
Jason D. Lee
Wen Sun
OffRL
50
5
0
05 Feb 2023
A general Markov decision process formalism for action-state entropy-regularized reward maximization
D. Grytskyy
Jorge Ramírez-Ruiz
R. Moreno-Bote
22
3
0
02 Feb 2023
An Efficient Solution to s-Rectangular Robust Markov Decision Processes
Navdeep Kumar
Kfir Y. Levy
Kaixin Wang
Shie Mannor
36
2
0
31 Jan 2023
Policy Gradient for Rectangular Robust Markov Decision Processes
Navdeep Kumar
E. Derman
M. Geist
Kfir Y. Levy
Shie Mannor
26
20
0
31 Jan 2023
On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations
Tim G. J. Rudner
Cong Lu
Michael A. Osborne
Yarin Gal
Yee Whye Teh
OffRL
38
27
0
28 Dec 2022
Cognitive Level-$k$ Meta-Learning for Safe and Pedestrian-Aware Autonomous Driving
Haozhe Lei
Quanyan Zhu
25
0
0
17 Dec 2022
Self-Adaptive Driving in Nonstationary Environments through Conjectural Online Lookahead Adaptation
Tao Li
Haozhe Lei
Quanyan Zhu
26
11
0
06 Oct 2022
MAN: Multi-Action Networks Learning
Keqin Wang
Alison Bartsch
A. Farimani
21
3
0
19 Sep 2022
Variational Inference for Model-Free and Model-Based Reinforcement Learning
Felix Leibfried
OffRL
23
0
0
04 Sep 2022
Entropy Augmented Reinforcement Learning
Jianfei Ma
36
0
0
19 Aug 2022
Minimum Description Length Control
Theodore H. Moskovitz
Ta-Chu Kao
M. Sahani
M. Botvinick
28
1
0
17 Jul 2022
q-Learning in Continuous Time
Yanwei Jia
X. Zhou
OffRL
51
69
0
02 Jul 2022
Intra-agent speech permits zero-shot task acquisition
Chen Yan
Federico Carnevale
Petko Georgiev
Adam Santoro
Aurelia Guy
Alistair Muldal
Chia-Chun Hung
Josh Abramson
Timothy Lillicrap
Greg Wayne
LM&Ro
41
9
0
07 Jun 2022
A Mixture-of-Expert Approach to RL-based Dialogue Management
Yinlam Chow
Azamat Tulepbergenov
Ofir Nachum
Moonkyung Ryu
Mohammad Ghavamzadeh
Craig Boutilier
MoE
25
14
0
31 May 2022
Soft Actor-Critic with Inhibitory Networks for Faster Retraining
J. Ide
Daria Mićović
Michael J. Guarino
K. Alcedo
D. Rosenbluth
Adrian P. Pope
13
3
0
07 Feb 2022
Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination
Rui Zhao
Jinming Song
Yufeng Yuan
Haifeng Hu
Yang Gao
Yi Wu
Zhongqian Sun
Yang Wei
32
63
0
22 Dec 2021
Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
Yanwei Jia
X. Zhou
OffRL
32
79
0
22 Nov 2021
Towards an Understanding of Default Policies in Multitask Policy Optimization
Theodore H. Moskovitz
Michael Arbel
Jack Parker-Holder
Aldo Pacchiano
27
9
0
04 Nov 2021
Estimating Optimal Infinite Horizon Dynamic Treatment Regimes via pT-Learning
Wenzhuo Zhou
Ruoqing Zhu
Annie Qu
40
22
0
20 Oct 2021
Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience
C. Banerjee
Zhiyong Chen
N. Noman
19
30
0
24 Sep 2021
Theoretical Guarantees of Fictitious Discount Algorithms for Episodic Reinforcement Learning and Global Convergence of Policy Gradient Methods
Xin Guo
Anran Hu
Junzi Zhang
OffRL
31
6
0
13 Sep 2021
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Alan Chan
Hugo Silva
Sungsu Lim
Tadashi Kozuno
A. R. Mahmood
Martha White
25
29
0
17 Jul 2021
OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation
Jongmin Lee
Wonseok Jeon
Byung-Jun Lee
J. Pineau
Kee-Eung Kim
OffRL
37
91
0
21 Jun 2021
Characterizing the Gap Between Actor-Critic and Policy Gradient
Junfeng Wen
Saurabh Kumar
Ramki Gummadi
Dale Schuurmans
34
15
0
13 Jun 2021
A New Formalism, Method and Open Issues for Zero-Shot Coordination
Johannes Treutlein
Michael Dennis
Caspar Oesterheld
Jakob N. Foerster
OffRL
29
35
0
11 Jun 2021
An Entropy Regularization Free Mechanism for Policy-based Reinforcement Learning
Changnan Xiao
Haosen Shi
Jiajun Fan
Shihong Deng
23
5
0
01 Jun 2021
Hierarchical Reinforcement Learning for Air-to-Air Combat
Adrian P. Pope
J. Ide
Daria Mićović
Henry Diaz
D. Rosenbluth
Lee Ritholtz
Jason C. Twedt
Thayne T. Walker
K. Alcedo
D. Javorsek
19
72
0
03 May 2021
Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings
Lili Chen
Kimin Lee
A. Srinivas
Pieter Abbeel
OffRL
24
11
0
04 Mar 2021
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Tengyu Xu
Zhuoran Yang
Zhaoran Wang
Yingbin Liang
OffRL
47
24
0
23 Feb 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Min Zhang
OffRL
60
73
0
01 Jan 2021