ResearchTrend.AI


Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

4 January 2018
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine

Papers citing "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"

50 / 4,130 papers shown
Sample-Driven Federated Learning for Energy-Efficient and Real-Time IoT Sensing
Minh Ngoc Luu
Minh-Duong Nguyen
E. Bedeer
Van Duc Nguyen
D. Hoang
Diep N. Nguyen
Quoc-Viet Pham
67
3
0
11 Oct 2023
Diversity for Contingency: Learning Diverse Behaviors for Efficient Adaptation and Transfer
Finn Rietz
J. A. Stork
49
0
0
11 Oct 2023
COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL
Xiyao Wang
Ruijie Zheng
Yanchao Sun
Ruonan Jia
Wichayaporn Wongkamjan
Huazhe Xu
Furong Huang
OffRL
121
13
0
11 Oct 2023
Bridging the Gap between Newton-Raphson Method and Regularized Policy Iteration
Zeyang Li
Chuxiong Hu
Yunan Wang
Guojian Zhan
Jie Li
Shengbo Eben Li
74
0
0
11 Oct 2023
Robust Safe Reinforcement Learning under Adversarial Disturbances
Zeyang Li
Chuxiong Hu
Shengbo Eben Li
Jia Cheng
Yunan Wang
AAML
70
4
0
11 Oct 2023
Reinforcement Learning in a Safety-Embedded MDP with Trajectory Optimization
Fan Yang
Wen-Min Zhou
Zuxin Liu
Ding Zhao
David Held
65
1
0
10 Oct 2023
$f$-Policy Gradients: A General Framework for Goal Conditioned RL using $f$-Divergences
Siddhant Agarwal
Ishan Durugkar
Peter Stone
Amy Zhang
75
8
0
10 Oct 2023
Boosting Continuous Control with Consistency Policy
Yuhui Chen
Haoran Li
Dongbin Zhao
OffRL
93
27
0
10 Oct 2023
Human-Robot Gym: Benchmarking Reinforcement Learning in Human-Robot Collaboration
Jakob Thumm
Felix Trost
Matthias Althoff
OffRL
96
6
0
09 Oct 2023
Factual and Personalized Recommendations using Language Models and Reinforcement Learning
Jihwan Jeong
Yinlam Chow
Guy Tennenholtz
Chih-Wei Hsu
Azamat Tulepbergenov
Mohammad Ghavamzadeh
Craig Boutilier
88
4
0
09 Oct 2023
Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning
Trevor A. McInroe
Adam Jelley
Stefano V. Albrecht
Amos Storkey
OffRL, OnRL
80
6
0
09 Oct 2023
Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable Environments
Xiong-Hui Chen
Junyin Ye
Hang Zhao
Yi-Chen Li
Haoran Shi
...
Si-Hang Yang
Anqi Huang
Kai Xu
Zongzhang Zhang
Yang Yu
78
0
0
09 Oct 2023
DiffCPS: Diffusion Model based Constrained Policy Search for Offline Reinforcement Learning
Longxiang He
Li Shen
Linrui Zhang
Junbo Tan
Xueqian Wang
OffRL
98
12
0
09 Oct 2023
Increasing Entropy to Boost Policy Gradient Performance on Personalization Tasks
Andrew Starnes
Anton Dereventsov
Clayton Webster
64
0
0
09 Oct 2023
Distributional Soft Actor-Critic with Three Refinements
Jingliang Duan
Wenxuan Wang
Liming Xiao
Jiaxin Gao
Shengbo Eben Li
Chang Liu
Ya-Qin Zhang
Bo Cheng
Keqiang Li
OODD, OffRL
84
3
0
09 Oct 2023
Intelligent DRL-Based Adaptive Region of Interest for Delay-sensitive Telemedicine Applications
Abdulrahman Soliman
Amr M. Mohamed
Elias Yaacoub
Nikhil V. Navkar
A. Erbad
166
2
0
08 Oct 2023
Learning Generalizable Agents via Saliency-Guided Features Decorrelation
Sili Huang
Yanchao Sun
Jifeng Hu
Siyuan Guo
Hechang Chen
Yi-Ju Chang
Lichao Sun
Bo Yang
82
6
0
08 Oct 2023
Improving Offline-to-Online Reinforcement Learning with Q Conditioned State Entropy Exploration
Ziqi Zhang
Xiao Xiong
Zifeng Zhuang
Jinxin Liu
Donglin Wang
OffRL, OnRL
115
0
0
07 Oct 2023
Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
Yang Yue
Rui Lu
Bingyi Kang
Shiji Song
Gao Huang
OffRL
124
17
0
06 Oct 2023
Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz
Aaditya K. Singh
DJ Strouse
Tuomas Sandholm
Ruslan Salakhutdinov
Anca D. Dragan
Stephen Marcus McAleer
103
55
0
06 Oct 2023
Improving Reinforcement Learning Efficiency with Auxiliary Tasks in Non-Visual Environments: A Comparison
Moritz Lange
Noah Krystiniak
Raphael C. Engelhardt
Wolfgang Konen
Laurenz Wiskott
OffRL
61
1
0
06 Oct 2023
Reinforcement Learning with Fast and Forgetful Memory
Steven D. Morad
Ryan Kortvelesy
Stephan Liwicki
Amanda Prorok
OffRL
57
4
0
06 Oct 2023
Demystifying Embedding Spaces using Large Language Models
Guy Tennenholtz
Yinlam Chow
Chih-Wei Hsu
Jihwan Jeong
Lior Shani
Azamat Tulepbergenov
Deepak Ramachandran
Martin Mladenov
Craig Boutilier
57
15
0
06 Oct 2023
AURO: Reinforcement Learning for Adaptive User Retention Optimization in Recommender Systems
Zhenghai Xue
Qingpeng Cai
Tianyou Zuo
Bin Yang
Lantao Hu
Peng Jiang
Kun Gai
68
3
0
06 Oct 2023
RTDK-BO: High Dimensional Bayesian Optimization with Reinforced Transformer Deep kernels
Alexander Shmakov
Avisek Naug
Vineet Gundecha
Sahand Ghorbanpour
Ricardo Luna Gutierrez
Ashwin Ramesh Babu
Antonio Guillen-Perez
Soumyendu Sarkar
103
11
0
05 Oct 2023
V2X Cooperative Perception for Autonomous Driving: Recent Advances and Challenges
Wei Chen
Tao Huang
Xi Zhou
Dinh C. Nguyen
M. R. Azghadi
Yuxuan Xia
Qing-Long Han
Sumei Sun
135
41
0
05 Oct 2023
LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework
Woojun Kim
Jeonghye Kim
Young-Jin Sung
89
5
0
05 Oct 2023
A Two-stage Based Social Preference Recognition in Multi-Agent Autonomous Driving System
Jintao Xue
Dongkun Zhang
Rong Xiong
Yue Wang
Eryun Liu
79
0
0
05 Oct 2023
MORALS: Analysis of High-Dimensional Robot Controllers via Topological Tools in a Latent Space
Ewerton R. Vieira
Aravind Sivaramakrishnan
Sumanth Tangirala
Edgar Granados
Konstantin Mischaikow
Kostas E. Bekris
86
3
0
05 Oct 2023
Roadmaps with Gaps over Controllers: Achieving Efficiency in Planning under Dynamics
Aravind Sivaramakrishnan
Sumanth Tangirala
Edgar Granados
Noah R. Carver
Kostas E. Bekris
70
3
0
05 Oct 2023
$\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis
Zishun Yu
Yunzhe Tao
Liyu Chen
Tao Sun
Hongxia Yang
90
13
0
04 Oct 2023
Searching for High-Value Molecules Using Reinforcement Learning and Transformers
Raj Ghugare
Santiago Miret
Adriana Hugessen
Mariano Phielipp
Glen Berseth
102
17
0
04 Oct 2023
Learning to Scale Logits for Temperature-Conditional GFlowNets
Minsu Kim
Joohwan Ko
Taeyoung Yun
Dinghuai Zhang
Ling Pan
W. Kim
Jinkyoo Park
Emmanuel Bengio
Yoshua Bengio
AI4CE
130
25
0
04 Oct 2023
Expected flow networks in stochastic environments and two-player zero-sum games
Marco Jiralerspong
Bilun Sun
Danilo Vucetic
Tianyu Zhang
Yoshua Bengio
Gauthier Gidel
Nikolay Malkin
103
7
0
04 Oct 2023
Local Search GFlowNets
Minsu Kim
Taeyoung Yun
Emmanuel Bengio
Dinghuai Zhang
Yoshua Bengio
SungSoo Ahn
Jinkyoo Park
112
40
0
04 Oct 2023
Multi-Agent Reinforcement Learning for Power Grid Topology Optimization
E. V. D. Sar
Alessandro Zocca
Sandjai Bhulai
119
0
0
04 Oct 2023
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
Hao Sha
Yao Mu
Yuxuan Jiang
Li Chen
Chenfeng Xu
Ping Luo
Shengbo Eben Li
Masayoshi Tomizuka
Wei Zhan
Mingyu Ding
265
179
0
04 Oct 2023
A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces
B. Kerimkulov
J. Leahy
David Siska
Lukasz Szpruch
Yufei Zhang
128
12
0
04 Oct 2023
Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning
Finn Rietz
Erik Schaffernicht
Stefan Heinrich
J. A. Stork
81
1
0
03 Oct 2023
Towards a Unified Framework for Sequential Decision Making
Carlos Núñez-Molina
Pablo Mesejo
Juan Fernández-Olivares
19
0
0
03 Oct 2023
AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model
Zibin Dong
Yifu Yuan
Jianye Hao
Fei Ni
Yao Mu
Yan Zheng
Yujing Hu
Tangjie Lv
Changjie Fan
Zhipeng Hu
105
32
0
03 Oct 2023
Blending Imitation and Reinforcement Learning for Robust Policy Improvement
Xuefeng Liu
Takuma Yoneda
Rick L. Stevens
Matthew R. Walter
Yuxin Chen
98
11
0
03 Oct 2023
On Representation Complexity of Model-based and Model-free Reinforcement Learning
Hanlin Zhu
Baihe Huang
Stuart Russell
OffRL
84
4
0
03 Oct 2023
Imitation Learning from Observation through Optimal Transport
Wei-Di Chang
Scott Fujimoto
David Meger
Gregory Dudek
68
4
0
02 Oct 2023
Toward Scalable Visual Servoing Using Deep Reinforcement Learning and Optimal Control
Salar Asayesh
Hossein Sheikhi Darani
Mo Chen
M. Mehrandezh
Kamal Gupta
50
1
0
02 Oct 2023
Drug Discovery with Dynamic Goal-aware Fragments
Seul Lee
Seanie Lee
Kenji Kawaguchi
Sung Ju Hwang
125
9
0
02 Oct 2023
A General Offline Reinforcement Learning Framework for Interactive Recommendation
Teng Xiao
Donglin Wang
OffRL
115
74
0
01 Oct 2023
LANCAR: Leveraging Language for Context-Aware Robot Locomotion in Unstructured Environments
Chak Lam Shek
Xiyang Wu
Wesley A Suttle
Carl E. Busart
Erin Zaroukian
Dinesh Manocha
Pratap Tokekar
Amrit Singh Bedi
LLMAG
140
10
0
30 Sep 2023
Consistent Aggregation of Objectives with Diverse Time Preferences Requires Non-Markovian Rewards
Silviu Pitis
70
6
0
30 Sep 2023
Order-Preserving GFlowNets
Yihang Chen
Lukas Mauch
138
12
0
30 Sep 2023