1801.01290
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
4 January 2018
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
Papers citing
"Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"
50 / 4,130 papers shown
Title
VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play
Zelai Xu
Chao Yu
Huining Yuan
Xiangmin Yi
...
Wenhao Tang
Yu Wang
Wenbo Ding
Xiusi Chen
334
0
0
04 Feb 2025
RAPID: Robust and Agile Planner Using Inverse Reinforcement Learning for Vision-Based Drone Navigation
Minwoo Kim
Geunsik Bae
Jinwoo Lee
Woojae Shin
Changseung Kim
Myong-Yol Choi
Heejung Shin
H. Oh
212
0
0
04 Feb 2025
Learning Fused State Representations for Control from Multi-View Observations
Zeyu Wang
Yao Li
Xin Li
Hongyu Zang
Romain Laroche
Riashat Islam
OffRL
177
1
0
03 Feb 2025
Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning
Udita Ghosh
Dripta S. Raychaudhuri
Jiachen Li
Konstantinos Karydis
Amit K. Roy-Chowdhury
VLM
131
0
0
03 Feb 2025
Search-Based Adversarial Estimates for Improving Sample Efficiency in Off-Policy Reinforcement Learning
Federico Malato
Ville Hautamaki
76
1
0
03 Feb 2025
Dual Alignment Maximin Optimization for Offline Model-based RL
Chi Zhou
Wang Luo
Haoran Li
Congying Han
Tiande Guo
Zicheng Zhang
OffRL
185
0
0
02 Feb 2025
Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network
Jijia Liu
Feng Gao
Q. Liao
Chao Yu
Yu Wang
OffRL
188
0
0
01 Feb 2025
Regularized Langevin Dynamics for Combinatorial Optimization
Shengyu Feng
Yiming Yang
161
1
0
01 Feb 2025
RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception
Joshua R. Waite
Md Zahid Hasan
Qisai Liu
Zhanhong Jiang
Chinmay Hegde
Soumik Sarkar
OffRL
SyDa
293
1
0
31 Jan 2025
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Zishun Yu
Tengyu Xu
Di Jin
Karthik Abinav Sankararaman
Yun He
...
Eryk Helenowski
Chen Zhu
Sinong Wang
Hao Ma
Han Fang
LRM
242
11
0
29 Jan 2025
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning
Haque Ishfaq
Guangyuan Wang
Sami Nur Islam
Doina Precup
135
4
0
29 Jan 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher
Onofrio Semeraro
L. Mathelin
135
0
0
28 Jan 2025
Low-altitude Friendly-Jamming for Satellite-Maritime Communications via Generative AI-enabled Deep Reinforcement Learning
Jiawei Huang
Aimin Wang
Geng Sun
Jiahui Li
Jiacheng Wang
Dusit Niyato
Victor C. M. Leung
117
0
0
28 Jan 2025
Towards General-Purpose Model-Free Reinforcement Learning
Scott Fujimoto
P. D'Oro
Amy Zhang
Yuandong Tian
Michael Rabbat
OffRL
107
6
0
28 Jan 2025
Divergence-Augmented Policy Optimization
Qing Wang
Yingru Li
Jiechao Xiong
Tong Zhang
OffRL
174
16
0
28 Jan 2025
Reinforcement Teaching
Alex Lewandowski
Calarina Muslimani
Dale Schuurmans
Matthew E. Taylor
Jun Luo
205
2
0
28 Jan 2025
Multi-Agent Behavior Retrieval: Retrieval-Augmented Policy Training for Cooperative Push Manipulation by Mobile Robots
So Kuroki
Mai Nishimura
Tadashi Kozuno
148
1
0
28 Jan 2025
ABPT: Amended Backpropagation through Time with Partially Differentiable Rewards
Fanxing Li
Fangyu Sun
Tianbao Zhang
Danping Zou
95
0
0
24 Jan 2025
Utilizing Evolution Strategies to Train Transformers in Reinforcement Learning
Matyáš Lorenc
118
1
0
23 Jan 2025
State Combinatorial Generalization In Decision Making With Conditional Diffusion Models
Xintong Duan
Yutong He
Fahim Tajwar
Wen-Tse Chen
Ruslan Salakhutdinov
Jeff Schneider
OffRL
AI4CE
155
1
0
22 Jan 2025
An Offline Multi-Agent Reinforcement Learning Framework for Radio Resource Management
Eslam Eldeeb
Hirley Alves
OffRL
125
0
0
22 Jan 2025
On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration
Yirui Zhou
Xiaowei Liu
Xiaofeng Zhang
Yangchun Zhang
121
0
0
22 Jan 2025
NBDI: A Simple and Effective Termination Condition for Skill Extraction from Task-Agnostic Demonstrations
Myunsoo Kim
Hayeong Lee
Seong-Woong Shim
JunHo Seo
Byung-Jun Lee
LLMAG
80
0
0
22 Jan 2025
Inverse Reinforcement Learning with Switching Rewards and History Dependency for Characterizing Animal Behaviors
Jingyang Ke
Feiyang Wu
Jiyi Wang
Jeffrey Markowitz
Anqi Wu
167
1
0
22 Jan 2025
Revisiting Ensemble Methods for Stock Trading and Crypto Trading Tasks at ACM ICAIF FinRL Contest 2023-2024
Nikolaus Holzer
Keyi Wang
Kairong Xiao
Xiao-Yang Liu Yanglet
AIFin
84
1
0
18 Jan 2025
Stability Enhancement in Reinforcement Learning via Adaptive Control Lyapunov Function
Donghe Chen
Han Wang
Lin Cheng
416
0
0
18 Jan 2025
Gameplay Filters: Robust Zero-Shot Safety through Adversarial Imagination
D. Nguyen
Kai-Chieh Hsu
Wenhao Yu
Jie Tan
J. F. Fisac
89
6
0
17 Jan 2025
Autonomous Algorithm for Training Autonomous Vehicles with Minimal Human Intervention
Sang-Hyun Lee
Daehyeok Kwon
Seung-Woo Seo
144
1
0
17 Jan 2025
Average-Reward Reinforcement Learning with Entropy Regularization
Jacob Adamczyk
Volodymyr Makarenko
Stas Tiomkin
R. Kulkarni
OOD
87
2
0
17 Jan 2025
TIMRL: A Novel Meta-Reinforcement Learning Framework for Non-Stationary and Multi-Task Environments
Chenyang Qi
Huiping Li
Panfeng Huang
OffRL
89
0
0
13 Jan 2025
Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation
Ziyang Xie
Zhizheng Liu
Zhenghao Peng
Wayne Wu
Bolei Zhou
VGen
161
5
0
12 Jan 2025
Proposing Hierarchical Goal-Conditioned Policy Planning in Multi-Goal Reinforcement Learning
Gavin B. Rens
88
0
0
03 Jan 2025
DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning
Utsav Singh
Souradip Chakraborty
Wesley A Suttle
Brian M. Sadler
Vinay P. Namboodiri
Amrit Singh Bedi
OffRL
130
0
0
03 Jan 2025
Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution
Ke Xue
Yutong Wang
Cong Guan
Lei Yuan
Haobo Fu
Qiang Fu
Chao Qian
Yang Yu
164
18
0
03 Jan 2025
OMG-RL: Offline Model-based Guided Reward Learning for Heparin Treatment
Yooseok Lim
Sujee Lee
OffRL
235
0
0
03 Jan 2025
CREW: Facilitating Human-AI Teaming Research
Lingyu Zhang
Zhengran Ji
Boyuan Chen
140
4
0
03 Jan 2025
β-DQN: Improving Deep Q-Learning By Evolving the Behavior
Hongming Zhang
Fengshuo Bai
Chenjun Xiao
Chao Gao
Bo Xu
Martin Müller
OffRL
94
3
0
03 Jan 2025
Exploiting Hybrid Policy in Reinforcement Learning for Interpretable Temporal Logic Manipulation
Hao Zhang
Hao Wang
Xiucai Huang
Wenrui Chen
Z. Kan
151
0
0
31 Dec 2024
Safe Bayesian Optimization for the Control of High-Dimensional Embodied Systems
Yunyue Wei
Zeji Yi
Hongda Li
Saraswati Soedarmadji
Yanan Sui
112
0
0
31 Dec 2024
Scalable Bayesian Optimization via Focalized Sparse Gaussian Processes
Yunyue Wei
Vincent Zhuang
Saraswati Soedarmadji
Yanan Sui
417
1
0
31 Dec 2024
Weber-Fechner Law in Temporal Difference learning derived from Control as Inference
Keiichiro Takahashi
Taisuke Kobayashi
Tomoya Yamanokuchi
Takamitsu Matsubara
76
0
0
31 Dec 2024
Attention-Enhanced Short-Time Wiener Solution for Acoustic Echo Cancellation
Fei Zhao
Xueliang Zhang
94
2
0
25 Dec 2024
Contrastive Representation for Interactive Recommendation
Jingyu Li
Zhiyong Feng
Dongxiao He
Hongqi Chen
Qinghang Gao
Guoli Wu
83
0
0
24 Dec 2024
Online Preference-based Reinforcement Learning with Self-augmented Feedback from Large Language Model
Songjun Tu
Jingbo Sun
Qichao Zhang
Xiangyuan Lan
Dongbin Zhao
144
4
0
22 Dec 2024
ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning
Kun Wu
Yinuo Zhao
Zhihao Xu
Zhengping Che
Chengxiang Yin
C. Liu
Qinru Qiu
Feiferi Feng
OffRL
176
1
0
22 Dec 2024
Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning
Anthony Kobanda
Rémy Portelas
Odalric-Ambrym Maillard
Ludovic Denoyer
OffRL
CLL
180
1
0
19 Dec 2024
When Should We Prefer State-to-Visual DAgger Over Visual Reinforcement Learning?
Tongzhou Mu
Zhaoyang Li
Stanisław Wiktor Strzelecki
Xiu Yuan
Yunchao Yao
Litian Liang
H. Su
OffRL
129
2
0
18 Dec 2024
Policy Decorator: Model-Agnostic Online Refinement for Large Policy Model
Xiu Yuan
Tongzhou Mu
Stone Tao
Yunhao Fang
Mengke Zhang
H. Su
OffRL
144
8
0
18 Dec 2024
SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks
Mátyás Vincze
Laura Ferrarotti
Leonardo Lucio Custode
Bruno Lepri
Giovanni Iacca
MoE
OffRL
136
1
0
17 Dec 2024
Design of Restricted Normalizing Flow towards Arbitrary Stochastic Policy with Computational Efficiency
Taisuke Kobayashi
Takumi Aotani
176
5
0
17 Dec 2024