1801.01290
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
4 January 2018
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
Papers citing "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"
50 / 4,128 papers shown
Title
Confidence-Guided Human-AI Collaboration: Reinforcement Learning with Distributional Proxy Value Propagation for Autonomous Driving
Li Zeqiao
Wang Yijing
Wang Haoyu
Li Zheng
Li Peng
Zuo Zhiqiang
Hu Chuan
118
0
0
04 Jun 2025
Verification-Guided Falsification for Safe RL via Explainable Abstraction and Risk-Aware Exploration
Tuan Le
Risal Shahriar Shefin
Debashis Gupta
Thai Le
Sarra Alqahtani
OffRL
99
0
0
04 Jun 2025
FLIP: Flowability-Informed Powder Weighing
Nikola Radulov
Alex Wright
Thomas Little
Andrew I. Cooper
Gabriella Pizzuto
96
0
0
04 Jun 2025
An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals
Yangyang Zhao
Ben Niu
L. Qin
Shihan Wang
74
0
0
04 Jun 2025
Ensemble-MIX: Enhancing Sample Efficiency in Multi-Agent RL Using Ensemble Methods
Tom Danino
Nahum Shimkin
62
0
0
03 Jun 2025
Think Twice, Act Once: A Co-Evolution Framework of LLM and RL for Large-Scale Decision Making
Xu Wan
Wenyue Xu
Chao Yang
Mingyang Sun
54
1
0
03 Jun 2025
A Hybrid Approach to Indoor Social Navigation: Integrating Reactive Local Planning and Proactive Global Planning
Arnab Debnath
Gregory J. Stein
Jana Kosecka
52
0
0
03 Jun 2025
Trajectory First: A Curriculum for Discovering Diverse Policies
Cornelius V. Braun
Sayantan Auddy
Marc Toussaint
61
0
0
02 Jun 2025
Q-ARDNS-Multi: A Multi-Agent Quantum Reinforcement Learning Framework with Meta-Cognitive Adaptation for Complex 3D Environments
Umberto Gonçalves de Sousa
AI4CE
22
0
0
02 Jun 2025
Reinforcement Learning with Data Bootstrapping for Dynamic Subgoal Pursuit in Humanoid Robot Navigation
Chengyang Peng
Zhihao Zhang
Shiting Gong
Sankalp Agrawal
Keith A. Redmill
Ayonga Hereid
24
0
0
02 Jun 2025
Bidirectional Soft Actor-Critic: Leveraging Forward and Reverse KL Divergence for Efficient Reinforcement Learning
Yixian Zhang
Huaze Tang
Changxu Wei
Wenbo Ding
64
0
0
02 Jun 2025
MAGIK: Mapping to Analogous Goals via Imagination-enabled Knowledge Transfer
Ajsal Shereef Palattuparambil
Thommen George Karimpanal
Santu Rana
OffRL
60
0
0
02 Jun 2025
Efficient Manipulation-Enhanced Semantic Mapping With Uncertainty-Informed Action Selection
Nils Dengler
Jesper Mucke
Rohit Menon
Maren Bennewitz
34
0
0
02 Jun 2025
Optimistic critics can empower small actors
Olya Mastikhina
Dhruv Sreenivas
Pablo Samuel Castro
72
0
0
01 Jun 2025
A Reinforcement Learning Approach for RIS-aided Fair Communications
Alex Pierron
Michel Barbeau
L. D. Cicco
José Rubio-Hernán
Joaquin Garcia-Alfaro
31
0
0
01 Jun 2025
Action Dependency Graphs for Globally Optimal Coordinated Reinforcement Learning
Jianglin Ding
Jingcheng Tang
Gangshan Jing
36
0
0
01 Jun 2025
Local Manifold Approximation and Projection for Manifold-Aware Diffusion Planning
Kyowoon Lee
Jaesik Choi
DiffM
53
1
0
01 Jun 2025
Comparing Traditional and Reinforcement-Learning Methods for Energy Storage Control
Elinor Ginzburg
Itay Segev
Yoash Levron
Sarah Keren
OffRL
29
0
0
31 May 2025
Prompt-Tuned LLM-Augmented DRL for Dynamic O-RAN Network Slicing
Fatemeh Lotfi
Hossein Rajoli
Fatemeh Afghah
31
0
0
31 May 2025
Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
Hongyao Tang
J. Obando-Ceron
Pablo Samuel Castro
Aaron Courville
Glen Berseth
43
0
0
31 May 2025
BASIL: Best-Action Symbolic Interpretable Learning for Evolving Compact RL Policies
Kourosh Shahnazari
Seyed Moein Ayyoubzadeh
Mohammadali Keshtparvar
OffRL
57
0
0
31 May 2025
MOFGPT: Generative Design of Metal-Organic Frameworks using Language Models
Srivathsan Badrinarayanan
Rishikesh Magar
Akshay Antony
Radheesh Sharma Meda
Amir Barati Farimani
AI4CE
21
1
0
30 May 2025
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer
Yilun Kong
Guozheng Ma
Qi Zhao
Haoyu Wang
Li Shen
Xueqian Wang
Dacheng Tao
MoE
OffRL
38
1
0
30 May 2025
Human sensory-musculoskeletal modeling and control of whole-body movements
Chenhui Zuo
Guohao Lin
Chen Zhang
Shanning Zhuang
Yanan Sui
20
0
0
29 May 2025
Enhanced DACER Algorithm with High Diffusion Efficiency
Yinuo Wang
Mining Tan
Wenjun Zou
Haotian Lin
Xujie Song
...
Guojian Zhan
Tianze Zhu
Shiqi Liu
Jingliang Duan
Shengbo Eben Li
DiffM
87
0
0
29 May 2025
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
Michal Nauman
Marek Cygan
Carmelo Sferrazza
Aviral Kumar
Pieter Abbeel
OffRL
100
0
0
29 May 2025
Normalizing Flows are Capable Models for RL
Raj Ghugare
Benjamin Eysenbach
OffRL
AI4CE
92
0
0
29 May 2025
Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data
Lingkai Kong
Haichuan Wang
Tonghan Wang
Guojun Xiong
Milind Tambe
OffRL
56
0
0
29 May 2025
CURVE: CLIP-Utilized Reinforcement Learning for Visual Image Enhancement via Simple Image Processing
Yuka Ogino
Takahiro Toizumi
Atsushi Ito
CLIP
73
0
0
29 May 2025
Discriminative Policy Optimization for Token-Level Reward Models
Hongzhan Chen
Tao Yang
Shiping Gao
Ruijun Chen
Xiaojun Quan
Hongtao Tian
Ting Yao
44
0
0
29 May 2025
Contraction Actor-Critic: Contraction Metric-Guided Reinforcement Learning for Robust Path Tracking
Minjae Cho
Hiroyasu Tsukamoto
Huy Trong Tran
24
0
0
28 May 2025
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Ganqu Cui
Yuchen Zhang
Jiacheng Chen
Lifan Yuan
Zhi Wang
...
Lei Bai
Wanli Ouyang
Yu Cheng
Bowen Zhou
Ning Ding
LRM
90
5
0
28 May 2025
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
Tonghe Zhang
Chao Yu
Sichang Su
Yu Wang
104
0
0
28 May 2025
Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
Yang Zhang
Xinran Li
Jianing Ye
Delin Qu
Shuang Qiu
Chongjie Zhang
Xiu Li
Chenjia Bai
52
0
0
27 May 2025
Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models
Y. Zhang
Yu Yu
Bo Tang
Yu Zhu
Chuxiong Sun
...
Jie Hu
Zipeng Xie
Zhiyu Li
Feiyu Xiong
Edward Chung
106
0
0
26 May 2025
Situationally-Aware Dynamics Learning
Alejandro Murillo-Gonzalez
Lantao Liu
125
0
0
26 May 2025
Deep Actor-Critics with Tight Risk Certificates
Bahareh Tasdighi
Manuel Haussmann
Yi-Shan Wu
A. Masegosa
M. Kandemir
UQCV
95
0
0
26 May 2025
The challenge of hidden gifts in multi-agent reinforcement learning
Dane Malenfant
Blake A. Richards
46
0
0
26 May 2025
Decision Flow Policy Optimization
Jifeng Hu
Sili Huang
Siyuan Guo
Zhaogeng Liu
Li Shen
Lichao Sun
Hechang Chen
Yi-Ju Chang
Dacheng Tao
71
0
0
26 May 2025
Surrogate-Assisted Evolutionary Reinforcement Learning Based on Autoencoder and Hyperbolic Neural Network
Bingdong Li
Mei Jiang
Hong Qian
K. Tang
W. Hong
Peng Yang
146
0
0
26 May 2025
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Leander Diaz-Bone
Marco Bagatella
Jonas Hübotter
Andreas Krause
OffRL
92
0
0
26 May 2025
Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL
Qin-Wen Luo
Ming-Kun Xie
Ye-Wen Wang
Sheng-Jun Huang
OffRL
46
0
0
26 May 2025
Structured Reinforcement Learning for Combinatorial Decision-Making
Heiko Hoppe
Léo Baty
Louis Bouvier
Axel Parmentier
Maximilian Schiffer
OffRL
113
1
0
25 May 2025
Reduce Computational Cost In Deep Reinforcement Learning Via Randomized Policy Learning
Zhuochen Liu
Rahul Jain
Quan Nguyen
44
0
0
25 May 2025
Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning
Maeva Guerrier
Karthik Soma
Hassan Fouad
Giovanni Beltrame
83
0
0
24 May 2025
CiRL: Open-Source Environments for Reinforcement Learning in Circular Economy and Net Zero
Federico Zocco
Andrea Corti
Monica Malvezzi
AI4CE
35
0
0
24 May 2025
KL-regularization Itself is Differentially Private in Bandits and RLHF
Yizhou Zhang
Kishan Panaganti
Laixi Shi
Juba Ziani
Adam Wierman
52
0
0
23 May 2025
Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning
Till Freihaut
Luca Viano
Volkan Cevher
Matthieu Geist
Giorgia Ramponi
47
0
0
23 May 2025
H2-COMPACT: Human-Humanoid Co-Manipulation via Adaptive Contact Trajectory Policies
Geeta Chandra Raju Bethala
Hao Huang
Niraj Pudasaini
Abdullah Mohamed Ali
Shuaihang Yuan
Congcong Wen
Anthony Tzes
Yi Fang
133
0
0
23 May 2025
How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning
Max Weltevrede
Moritz A. Zanger
M. Spaan
Wendelin Bohmer
OffRL
FedML
93
0
0
22 May 2025