1801.01290
Cited By
v1
v2 (latest)
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
4 January 2018
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
Papers citing
"Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"
50 / 4,130 papers shown
Title
Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction
Utsav Singh
Souradip Chakraborty
Wesley A Suttle
Brian M. Sadler
Anit Kumar Sahu
Mubarak Shah
Vinay P. Namboodiri
Amrit Singh Bedi
136
1
0
01 Nov 2024
IO Transformer: Evaluating SwinV2-Based Reward Models for Computer Vision
Maxwell Meyer
Jack Spruyt
ViT
43
0
0
31 Oct 2024
Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers
Kai Yan
Alex Schwing
Yu-Xiong Wang
OffRL
OnRL
83
0
0
31 Oct 2024
Maximum Entropy Hindsight Experience Replay
Douglas C. Crowder
Matthew L. Trappett
Darrien M. McKenzie
Frances S. Chance
66
0
0
31 Oct 2024
Deterministic Exploration via Stationary Bellman Error Maximization
Sebastian Griesbach
Carlo D'Eramo
52
0
0
31 Oct 2024
CALE: Continuous Arcade Learning Environment
Jesse Farebrother
Pablo Samuel Castro
ELM
68
0
0
31 Oct 2024
Learning for Deformable Linear Object Insertion Leveraging Flexibility Estimation from Visual Cues
Mingen Li
Changhyun Choi
74
0
0
30 Oct 2024
Stepping Out of the Shadows: Reinforcement Learning in Shadow Mode
Philipp Gassert
Matthias Althoff
62
0
0
30 Oct 2024
Offline Behavior Distillation
Shiye Lei
Sen Zhang
Dacheng Tao
OffRL
91
0
0
30 Oct 2024
NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation
Momin Haider
Ming Yin
Menglei Zhang
Arpit Gupta
Jing Zhu
Yu-Xiang Wang
OffRL
64
1
0
30 Oct 2024
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied
Thomas Adler
Vihang Patil
M. Beck
Korbinian Pöppel
Johannes Brandstetter
Günter Klambauer
Razvan Pascanu
Sepp Hochreiter
207
7
0
29 Oct 2024
Environment as Policy: Learning to Race in Unseen Tracks
Hongze Wang
Jiaxu Xing
Nico Messikommer
Davide Scaramuzza
116
1
0
29 Oct 2024
Unveiling the Role of Expert Guidance: A Comparative Analysis of User-centered Imitation Learning and Traditional Reinforcement Learning
Amr Gomaa
Bilal Mahdy
65
2
0
28 Oct 2024
One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation
Zhendong Wang
Zhiyu Li
Ajay Mandlekar
Zhenjia Xu
Jiaojiao Fan
...
Yuke Zhu
Yogesh Balaji
Mingyuan Zhou
Xuan Li
Yu Zeng
125
18
0
28 Oct 2024
Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model
Jing Zhang
Linjiajie Fang
Kexin Shi
Wenjia Wang
Bing-Yi Jing
OffRL
174
0
0
27 Oct 2024
Beyond Simple Sum of Delayed Rewards: Non-Markovian Reward Modeling for Reinforcement Learning
Yuting Tang
Xin-Qiang Cai
Jing-Cheng Pang
Qiyu Wu
Yao-Xiang Ding
Masashi Sugiyama
OffRL
59
0
0
26 Oct 2024
Velocity-History-Based Soft Actor-Critic Tackling IROS'24 Competition "AI Olympics with RealAIGym"
Tim Lukas Faust
Habib Maraqten
Erfan Aghadavoodi
Boris Belousov
Jan Peters
33
1
0
26 Oct 2024
Off-Policy Selection for Initiating Human-Centric Experimental Design
Ge Gao
Xi Yang
Qitong Gao
Song Ju
Miroslav Pajic
Min Chi
OffRL
88
0
0
26 Oct 2024
OGBench: Benchmarking Offline Goal-Conditioned RL
Seohong Park
Kevin Frans
Benjamin Eysenbach
Sergey Levine
OffRL
154
29
0
26 Oct 2024
Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression
Yixiu Mao
Qi Wang
Chen Chen
Yun Qu
Xiangyang Ji
OffRL
155
7
0
25 Oct 2024
PointPatchRL -- Masked Reconstruction Improves Reinforcement Learning on Point Clouds
B. Gyenes
Nikolai Franke
P. Becker
Gerhard Neumann
3DPC
98
0
0
24 Oct 2024
Learn 2 Rage: Experiencing The Emotional Roller Coaster That Is Reinforcement Learning
Lachlan Mares
Stefan Podgorski
Ian Reid
51
0
0
24 Oct 2024
Multimodal Information Bottleneck for Deep Reinforcement Learning with Multiple Sensors
Bang You
Huaping Liu
SSL
77
6
0
23 Oct 2024
Prioritized Generative Replay
Renhao Wang
Kevin Frans
Pieter Abbeel
Sergey Levine
Alexei A. Efros
OnRL
DiffM
198
7
0
23 Oct 2024
Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
Max Wilcoxson
Qiyang Li
Kevin Frans
Sergey Levine
SSL
OffRL
OnRL
194
0
0
23 Oct 2024
Scalable spectral representations for multi-agent reinforcement learning in network MDPs
Tongzheng Ren
Runyu Zhang
Bo Dai
101
0
0
22 Oct 2024
Guiding Reinforcement Learning with Incomplete System Dynamics
Shuyuan Wang
Jingliang Duan
Nathan P. Lawrence
Philip D. Loewen
M. Forbes
R. Bhushan Gopaluni
Lixian Zhang
113
1
0
22 Oct 2024
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
Mingzhi Wang
Chengdong Ma
Qizhi Chen
Linjian Meng
Yang Han
Jiancong Xiao
Zhaowei Zhang
Jing Huo
Weijie Su
Yaodong Yang
156
9
0
22 Oct 2024
Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL
Ömer Veysel Çağatan
Barış Akgün
OffRL
116
0
0
22 Oct 2024
Rethinking Soft Actor-Critic in High-Dimensional Action Spaces: The Cost of Ignoring Distribution Shift
Yanjun Chen
Wei Wei
Xianghui Wang
Zhiqiang Xu
Xiaoyu Shen
Wei Zhang
43
0
0
22 Oct 2024
Generalizing Motion Planners with Mixture of Experts for Autonomous Driving
Q. Sun
Huimin Wang
Jiahao Zhan
Fan Nie
Xin Wen
Leimeng Xu
Kun Zhan
Peng Jia
Xianpeng Lang
Hang Zhao
148
9
0
21 Oct 2024
In-Trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates
Shicheng Liu
Minghui Zhu
115
1
0
21 Oct 2024
Multimodal Policies with Physics-informed Representations
Haodong Feng
Tailin Wu
Yue Wang
Dixia Fan
PINN
AI4CE
76
0
0
20 Oct 2024
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
Timofei Gritsaev
Nikita Morozov
S. Samsonov
D. Tiapkin
85
3
0
20 Oct 2024
Augmented Lagrangian-Based Safe Reinforcement Learning Approach for Distribution System Volt/VAR Control
Guibin Chen
OffRL
47
0
0
19 Oct 2024
Action abstractions for amortized sampling
Oussama Boussif
Léna Néhale Ezzine
J. Viviano
Michał Koziarski
Moksh Jain
Nikolay Malkin
Emmanuel Bengio
Rim Assouel
Yoshua Bengio
98
0
0
19 Oct 2024
GUIDE: Real-Time Human-Shaped Agents
Lingyu Zhang
Zhengran Ji
Nicholas R Waytowich
Boyuan Chen
72
2
0
19 Oct 2024
Offline-to-online Reinforcement Learning for Image-based Grasping with Scarce Demonstrations
Bryan Chan
Anson Leung
James Bergstra
OffRL
OnRL
150
0
0
19 Oct 2024
GUIDEd Agents: Enhancing Navigation Policies through Task-Specific Uncertainty Abstraction in Localization-Limited Environments
Gokul Puthumanaillam
Paulo Padrao
Jose Fuentes
Leonardo Bobadilla
Melkior Ornik
87
1
0
19 Oct 2024
A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning
Shengjie Sun
Runze Liu
Jiafei Lyu
J. Yang
L. Zhang
Xiu Li
LRM
87
6
0
18 Oct 2024
Streaming Deep Reinforcement Learning Finally Works
Mohamed Elsayed
Gautham Vasan
A. R. Mahmood
OffRL
116
6
0
18 Oct 2024
Knowledge Transfer from Simple to Complex: A Safe and Efficient Reinforcement Learning Framework for Autonomous Driving Decision-Making
Rongliang Zhou
Jiakun Huang
Mingjun Li
Hepeng Li
Haotian Cao
Xiaolin Song
64
1
0
18 Oct 2024
TF-DDRL: A Transformer-enhanced Distributed DRL Technique for Scheduling IoT Applications in Edge and Cloud Computing Environments
Zhiyu Wang
M. Goudarzi
Rajkumar Buyya
OffRL
115
4
0
18 Oct 2024
Novelty-based Sample Reuse for Continuous Robotics Control
Ke Duan
Kai Yang
Houde Liu
Xueqian Wang
79
0
0
17 Oct 2024
Reward-free World Models for Online Imitation Learning
Shangzhe Li
Zhiao Huang
H. Su
OffRL
229
1
0
17 Oct 2024
Diffusing States and Matching Scores: A New Framework for Imitation Learning
Runzhe Wu
Yiding Chen
Gokul Swamy
Kianté Brantley
Wen Sun
DiffM
155
5
0
17 Oct 2024
Two-Timescale Linear Stochastic Approximation: Constant Stepsizes Go a Long Way
Jeongyeol Kwon
Luke Dotson
Yudong Chen
Qiaomin Xie
79
1
0
16 Oct 2024
Reinforcement Learning with Euclidean Data Augmentation for State-Based Continuous Control
Jinzhu Luo
Dingyang Chen
Qi Zhang
OffRL
78
0
0
16 Oct 2024
SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling
Loris Gaven
Clément Romac
Thomas Carta
Sylvain Lamprier
Olivier Sigaud
Pierre-Yves Oudeyer
LLMAG
OffRL
49
3
0
16 Oct 2024
3D Gaussian Splatting in Robotics: A Survey
Siting Zhu
Guangming Wang
Dezhi Kong
Hesheng Wang
3DGS
112
14
0
16 Oct 2024