ResearchTrend.AI
  • Papers
  • Communities
  • Organizations
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1801.01290
  4. Cited By
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement
  Learning with a Stochastic Actor
v1v2 (latest)

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

4 January 2018
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
ArXiv (abs)PDFHTML

Papers citing "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"

50 / 4,130 papers shown
Title
Value Improved Actor Critic Algorithms
Value Improved Actor Critic Algorithms
Yaniv Oren
Moritz A. Zanger
Pascal R. van der Vaart
M. Spaan
Wendelin Bohmer
Wendelin Bohmer
OffRL
91
0
0
03 Jun 2024
Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets
Looking Backward: Retrospective Backward Synthesis for Goal-Conditioned GFlowNets
Haoran He
C. Chang
Huazhe Xu
Ling Pan
217
7
0
03 Jun 2024
Shared-unique Features and Task-aware Prioritized Sampling on Multi-task
  Reinforcement Learning
Shared-unique Features and Task-aware Prioritized Sampling on Multi-task Reinforcement Learning
Po-Shao Lin
Jia-Fong Yeh
Yi-Ting Chen
Winston H. Hsu
87
0
0
02 Jun 2024
Learning Multimodal Behaviors from Scratch with Diffusion Policy
  Gradient
Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient
Zechu Li
Rickmer Krohn
Tao Chen
Anurag Ajay
Pulkit Agrawal
Georgia Chalvatzaki
DiffM
139
18
0
02 Jun 2024
FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning
FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning
Yuwei Fu
Haichao Zhang
Di Wu
Wei Xu
Benoit Boulet
VLM
125
15
0
02 Jun 2024
Improving GFlowNets for Text-to-Image Diffusion Alignment
Improving GFlowNets for Text-to-Image Diffusion Alignment
Dinghuai Zhang
Yizhe Zhang
Jiatao Gu
Ruixiang Zhang
J. Susskind
Navdeep Jaitly
Shuangfei Zhai
EGVM
144
10
0
02 Jun 2024
Exploring the limits of Hierarchical World Models in Reinforcement
  Learning
Exploring the limits of Hierarchical World Models in Reinforcement Learning
Robin Schiewer
Anand Subramoney
Laurenz Wiskott
93
1
0
01 Jun 2024
Do's and Don'ts: Learning Desirable Skills with Instruction Videos
Do's and Don'ts: Learning Desirable Skills with Instruction Videos
Hyunseung Kim
ByungKun Lee
Hojoon Lee
Dongyoon Hwang
Donghu Kim
Jaegul Choo
151
1
0
01 Jun 2024
Self-Augmented Preference Optimization: Off-Policy Paradigms for
  Language Model Alignment
Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment
Yueqin Yin
Zhendong Wang
Yujia Xie
Weizhu Chen
Mingyuan Zhou
101
4
0
31 May 2024
HOPE: A Reinforcement Learning-based Hybrid Policy Path Planner for Diverse Parking Scenarios
HOPE: A Reinforcement Learning-based Hybrid Policy Path Planner for Diverse Parking Scenarios
Mingyang Jiang
Yueyuan Li
Songan Zhang
Siyuan Chen
Chunxiang Wang
Ming Yang
157
5
0
31 May 2024
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement
  Learning
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning
Davide Corsi
Davide Camponogara
Alessandro Farinelli
OffRL
77
2
0
30 May 2024
Video-Language Critic: Transferable Reward Functions for
  Language-Conditioned Robotics
Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics
Minttu Alakuijala
Reginald McLean
Isaac Woungang
Nariman Farsad
Samuel Kaski
Pekka Marttinen
Kai Yuan
LM&Ro
79
1
0
30 May 2024
Adaptive Advantage-Guided Policy Regularization for Offline
  Reinforcement Learning
Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
Tenglong Liu
Yang Li
Yixing Lan
Hao Gao
Wei Pan
Xin Xu
OffRL
118
8
0
30 May 2024
Learning from Random Demonstrations: Offline Reinforcement Learning with
  Importance-Sampled Diffusion Models
Learning from Random Demonstrations: Offline Reinforcement Learning with Importance-Sampled Diffusion Models
Zeyu Fang
Tian Lan
OffRL
120
2
0
30 May 2024
May the Dance be with You: Dance Generation Framework for Non-Humanoids
May the Dance be with You: Dance Generation Framework for Non-Humanoids
Hyemin Ahn
DiffMVGen
101
1
0
30 May 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
144
2
0
30 May 2024
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
114
3
0
29 May 2024
Trust the Model Where It Trusts Itself -- Model-Based Actor-Critic with
  Uncertainty-Aware Rollout Adaption
Trust the Model Where It Trusts Itself -- Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption
Bernd Frauenknecht
Artur Eisele
Devdutt Subhasish
Friedrich Solowjow
Sebastian Trimpe
116
5
0
29 May 2024
Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees
Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees
Dohyeong Kim
Taehyun Cho
Seung Han
Hojun Chung
Kyungjae Lee
Songhwai Oh
82
1
0
29 May 2024
Efficient Preference-based Reinforcement Learning via Aligned Experience
  Estimation
Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation
Fengshuo Bai
Rui Zhao
Hongming Zhang
Sijia Cui
Ying Wen
Yaodong Yang
Bo Xu
Lei Han
OffRL
95
8
0
29 May 2024
RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning
RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning
Mingqi Yuan
Roger Creus Castanyer
Bo Li
Xin Jin
Glen Berseth
Wenjun Zeng
180
0
0
29 May 2024
Model-Based Diffusion for Trajectory Optimization
Model-Based Diffusion for Trajectory Optimization
Chaoyi Pan
Zeji Yi
Guanya Shi
Guannan Qu
101
13
0
28 May 2024
DTR-Bench: An in silico Environment and Benchmark Platform for
  Reinforcement Learning Based Dynamic Treatment Regime
DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime
Zhiyao Luo
Mingcheng Zhu
Fenglin Liu
Jiali Li
Yangchen Pan
Jiandong Zhou
Tingting Zhu
OffRL
75
3
0
28 May 2024
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical
  Behaviors in Deep Off-Policy RL
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRLOnRL
119
3
0
28 May 2024
AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained
  Optimization
AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization
Longxiang He
Li Shen
Junbo Tan
Xueqian Wang
113
4
0
28 May 2024
HarmoDT: Harmony Multi-Task Decision Transformer for Offline
  Reinforcement Learning
HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning
Shengchao Hu
Ziqing Fan
Li Shen
Ya Zhang
Yanfeng Wang
Dacheng Tao
OffRL
97
11
0
28 May 2024
Mollification Effects of Policy Gradient Methods
Mollification Effects of Policy Gradient Methods
Tao Wang
Sylvia Herbert
Sicun Gao
104
1
0
28 May 2024
A Pontryagin Perspective on Reinforcement Learning
A Pontryagin Perspective on Reinforcement Learning
Onno Eberhard
Claire Vernade
Michael Muehlebach
140
3
0
28 May 2024
No $D_{\text{train}}$: Model-Agnostic Counterfactual Explanations Using Reinforcement Learning
No DtrainD_{\text{train}}Dtrain​: Model-Agnostic Counterfactual Explanations Using Reinforcement Learning
Xiangyu Sun
Raquel Aoki
Kevin H. Wilson
81
1
0
28 May 2024
A Recipe for Unbounded Data Augmentation in Visual Reinforcement
  Learning
A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning
Abdulaziz Almuzairee
Nicklas Hansen
Henrik I. Christensen
81
7
0
27 May 2024
Rethinking Transformers in Solving POMDPs
Rethinking Transformers in Solving POMDPs
Chenhao Lu
Ruizhe Shi
Yuyao Liu
Kaizhe Hu
Simon S. Du
Huazhe Xu
AI4CE
136
3
0
27 May 2024
Position: Foundation Agents as the Paradigm Shift for Decision Making
Position: Foundation Agents as the Paradigm Shift for Decision Making
Xiaoqian Liu
Xingzhou Lou
Jianbin Jiao
Junge Zhang
OffRLLLMAG
105
7
0
27 May 2024
Knowing What Not to Do: Leverage Language Model Insights for Action
  Space Pruning in Multi-agent Reinforcement Learning
Knowing What Not to Do: Leverage Language Model Insights for Action Space Pruning in Multi-agent Reinforcement Learning
Zhihao Liu
Xianliang Yang
Zichuan Liu
Yifan Xia
Wei Jiang
Yuanyu Zhang
Lijuan Li
Guoliang Fan
Lei Song
Bian Jiang
LLMAG
84
3
0
27 May 2024
Oracle-Efficient Reinforcement Learning for Max Value Ensembles
Oracle-Efficient Reinforcement Learning for Max Value Ensembles
Marcel Hussing
Michael Kearns
Aaron Roth
S. B. Sengupta
Jessica Sorrell
73
0
0
27 May 2024
Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales
Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales
Ju-Seung Byun
Andrew Perrault
57
1
0
27 May 2024
Amortized Active Causal Induction with Deep Reinforcement Learning
Amortized Active Causal Induction with Deep Reinforcement Learning
Yashas Annadani
P. Tigas
Stefan Bauer
Adam Foster
88
1
0
26 May 2024
Provably Efficient Off-Policy Adversarial Imitation Learning with
  Convergence Guarantees
Provably Efficient Off-Policy Adversarial Imitation Learning with Convergence Guarantees
Yilei Chen
Vittorio Giammarino
James Queeney
I. Paschalidis
70
0
0
26 May 2024
Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement
  Learning
Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning
Aneesh Muppidi
Zhiyu Zhang
Heng Yang
78
6
0
26 May 2024
On the Algorithmic Bias of Aligning Large Language Models with RLHF:
  Preference Collapse and Matching Regularization
On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
Jiancong Xiao
Ziniu Li
Xingyu Xie
E. Getzen
Cong Fang
Qi Long
Weijie J. Su
108
23
0
26 May 2024
RoboArm-NMP: a Learning Environment for Neural Motion Planning
RoboArm-NMP: a Learning Environment for Neural Motion Planning
Tom Jurgenson
Matan Sudry
Gal Avineri
Aviv Tamar
72
0
0
25 May 2024
Safe Deep Model-Based Reinforcement Learning with Lyapunov Functions
Safe Deep Model-Based Reinforcement Learning with Lyapunov Functions
Harry Zhang
79
0
0
25 May 2024
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy
  Optimization
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization
Shutong Ding
Ke Hu
Zhenhao Zhang
Kan Ren
Weinan Zhang
Jingyi Yu
Jingya Wang
Ye-ling Shi
115
21
0
25 May 2024
Pausing Policy Learning in Non-stationary Reinforcement Learning
Pausing Policy Learning in Non-stationary Reinforcement Learning
Hyunin Lee
Ming Jin
Javad Lavaei
Somayeh Sojoudi
OffRL
117
2
0
25 May 2024
Constrained Ensemble Exploration for Unsupervised Skill Discovery
Constrained Ensemble Exploration for Unsupervised Skill Discovery
Chenjia Bai
Rushuai Yang
Qiaosheng Zhang
Kang Xu
Yi Chen
Ting Xiao
Xuelong Li
OffRL
139
4
0
25 May 2024
Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific
  Learning Rate
Efficient Recurrent Off-Policy RL Requires a Context-Encoder-Specific Learning Rate
Fan Luo
Zuolin Tu
Zefang Huang
Yang Yu
OffRL
86
1
0
24 May 2024
How to Leverage Diverse Demonstrations in Offline Imitation Learning
How to Leverage Diverse Demonstrations in Offline Imitation Learning
Sheng Yue
Jiani Liu
Xingyuan Hua
Ju Ren
Sen Lin
Junshan Zhang
Yaoxue Zhang
OffRL
83
4
0
24 May 2024
Momentum-Based Federated Reinforcement Learning with Interaction and
  Communication Efficiency
Momentum-Based Federated Reinforcement Learning with Interaction and Communication Efficiency
Sheng Yue
Xingyuan Hua
Lili Chen
Ju Ren
49
1
0
24 May 2024
Diffusion Actor-Critic with Entropy Regulator
Diffusion Actor-Critic with Entropy Regulator
Yinuo Wang
Likun Wang
Yuxuan Jiang
Wenjun Zou
Tong Liu
...
Wenxuan Wang
Liming Xiao
Jiang Wu
Jingliang Duan
Shengbo Eben Li
DiffM
142
17
0
24 May 2024
Model-free reinforcement learning with noisy actions for automated experimental control in optics
Model-free reinforcement learning with noisy actions for automated experimental control in optics
Lea Richtmann
Viktoria-S. Schmiesing
Dennis Wilken
Jan Heine
Aaron Tranter
Avishek Anand
Tobias J. Osborne
M. Heurs
104
2
0
24 May 2024
MuDreamer: Learning Predictive World Models without Reconstruction
MuDreamer: Learning Predictive World Models without Reconstruction
Maxime Burchi
Radu Timofte
75
4
0
23 May 2024
Previous
123...161718...818283
Next