ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
arXiv 1801.01290 (versions v1, v2; v2 is the latest)

4 January 2018
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine

Papers citing "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"

50 / 4,128 papers shown
VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving
Yansong Qu
Zilin Huang
Zihao Sheng
Jiancong Chen
Sikai Chen
Samuel Labi
OffRL
70
0
0
22 May 2025
How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning
Max Weltevrede
Moritz A. Zanger
M. Spaan
Wendelin Bohmer
OffRL FedML
93
0
0
22 May 2025
FlashBack: Consistency Model-Accelerated Shared Autonomy
Luzhe Sun
Jingtian Ji
Xiangshan Tan
Matthew R. Walter
256
0
0
22 May 2025
MPO: Multilingual Safety Alignment via Reward Gap Optimization
Weixiang Zhao
Yulin Hu
Yang Deng
Tongtong Wu
Wenxuan Zhang
...
An Zhang
Yanyan Zhao
Bing Qin
Tat-Seng Chua
Ting Liu
104
2
0
22 May 2025
Sequential Monte Carlo for Policy Optimization in Continuous POMDPs
Hany Abdulsamad
Sahel Iqbal
Simo Särkkä
86
0
0
22 May 2025
Meta-reinforcement learning with minimum attention
Pilhwa Lee
Shashank Gupta
OffRL
127
0
0
22 May 2025
A Temporal Difference Method for Stochastic Continuous Dynamics
Haruki Settai
Naoya Takeishi
Takehisa Yairi
165
0
0
21 May 2025
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
Shivam Agarwal
Zimin Zhang
Lifan Yuan
Jiawei Han
Hao Peng
177
8
0
21 May 2025
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
Yurun Yuan
Fan Chen
Zeyu Jia
Alexander Rakhlin
Tengyang Xie
OffRL
138
1
0
21 May 2025
Learning-based Autonomous Oversteer Control and Collision Avoidance
Seokjun Lee
Seung-Hyun Kong
54
0
0
21 May 2025
AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum
Jian Xiong
Jingbo Zhou
Jingyong Ye
Dejing Dou
LRM
102
0
0
20 May 2025
Time Reversal Symmetry for Efficient Robotic Manipulations in Deep Reinforcement Learning
Yunpeng Jiang
Jianshu Hu
Paul Weng
Yutong Ban
65
0
0
20 May 2025
Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics
Maksim Bobrin
Ilya Zisman
Alexander Nikulin
Vladislav Kurenkov
Dmitry V. Dylov
OffRL
73
0
0
19 May 2025
TD-GRPC: Temporal Difference Learning with Group Relative Policy Constraint for Humanoid Locomotion
Khang Nguyen
Khai Nguyen
An T. Le
Jan Peters
Manfred Huber
Ngo Anh Vien
Minh Nhat Vu
68
0
0
19 May 2025
Multi-parameter Control for the $(1+(λ,λ))$-GA on OneMax via Deep Reinforcement Learning
Tai Nguyen
Phong Le
Carola Doerr
Nguyen Dang
108
0
0
19 May 2025
Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning
Jiayu Chen
Aravind Venugopal
Jeff Schneider
OffRL
77
0
0
19 May 2025
Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents
Shuo Han
German Espinosa
Junda Huang
D. Dombeck
Malcolm A. MacIver
Bradly C. Stadie
127
0
0
18 May 2025
Multi-CALF: A Policy Combination Approach with Statistical Guarantees
Georgiy Malaniya
Anton Bolychev
Grigory Yaremenko
Anastasia Krasnaya
Pavel Osinenko
120
0
0
18 May 2025
SAINT: Attention-Based Modeling of Sub-Action Dependencies in Multi-Action Policies
Matthew Landers
Taylor W. Killian
Thomas Hartvigsen
Afsaneh Doryab
66
0
0
17 May 2025
Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning
Kalyan Cherukuri
Aarav Lala
Yash Yardi
54
0
0
17 May 2025
Bench-NPIN: Benchmarking Non-prehensile Interactive Navigation
Ninghan Zhong
Steven Caro
Avraiem Iskandar
Megnath Ramesh
Stephen L. Smith
60
0
0
17 May 2025
Bi-Level Policy Optimization with Nyström Hypergradients
Arjun Prakash
Naicheng He
Denizalp Goktas
Amy Greenwald
77
0
0
16 May 2025
ReaCritic: Large Reasoning Transformer-based DRL Critic-model Scaling For Heterogeneous Networks
Feiran You
Hongyang Du
OffRL LRM
104
0
0
16 May 2025
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
Kehan Long
Jorge Cortés
Nikolay Atanasov
115
1
0
16 May 2025
Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations
Wenrui Cai
Chengyu Wang
Junbing Yan
Jun Huang
Xiangzhong Fang
LRM
67
0
0
16 May 2025
Deep Symbolic Optimization: Reinforcement Learning for Symbolic Mathematics
Conor F. Hayes
Felipe Leno Da Silva
Jiachen Yang
T. Nathan Mundhenk
Chak Shing Lee
...
Ahmet Can Solak
Thomas Desautels
Daniel Faissol
Brenden K. Petersen
Mikel Landajuela
75
0
0
16 May 2025
Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition
Bo Yue
Shuqi Guo
Kaiyu Hu
Chujiao Wang
Benyou Wang
Kui Jia
Guiliang Liu
LRM
113
0
0
16 May 2025
Zero-Shot Visual Generalization in Robot Manipulation
Sumeet Batra
Gaurav Sukhatme
79
0
0
16 May 2025
Exploration by Random Distribution Distillation
Zhirui Fang
Kai Yang
Jian Tao
Jiafei Lyu
Lusong Li
Li Shen
Xiu Li
121
1
0
16 May 2025
ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
Jiahui Zhang
Yusen Luo
Abrar Anwar
Sumedh Anand Sontakke
Joseph J Lim
Jesse Thomason
Erdem Biyik
Jesse Zhang
OffRL LM&Ro
131
0
0
16 May 2025
Meta-World+: An Improved, Standardized, RL Benchmark
Reginald McLean
Evangelos Chatzaroulas
Luc McCutcheon
Frank Röder
Tianhe Yu
...
Ryan Julian
Jordan Terry
Isaac Woungang
Nariman Farsad
Pablo Samuel Castro
OffRL
81
1
0
16 May 2025
Knowledge capture, adaptation and composition (KCAC): A framework for cross-task curriculum learning in robotic manipulation
Xinrui Wang
Yan Jin
86
0
0
15 May 2025
Modular Robot Control with Motor Primitives
Moses C. Nah
Johannes Lachner
Neville Hogan
102
0
0
15 May 2025
Approximated Behavioral Metric-based State Projection for Federated Reinforcement Learning
Zengxia Guo
Bohui An
Zhongqi Lu
FedML
75
0
0
15 May 2025
Accelerating Visual-Policy Learning through Parallel Differentiable Simulation
Haoxiang You
Yilang Liu
Ian Abraham
79
0
0
15 May 2025
General Dynamic Goal Recognition
Osher Elhadad
Reuth Mirsky
AI4CE
47
1
0
14 May 2025
Preserving Plasticity in Continual Learning with Adaptive Linearity Injection
Seyed Roozbeh Razavi Rohani
Khashayar Khajavi
Wesley Chung
Mo Chen
Sharan Vaswani
CLL AI4CE
76
0
0
14 May 2025
Adaptive Diffusion Policy Optimization for Robotic Manipulation
Huiyun Jiang
Zhuang Yang
84
0
0
13 May 2025
Modeling Unseen Environments with Language-guided Composable Causal Components in Reinforcement Learning
Xinyue Wang
Zhen Zhang
OffRL CML
80
0
0
13 May 2025
LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation
Yuhang Huang
Jiazhao Zhang
Shilong Zou
Xinwang Liu
Ruizhen Hu
Kai Xu
94
0
0
13 May 2025
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
Enci Zhang
Xingang Yan
Wei Lin
Tianxiang Zhang
Qianchun Lu
LRM
86
0
0
13 May 2025
Generalization in Monitored Markov Decision Processes (Mon-MDPs)
Montaser Mohammedalamen
Michael Bowling
102
0
0
13 May 2025
Continuous World Coverage Path Planning for Fixed-Wing UAVs using Deep Reinforcement Learning
Mirco Theile
Andres R. Zapata Rodriguez
Marco Caccamo
Alberto L. Sangiovanni-Vincentelli
119
0
0
13 May 2025
Combining Bayesian Inference and Reinforcement Learning for Agent Decision Making: A Review
Chengmin Zhou
Ville Kyrki
Pasi Fränti
Laura Ruotsalainen
BDL AI4CE
121
0
0
12 May 2025
Imagine, Verify, Execute: Memory-Guided Agentic Exploration with Vision-Language Models
Seungjae Lee
Daniel Ekpo
Haowen Liu
Furong Huang
Abhinav Shrivastava
Jia-Bin Huang
LM&Ro
151
0
0
12 May 2025
Drive Fast, Learn Faster: On-Board RL for High Performance Autonomous Racing
Benedict Hildisch
Edoardo Ghignone
Nicolas Baumann
Cheng Hu
Andrea Carron
Michele Magno
87
0
0
12 May 2025
Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains
Ibne Farabi Shihab
Sanjeda Akter
Anuj Sharma
BDL
44
1
0
12 May 2025
A Reinforcement Learning Framework for Application-Specific TCP Congestion-Control
Jinming Xing
Muhammad Shahzad
59
0
0
11 May 2025
DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition
Yuki Kadokawa
Jonas Frey
Takahiro Miki
Takamitsu Matsubara
Marco Hutter
82
0
0
09 May 2025
Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach
Tim Schneider
Cristiana de Farias
Roberto Calandra
Lawrence Yunliang Chen
Jan Peters
460
1
0
09 May 2025