ResearchTrend.AI
Reinforcement Learning with Deep Energy-Based Policies

27 February 2017
Tuomas Haarnoja
Haoran Tang
Pieter Abbeel
Sergey Levine

Papers citing "Reinforcement Learning with Deep Energy-Based Policies"

48 / 48 papers shown
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao
Muning Wen
Jun Wang
Weinan Zhang
OffRL
96
3
0
21 Apr 2025
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Taiwei Shi
Yiyang Wu
Linxin Song
Dinesh Manocha
Jieyu Zhao
LRM
119
9
0
07 Apr 2025
Safe Explicable Policy Search
Akkamahadevi Hanni
Jonathan Montaño
Yu Zhang
92
0
0
10 Mar 2025
Maximum Entropy Reinforcement Learning with Diffusion Policy
Xiaoyi Dong
Jian Cheng
Xinsong Zhang
86
1
0
17 Feb 2025
Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network
Jijia Liu
Feng Gao
Q. Liao
Chao Yu
Yu Wang
OffRL
113
0
0
01 Feb 2025
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Zishun Yu
Tengyu Xu
Di Jin
Karthik Abinav Sankararaman
Yun He
...
Eryk Helenowski
Chen Zhu
Sinong Wang
Hao Ma
Han Fang
LRM
163
9
0
29 Jan 2025
Divergence-Augmented Policy Optimization
Qing Wang
Yingru Li
Jiechao Xiong
Tong Zhang
OffRL
141
16
0
28 Jan 2025
On Reward Transferability in Adversarial Inverse Reinforcement Learning: Insights from Random Matrix Theory
Yangchun Zhang
Wang Zhou
Yirui Zhou
75
0
0
31 Dec 2024
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Eliot Xing
Vernon Luk
Jean Oh
132
0
0
16 Dec 2024
Robust Contact-rich Manipulation through Implicit Motor Adaptation
Teng Xue
Amirreza Razmjoo
Suhan Shetty
Sylvain Calinon
136
1
0
16 Dec 2024
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu
Tim Z. Xiao
Weiyang Liu
Yoshua Bengio
Dinghuai Zhang
153
5
0
10 Dec 2024
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
198
6
0
07 Nov 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
128
5
0
22 Oct 2024
Reward-free World Models for Online Imitation Learning
Shangzhe Li
Zhiao Huang
H. Su
OffRL
126
1
0
17 Oct 2024
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Hyungjoo Chae
Namyoung Kim
Kai Tzu-iunn Ong
Minju Gwak
Gwanwoo Song
Jihoon Kim
Seon Gyeom Kim
Dongha Lee
Jinyoung Yeo
LLMAG
66
20
0
17 Oct 2024
Simplifying Deep Temporal Difference Learning
Matteo Gallici
Mattie Fellows
Benjamin Ellis
B. Pou
Ivan Masmitja
Jakob Foerster
Mario Martin
OffRL
104
25
0
05 Jul 2024
Residual-MPPI: Online Policy Customization for Continuous Control
Pengcheng Wang
Chenran Li
Catherine Weaver
Kenta Kawamoto
Masayoshi Tomizuka
Chen Tang
Wei Zhan
OffRL
82
3
0
01 Jul 2024
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL
Qi Lv
Xiang Deng
Gongwei Chen
Michael Yu Wang
Liqiang Nie
112
7
0
08 Jun 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
103
2
0
30 May 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
112
18
0
28 May 2024
Rule-Based Lloyd Algorithm for Multi-Robot Motion Planning and Control with Safety and Convergence Guarantees
Manuel Boldrer
Álvaro Serra-Gómez
Lorenzo Lyons
Vít Krátký
Javier Alonso-Mora
Laura Ferranti
98
4
0
30 Oct 2023
Multicopy Reinforcement Learning Agents
Alicia P. Wolfe
Oliver Diamond
Brigitte Goeler-Slough
Remi Feuerman
Magdalena Kisielinska
Victoria Manfredi
69
0
0
19 Sep 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan Z. Sheng
Julian McAuley
Lina Yao
SyDa
135
13
0
28 Aug 2023
Explainability in Deep Reinforcement Learning
Alexandre Heuillet
Fabien Couthouis
Natalia Díaz Rodríguez
XAI
143
281
0
15 Aug 2020
Quantum enhancements for deep reinforcement learning in large spaces
Sofiene Jerbi
Lea M. Trenkwalder
Hendrik Poulsen Nautrup
Hans J. Briegel
Vedran Dunjko
71
5
0
28 Oct 2019
Discrete Sequential Prediction of Continuous Actions for Deep RL
Luke Metz
Julian Ibarz
Navdeep Jaitly
James Davidson
BDL
OffRL
68
119
0
14 May 2017
Equivalence Between Policy Gradients and Soft Q-Learning
John Schulman
Xi Chen
Pieter Abbeel
OffRL
80
345
0
21 Apr 2017
Stochastic Neural Networks for Hierarchical Reinforcement Learning
Carlos Florensa
Yan Duan
Pieter Abbeel
BDL
78
361
0
10 Apr 2017
Stein Variational Policy Gradient
Yang Liu
Prajit Ramachandran
Qiang Liu
Jian Peng
66
139
0
07 Apr 2017
Bridging the Gap Between Value and Policy Based Reinforcement Learning
Ofir Nachum
Mohammad Norouzi
Kelvin Xu
Dale Schuurmans
148
470
0
28 Feb 2017
Loss is its own Reward: Self-Supervision for Reinforcement Learning
Evan Shelhamer
Parsa Mahmoudieh
Max Argus
Trevor Darrell
SSL
77
186
0
21 Dec 2016
Reinforcement Learning with Unsupervised Auxiliary Tasks
Max Jaderberg
Volodymyr Mnih
Wojciech M. Czarnecki
Tom Schaul
Joel Z Leibo
David Silver
Koray Kavukcuoglu
SSL
86
1,228
0
16 Nov 2016
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
S. Gu
Timothy Lillicrap
Zoubin Ghahramani
Richard Turner
Sergey Levine
OffRL
BDL
88
345
0
07 Nov 2016
Learning to Draw Samples: With Application to Amortized MLE for Generative Adversarial Learning
Dilin Wang
Qiang Liu
GAN
BDL
112
119
0
06 Nov 2016
Combining policy gradient and Q-learning
Brendan O'Donoghue
Rémi Munos
Koray Kavukcuoglu
Volodymyr Mnih
OffRL
OnRL
66
139
0
05 Nov 2016
Learning and Transfer of Modulated Locomotor Controllers
N. Heess
Greg Wayne
Yuval Tassa
Timothy Lillicrap
Martin Riedmiller
David Silver
65
207
0
17 Oct 2016
Energy-based Generative Adversarial Network
Junbo Zhao
Michaël Mathieu
Yann LeCun
GAN
133
1,114
0
11 Sep 2016
Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Qiang Liu
Dilin Wang
BDL
65
1,091
0
16 Aug 2016
Deep Directed Generative Models with Energy-Based Probability Estimation
Taesup Kim
Yoshua Bengio
GAN
45
136
0
10 Jun 2016
Continuous Deep Q-Learning with Model-based Acceleration
S. Gu
Timothy Lillicrap
Ilya Sutskever
Sergey Levine
86
1,012
0
02 Mar 2016
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
Mehdi Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
189
8,833
0
04 Feb 2016
Taming the Noise in Reinforcement Learning via Soft Updates
Roy Fox
Ari Pakman
Naftali Tishby
67
338
0
28 Dec 2015
Continuous control with deep reinforcement learning
Timothy Lillicrap
Jonathan J. Hunt
Alexander Pritzel
N. Heess
Tom Erez
Yuval Tassa
David Silver
Daan Wierstra
306
13,214
0
09 Sep 2015
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
82
3,399
0
08 Jun 2015
End-to-End Training of Deep Visuomotor Policies
Sergey Levine
Chelsea Finn
Trevor Darrell
Pieter Abbeel
BDL
284
3,431
0
02 Apr 2015
Trust Region Policy Optimization
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
274
6,755
0
19 Feb 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
1.5K
149,842
0
22 Dec 2014
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih
Koray Kavukcuoglu
David Silver
Alex Graves
Ioannis Antonoglou
Daan Wierstra
Martin Riedmiller
114
12,201
0
19 Dec 2013