arXiv: 1709.06560
Deep Reinforcement Learning that Matters

19 September 2017
Peter Henderson
Riashat Islam
Philip Bachman
Joelle Pineau
Doina Precup
David Meger
    OffRL

Papers citing "Deep Reinforcement Learning that Matters"

50 / 372 papers shown
Multi-parameter Control for the (1+(λ,λ))-GA on OneMax via Deep Reinforcement Learning
Tai Nguyen
Phong Le
Carola Doerr
Nguyen Dang
24
0
0
19 May 2025
Parameter Estimation using Reinforcement Learning Causal Curiosity: Limits and Challenges
Miguel Arana-Catania
Weisi Guo
CML
35
0
0
13 May 2025
Adaptive Stress Testing Black-Box LLM Planners
Neeloy Chakraborty
John Pohovey
Melkior Ornik
Katherine Driggs-Campbell
34
0
0
08 May 2025
Automated Hybrid Reward Scheduling via Large Language Models for Robotic Skill Learning
Changxin Huang
Junyang Liang
Yanbin Chang
Jingzhao Xu
Jianqiang Li
34
0
0
05 May 2025
Leveraging Partial SMILES Validation Scheme for Enhanced Drug Design in Reinforcement Learning Frameworks
Xinyu Wang
Jinbo Bi
Minghu Song
CLL
69
0
0
01 May 2025
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan
Guoqing Luo
Michael Bowling
Lili Mou
OffRL
68
0
0
26 Apr 2025
Dynamic Action Interpolation: A Universal Approach for Accelerating Reinforcement Learning with Expert Guidance
Wenjun Cao
52
0
0
26 Apr 2025
CaRL: Learning Scalable Planning Policies with Simple Rewards
Bernhard Jaeger
D. Dauner
Jens Beißwenger
Simon Gerstenecker
Kashyap Chitta
Andreas Geiger
60
1
0
24 Apr 2025
Autonomous Control of Redundant Hydraulic Manipulator Using Reinforcement Learning with Action Feedback
Rohit Dhakate
Christian Brommer
C. Böhm
Stephan Weiss
J. Steinbrener
36
5
0
22 Apr 2025
AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
ODL
56
0
0
22 Apr 2025
MindGYM: What Matters in Question Synthesis for Thinking-Centric Fine-Tuning?
Zhe Xu
Daoyuan Chen
Zhenqing Ling
Yaliang Li
Ying Shen
LRM
ReLM
SyDa
62
0
0
12 Mar 2025
Hyperspherical Normalization for Scalable Deep Reinforcement Learning
Hojoon Lee
Youngdo Lee
Takuma Seno
Donghu Kim
Peter Stone
Jaegul Choo
70
1
0
24 Feb 2025
Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning
Austin Yubo He
Zi-Wen Liu
97
3
0
21 Feb 2025
Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope?
Michael Doherty
Robin Matzner
Rasoul Sadeghi
Polina Bayvel
Alejandra Beghelli
65
0
0
18 Feb 2025
Increasing Information for Model Predictive Control with Semi-Markov Decision Processes
Rémy Hosseinkhan Boucher
Onofrio Semeraro
L. Mathelin
53
0
0
28 Jan 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher
Onofrio Semeraro
L. Mathelin
82
0
0
28 Jan 2025
Optimizing Automatic Differentiation with Deep Reinforcement Learning
Jamie Lohoff
Emre Neftci
61
1
0
28 Jan 2025
Revisiting Ensemble Methods for Stock Trading and Crypto Trading Tasks at ACM ICAIF FinRL Contest 2023-2024
Nikolaus Holzer
Keyi Wang
Kairong Xiao
Xiao-Yang Liu Yanglet
AIFin
35
1
0
18 Jan 2025
EVaDE : Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning
Siddharth Aravindan
Dixant Mittal
Wee Sun Lee
BDL
79
0
0
17 Jan 2025
Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey
Zhihong Liu
Xin Xu
Peng Qiao
Dongsheng Li
OffRL
31
2
0
08 Nov 2024
GraphXForm: Graph transformer for computer-aided molecular design
Jonathan Pirnay
Jan G. Rittig
Alexander B. Wolf
Martin Grohe
Jakob Burger
Alexander Mitsos
D. G. Grimm
AI4CE
60
1
0
03 Nov 2024
Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling
Jesse van Remmerden
Z. Bukhsh
Yingqian Zhang
OffRL
OnRL
47
1
0
16 Sep 2024
Compatible Gradient Approximations for Actor-Critic Algorithms
Baturay Saglam
Dionysis Kalogerias
37
0
0
02 Sep 2024
A Comparative Study of Deep Reinforcement Learning Models: DQN vs PPO vs A2C
Neil De La Fuente
Daniel A. Vidal Guerra
OffRL
24
5
0
19 Jul 2024
Variational Best-of-N Alignment
Afra Amini
Tim Vieira
Ryan Cotterell
BDL
43
19
0
08 Jul 2024
AI Agents That Matter
Sayash Kapoor
Benedikt Stroebl
Zachary S. Siegel
Nitya Nadgir
Arvind Narayanan
62
37
0
01 Jul 2024
Decoupling regularization from the action space
Sobhan Mohammadpour
Emma Frejinger
Pierre-Luc Bacon
37
0
0
10 Jun 2024
Long-Term Fairness Inquiries and Pursuits in Machine Learning: A Survey of Notions, Methods, and Challenges
Usman Gohar
Zeyu Tang
Jialu Wang
Kun Zhang
Peter Spirtes
Yang Liu
Lu Cheng
FaML
68
3
0
10 Jun 2024
Multi-objective Cross-task Learning via Goal-conditioned GPT-based Decision Transformers for Surgical Robot Task Automation
Jiawei Fu
Yonghao Long
Kai-xiang Chen
Wang Wei
Qi Dou
MedIm
39
4
0
29 May 2024
Diffusion-Reward Adversarial Imitation Learning
Chun-Mao Lai
Hsiang-Chun Wang
Ping-Chun Hsieh
Yu-Chiang Frank Wang
Min-Hung Chen
Shao-Hua Sun
41
8
0
25 May 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
72
44
0
23 May 2024
A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning
Zun Li
Michael P. Wellman
42
1
0
30 Apr 2024
Hyperparameter Optimization Can Even be Harmful in Off-Policy Learning and How to Deal with It
Yuta Saito
Masahiro Nomura
OffRL
50
2
0
23 Apr 2024
Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning
Hui Bai
Ran Cheng
55
4
0
12 Apr 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
57
9
0
25 Mar 2024
Task-optimal data-driven surrogate models for eNMPC via differentiable simulation and optimization
Daniel Mayfrank
Na Young Ahn
Alexander Mitsos
Manuel Dahmen
34
2
0
21 Mar 2024
Offline Goal-Conditioned Reinforcement Learning for Shape Control of Deformable Linear Objects
Rita Laezza
Mohammadreza Shetab-Bushehri
Gabriel Arslan Waltersson
Erol Özgür
Y. Mezouar
Y. Karayiannidis
OffRL
46
0
0
15 Mar 2024
Better than classical? The subtle art of benchmarking quantum machine learning models
Joseph Bowles
Shahnawaz Ahmed
Maria Schuld
47
67
0
11 Mar 2024
Supervised machine learning for microbiomics: bridging the gap between current and best practices
Natasha K. Dudek
Mariam Chakhvadze
Saba Kobakhidze
Omar Kantidze
Yuriy Gankin
LM&MA
42
2
0
27 Feb 2024
Hierarchical Transformers are Efficient Meta-Reinforcement Learners
Gresa Shala
André Biedenkapp
Josif Grabocka
OffRL
40
4
0
09 Feb 2024
Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator
Ryoma Furuyama
Daiki Kuyoshi
Satoshi Yamane
23
0
0
30 Jan 2024
XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX
Alexander Nikulin
Vladislav Kurenkov
Ilya Zisman
Artem Agarkov
Viacheslav Sinii
Sergey Kolesnikov
35
26
0
19 Dec 2023
An Invitation to Deep Reinforcement Learning
Bernhard Jaeger
Andreas Geiger
OffRL
OOD
80
5
0
13 Dec 2023
Automated interpretation of congenital heart disease from multi-view echocardiograms
Jing Wang
Xiaofeng Liu
Fangyun Wang
Lin Zheng
F. Gao
Hanwen Zhang
Xin Zhang
Wanqing Xie
Bin-bin Wang
37
55
0
30 Nov 2023
Offline Skill Generalization via Task and Motion Planning
Shin Watanabe
Geir Horn
J. Tørresen
K. Ellefsen
OffRL
30
0
0
24 Nov 2023
Designing Long-term Group Fair Policies in Dynamical Systems
Miriam Rateike
Isabel Valera
Patrick Forré
38
4
0
21 Nov 2023
Efficiently Escaping Saddle Points for Non-Convex Policy Optimization
Sadegh Khorasani
Saber Salehkaleybar
Negar Kiyavash
Niao He
Matthias Grossglauser
29
1
0
15 Nov 2023
Optimal Guarantees for Algorithmic Reproducibility and Gradient Complexity in Convex Optimization
Liang Zhang
Junchi Yang
Amin Karbasi
Niao He
34
2
0
26 Oct 2023
TD-MPC2: Scalable, Robust World Models for Continuous Control
Nicklas Hansen
Hao Su
Xiaolong Wang
MU
32
128
0
25 Oct 2023
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
Melanie Sclar
Yejin Choi
Yulia Tsvetkov
Alane Suhr
53
308
0
17 Oct 2023