arXiv:2002.06487
Maxmin Q-learning: Controlling the Estimation Bias of Q-learning
16 February 2020
Qingfeng Lan
Yangchen Pan
Alona Fyshe
Martha White
Papers citing "Maxmin Q-learning: Controlling the Estimation Bias of Q-learning"
50 / 99 papers shown
Moderate Actor-Critic Methods: Controlling Overestimation Bias via Expectile Loss
Ukjo Hwang
Songnam Hong
OffRL
14 Apr 2025
A Multi-Agent Multi-Environment Mixed Q-Learning for Partially Decentralized Wireless Network Optimization
Talha Bozkus
Urbashi Mitra
31 Dec 2024
SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search
Hanwen Du
B. Peng
Xia Ning
12 Oct 2024
Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning
Xinran Li
Ling Pan
Jun Zhang
11 Oct 2024
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
C. Voelcker
Marcel Hussing
Eric Eaton
Amir-massoud Farahmand
Igor Gilitschenski
11 Oct 2024
Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning
Shreyas S R
OffRL
OnRL
10 Sep 2024
Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn
Hongyao Tang
Glen Berseth
OffRL
07 Sep 2024
Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors
Emma Cramer
Bernd Frauenknecht
Ramil Sabirov
Sebastian Trimpe
OffRL
OnRL
28 Jun 2024
Mixture of Experts in a Mixture of RL settings
Timon Willi
J. Obando-Ceron
Jakob Foerster
Karolina Dziugaite
Pablo Samuel Castro
MoE
26 Jun 2024
Highway Reinforcement Learning
Yuhui Wang
M. Strupl
Francesco Faccio
Qingyuan Wu
Haozhe Liu
Michal Grudzień
Xiaoyang Tan
Jürgen Schmidhuber
OffRL
28 May 2024
Stochastic Q-learning for Large Discrete Action Spaces
Fares Fourati
Vaneet Aggarwal
Mohamed-Slim Alouini
OffRL
16 May 2024
vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement
Yiwen Zhu
Jinyi Liu
Wenya Wei
Qianyi Fu
Yujing Hu
Zhou Fang
Bo An
Jianye Hao
Tangjie Lv
Changjie Fan
14 May 2024
Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning
Changhong Wang
Xudong Yu
Chenjia Bai
Qiaosheng Zhang
Zhen Wang
12 May 2024
The Curse of Diversity in Ensemble-Based Exploration
Zhixuan Lin
P. D'Oro
Evgenii Nikishin
Rameswar Panda
07 May 2024
CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics
David Valencia
Henry Williams
Trevor Gee
Bruce A. MacDonald
Minas V. Liarokapis
OffRL
04 May 2024
Regularized Q-learning through Robust Averaging
Peter Schmitt-Förster
Tobias Sutter
OOD
03 May 2024
Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation
Qiang He
Dinesh Manocha
Meng Fang
S. Maghsudi
19 Apr 2024
Simple Ingredients for Offline Reinforcement Learning
Edoardo Cetin
Andrea Tirinzoni
Matteo Pirotta
A. Lazaric
Yann Ollivier
Ahmed Touati
OffRL
19 Mar 2024
Dissecting Deep RL with High Update Ratios: Combatting Value Divergence
Marcel Hussing
C. Voelcker
Igor Gilitschenski
Amir-massoud Farahmand
Eric Eaton
09 Mar 2024
Conservative DDPG -- Pessimistic RL without Ensemble
Nitsan Soffair
Shie Mannor
OffRL
08 Mar 2024
Self-evolving Autoencoder Embedded Q-Network
J. Senthilnath
Bangjian Zhou
Zhen Wei Ng
Deeksha Aggarwal
Rajdeep Dutta
Ji Wei Yoon
Phyu Aung
Keyu Wu
Xiaoli Li
18 Feb 2024
Leveraging Digital Cousins for Ensemble Q-Learning in Large-Scale Wireless Networks
Talha Bozkus
Urbashi Mitra
12 Feb 2024
Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization
Talha Bozkus
Urbashi Mitra
OffRL
08 Feb 2024
SQT -- std Q-target
Nitsan Soffair
Dotan Di Castro
Orly Avner
Shie Mannor
OffRL
03 Feb 2024
SLIM: Skill Learning with Multiple Critics
David Emukpere
Bingbing Wu
Julien Perez
J. Renders
01 Feb 2024
REValueD: Regularised Ensemble Value-Decomposition for Factorisable Markov Decision Processes
David Ireland
Giovanni Montana
16 Jan 2024
SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning
Dohyeok Lee
Seung Han
Taehyun Cho
Jungwoo Lee
OffRL
06 Jan 2024
Data-efficient Deep Reinforcement Learning for Vehicle Trajectory Control
Bernd Frauenknecht
Tobias Ehlgen
Sebastian Trimpe
30 Nov 2023
Stable Online and Offline Reinforcement Learning for Antibody CDRH3 Design
Yannick Vogt
Mehdi Naouar
M. Kalweit
Christoph Cornelius Miething
Justus Duyster
Roland Mertelsmann
Gabriel Kalweit
Joschka Boedecker
OffRL
OnRL
29 Nov 2023
Mitigating Estimation Errors by Twin TD-Regularized Actor and Critic for Deep Reinforcement Learning
Junmin Zhong
Ruofan Wu
Jennie Si
OffRL
07 Nov 2023
Keep Various Trajectories: Promoting Exploration of Ensemble Policies in Continuous Control
Chao Li
Chen Gong
Qiang He
Xinwen Hou
17 Oct 2023
Suppressing Overestimation in Q-Learning through Adversarial Behaviors
HyeAnn Lee
Donghwan Lee
10 Oct 2023
Elephant Neural Networks: Born to Be a Continual Learner
Qingfeng Lan
A. Rupam Mahmood
CLL
02 Oct 2023
Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness
Xiaoyu Wen
Xudong Yu
Rui Yang
Chenjia Bai
Zhen Wang
OffRL
OnRL
29 Sep 2023
Adapting Double Q-Learning for Continuous Reinforcement Learning
Arsenii Kuznetsov
OffRL
OnRL
25 Sep 2023
IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse
Siyuan Li
Haoyang Li
Jin Zhang
Zhen Wang
Peng Liu
Chongjie Zhang
OffRL
14 Aug 2023
Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning
Qiang He
Dinesh Manocha
Meng Fang
S. Maghsudi
29 Jun 2023
Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback
Hang Wang
Sen Lin
Junshan Zhang
20 Jun 2023
Improving Offline-to-Online Reinforcement Learning with Q-Ensembles
Kai-Wen Zhao
Yi Ma
Jianye Hao
Jinyi Liu
Yan Zheng
Zhaopeng Meng
OffRL
OnRL
12 Jun 2023
Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
Tianying Ji
Yuping Luo
Gang Hua
Xianyuan Zhan
Jianwei Zhang
Huazhe Xu
OffRL
OnRL
05 Jun 2023
Utility-Probability Duality of Neural Networks
Bojun Huang
Fei Yuan
UQCV
24 May 2023
MDDL: A Framework for Reinforcement Learning-based Position Allocation in Multi-Channel Feed
Xiaowen Shi
Zehua Wang
Yuanying Cai
Xiaoxu Wu
Fan Yang
Guogang Liao
Yongkang Wang
Xingxing Wang
Dong Wang
OffRL
17 Apr 2023
Ensemble Reinforcement Learning: A Survey
Yanjie Song
Ponnuthurai Nagaratnam Suganthan
Witold Pedrycz
Junwei Ou
Yongming He
Y. Chen
Yutong Wu
OffRL
05 Mar 2023
Backstepping Temporal Difference Learning
Han-Dong Lim
Dong-hwan Lee
OffRL
20 Feb 2023
Centralized Cooperative Exploration Policy for Continuous Control Tasks
Chong Li
Chen Gong
Qiang He
Xinwen Hou
Yu Liu
06 Jan 2023
Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning
Yi Zhao
Rinu Boney
Alexander Ilin
Arno Solin
Joni Pajarinen
OffRL
OnRL
25 Oct 2022
The Pump Scheduling Problem: A Real-World Scenario for Reinforcement Learning
Henrique Donancio
L. Vercouter
H. Roclawski
AI4CE
20 Oct 2022
Factors of Influence of the Overestimation Bias of Q-Learning
Julius Wagenbach
M. Sabatelli
11 Oct 2022
Elastic Step DQN: A novel multi-step algorithm to alleviate overestimation in Deep QNetworks
Adrian Ly
Richard Dazeley
Peter Vamplew
Francisco Cruz
Sunil Aryal
07 Oct 2022
Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks
Litian Liang
Yaosheng Xu
Stephen Marcus McAleer
Dailin Hu
Alexander Ihler
Pieter Abbeel
Roy Fox
OOD
16 Sep 2022