2106.02757
Cited By
Heuristic-Guided Reinforcement Learning
5 June 2021
Ching-An Cheng
Andrey Kolobov
Adith Swaminathan
OffRL
Papers citing
"Heuristic-Guided Reinforcement Learning"
35 / 35 papers shown
Title
Rapidly Adapting Policies to the Real World via Simulation-Guided Fine-Tuning
Patrick Yin
Tyler Westenbroek
Simran Bagaria
Kevin Huang
Ching-An Cheng
Andrey Kolobov
Abhishek Gupta
141
3
0
04 Feb 2025
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Sebastian Farquhar
Vikrant Varma
David Lindner
David Elson
Caleb Biddulph
Ian Goodfellow
Rohin Shah
149
2
0
22 Jan 2025
Fairness in Reinforcement Learning with Bisimulation Metrics
S. Rezaei-Shoshtari
Hanna Yurchyk
Scott Fujimoto
Doina Precup
David Meger
123
0
0
03 Jan 2025
Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning
Haozhe Ma
Zhengding Luo
Thanh Vinh Vo
Kuankuan Sima
Tze-Yun Leong
83
8
0
06 Aug 2024
Measuring Sample Efficiency and Generalization in Reinforcement Learning Benchmarks: NeurIPS 2020 Procgen Benchmark
Sharada Mohanty
Jyotish Poonganam
Adrien Gaidon
Andrey Kolobov
Blake Wulfe
...
Jacob Hilton
William H. Guss
Sahika Genc
John Schulman
K. Cobbe
51
23
0
29 Mar 2021
Regularized Behavior Value Estimation
Çağlar Gülçehre
Sergio Gomez Colmenarejo
Ziyun Wang
Jakub Sygnowski
T. Paine
Konrad Zolna
Yutian Chen
Matthew W. Hoffman
Razvan Pascanu
Nando de Freitas
OffRL
66
38
0
17 Mar 2021
Is Pessimism Provably Efficient for Offline RL?
Ying Jin
Zhuoran Yang
Zhaoran Wang
OffRL
170
358
0
30 Dec 2020
Blending MPC & Value Function Approximation for Efficient Reinforcement Learning
M. Bhardwaj
Sanjiban Choudhury
Byron Boots
43
30
0
10 Dec 2020
Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping
Yujing Hu
Weixun Wang
Hangtian Jia
Yixiang Wang
Yingfeng Chen
Jianye Hao
Feng Wu
Changjie Fan
OffRL
90
177
0
05 Nov 2020
Discount Factor as a Regularizer in Reinforcement Learning
Ron Amit
Ron Meir
K. Ciosek
OffRL
60
72
0
04 Jul 2020
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
Ashvin Nair
Abhishek Gupta
Murtaza Dalal
Sergey Levine
OffRL
OnRL
104
611
0
16 Jun 2020
Conservative Q-Learning for Offline Reinforcement Learning
Aviral Kumar
Aurick Zhou
George Tucker
Sergey Levine
OffRL
OnRL
137
1,815
0
08 Jun 2020
Understanding the Power and Limitations of Teaching with Imperfect Knowledge
R. Devidze
Farnam Mansouri
Luis Haug
Yuxin Chen
Adish Singla
97
50
0
21 Mar 2020
Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
Dylan J. Foster
Alexander Rakhlin
365
207
0
12 Feb 2020
Reward Tweaking: Maximizing the Total Reward While Planning for Short Horizons
Chen Tessler
Shie Mannor
35
2
0
09 Feb 2020
Dota 2 with Large Scale Deep Reinforcement Learning
OpenAI
Christopher Berner
Greg Brockman
Brooke Chan
...
Szymon Sidor
Ilya Sutskever
Jie Tang
Filip Wolski
Susan Zhang
GNN
VLM
CLL
AI4CE
LRM
166
1,823
0
13 Dec 2019
Leveraging Procedural Generation to Benchmark Reinforcement Learning
K. Cobbe
Christopher Hesse
Jacob Hilton
John Schulman
77
556
0
03 Dec 2019
Gamma-Nets: Generalizing Value Estimation over Timescale
Craig Sherstan
Shibhansh Dohare
J. MacGlashan
J. Günther
P. Pilarski
39
12
0
18 Nov 2019
Deep Value Model Predictive Control
Farbod Farshidian
David Hoeller
Marco Hutter
45
45
0
08 Oct 2019
Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning
H. V. Seijen
Mehdi Fatemi
Arash Tavakoli
32
34
0
03 Jun 2019
Separating value functions across time-scales
Joshua Romoff
Peter Henderson
Ahmed Touati
Emma Brunskill
Joelle Pineau
Yann Ollivier
52
25
0
05 Feb 2019
Is Q-learning Provably Efficient?
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
65
807
0
10 Jul 2018
Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning
Wen Sun
J. Andrew Bagnell
Byron Boots
109
94
0
29 May 2018
Planning with a Receding Horizon for Manipulation in Clutter using a Learned Value Function
Wissam Bejjani
Rafael Papallas
Matteo Leonetti
M. Dogar
46
33
0
21 Mar 2018
Beyond the One Step Greedy Approach in Reinforcement Learning
Yonathan Efroni
Gal Dalal
B. Scherrer
Shie Mannor
OffRL
80
50
0
10 Feb 2018
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
L. Espeholt
Hubert Soyer
Rémi Munos
Karen Simonyan
Volodymyr Mnih
...
Vlad Firoiu
Tim Harley
Iain Dunning
Shane Legg
Koray Kavukcuoglu
215
1,600
0
05 Feb 2018
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine
307
8,352
0
04 Jan 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
499
19,065
0
20 Jul 2017
Reverse Curriculum Generation for Reinforcement Learning
Carlos Florensa
David Held
Markus Wulfmeier
Michael Zhang
Pieter Abbeel
74
444
0
17 Jul 2017
Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction
Wen Sun
Arun Venkatraman
Geoffrey J. Gordon
Byron Boots
J. Andrew Bagnell
127
235
0
03 Mar 2017
Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning
Christoph Dann
Emma Brunskill
69
249
0
29 Oct 2015
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
93
3,414
0
08 Jun 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
359
19,643
0
09 Mar 2015
The LAMA Planner: Guiding Cost-Based Anytime Planning with Landmarks
Silvia Richter
Matthias Westphal
65
734
0
16 Jan 2014
The FF Planning System: Fast Plan Generation Through Heuristic Search
Jörg Hoffmann
Bernhard Nebel
84
2,355
0
03 Jun 2011