Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

28 June 2021

Papers citing "Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits"

Title | Authors | Tag | Counts | Date
Reward-rational (implicit) choice: A unifying formalism for reward learning | Hong Jun Jeon, S. Milli, Anca Dragan | | 45 177 0 | 12 Feb 2020
On-Policy Robot Imitation Learning from a Converging Supervisor | Ashwin Balakrishna, Brijen Thananjeyan, Jonathan Lee, Felix Li, Arsh Zahed, Joseph E. Gonzalez, Ken Goldberg | | 47 17 0 | 08 Jul 2019
Are sample means in multi-armed bandits positively or negatively biased? | Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo | | 33 36 0 | 27 May 2019
Generative Adversarial Imitation Learning | Jonathan Ho, Stefano Ermon | GAN | 114 3,089 0 | 10 Jun 2016
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models | E. Kaufmann, Olivier Cappé, Aurélien Garivier | | 112 1,021 0 | 16 Jul 2014