Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

28 June 2021
Wenshuo Guo
Kumar Krishna Agrawal
Aditya Grover
Vidya Muthukumar
A. Pananjady

Papers citing "Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits"

5 / 5 papers shown
Reward-rational (implicit) choice: A unifying formalism for reward learning
Hong Jun Jeon
S. Milli
Anca Dragan
45
177
0
12 Feb 2020
On-Policy Robot Imitation Learning from a Converging Supervisor
Ashwin Balakrishna
Brijen Thananjeyan
Jonathan Lee
Felix Li
Arsh Zahed
Joseph E. Gonzalez
Ken Goldberg
47
17
0
08 Jul 2019
Are sample means in multi-armed bandits positively or negatively biased?
Jaehyeok Shin
Aaditya Ramdas
Alessandro Rinaldo
33
36
0
27 May 2019
Generative Adversarial Imitation Learning
Jonathan Ho
Stefano Ermon
GAN
114
3,089
0
10 Jun 2016
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models
E. Kaufmann
Olivier Cappé
Aurélien Garivier
112
1,021
0
16 Jul 2014