From "Thumbs Up" to "10 out of 10": Reconsidering Scalar Feedback in Interactive Reinforcement Learning

17 November 2023

Reuben M. Aronson

Katherine H. Allen

Papers citing "From "Thumbs Up" to "10 out of 10": Reconsidering Scalar Feedback in Interactive Reinforcement Learning"

Title
Enhancing Preference-based Linear Bandits via Human Response Time | Shen Li, Yuyang Zhang, Tongzheng Ren, Claire Liang, Na Li, J. Shah | - | 144 1 0 | 03 Jan 2025
Self-Initiated Open World Learning for Autonomous AI Agents | Bing-Quan Liu, Eric Robertson, Scott Grigsby, Sahisnu Mazumder | AI4CE | 72 8 0 | 21 Oct 2021
A Survey on Interactive Reinforcement Learning: Design Principles and Open Challenges | Christian Arzate Cruz, Takeo Igarashi | OffRL | 46 96 0 | 27 May 2021
On Reward-Free Reinforcement Learning with Linear Function Approximation | Ruosong Wang, S. Du, Lin F. Yang, Ruslan Salakhutdinov | OffRL | 73 107 0 | 19 Jun 2020
Scalable agent alignment via reward modeling: a research direction | Jan Leike, David M. Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg | - | 116 420 0 | 19 Nov 2018
Reward learning from human preferences and demonstrations in Atari | Borja Ibarz, Jan Leike, Tobias Pohlen, G. Irving, Shane Legg, Dario Amodei | - | 101 397 0 | 15 Nov 2018
DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback | Riku Arakawa, Sosuke Kobayashi, Y. Unno, Yuta Tsuboi, S. Maeda | - | 51 75 0 | 28 Oct 2018
Deep reinforcement learning from human preferences | Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei | - | 218 3,377 0 | 12 Jun 2017
Curiosity-driven Exploration by Self-supervised Prediction | Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell | LRM, SSL | 125 2,451 0 | 15 May 2017
Representation Learning and Pairwise Ranking for Implicit Feedback in Recommendation Systems | Sumit Sidana, Mikhail Trofimov, Oleh Horodnytskyi, Charlotte Laclau, Yury Maximov, Massih-Reza Amini | FedML | 92 25 0 | 29 Apr 2017
Interactive Learning from Policy-Dependent Human Feedback | J. MacGlashan, Mark K. Ho, R. Loftin, Bei Peng, Guan Wang, David L. Roberts, Matthew E. Taylor, Michael L. Littman | - | 87 306 0 | 21 Jan 2017
Variational Intrinsic Control | Karol Gregor, Danilo Jimenez Rezende, Daan Wierstra | DRL, OffRL | 88 430 0 | 22 Nov 2016
Generative Adversarial Imitation Learning | Jonathan Ho, Stefano Ermon | GAN | 159 3,125 0 | 10 Jun 2016
Unifying Count-Based Exploration and Intrinsic Motivation | Marc G. Bellemare, S. Srinivasan, Georg Ostrovski, Tom Schaul, D. Saxton, Rémi Munos | - | 179 1,484 0 | 06 Jun 2016
On Wasserstein Two Sample Testing and Related Families of Nonparametric Tests | Aaditya Ramdas, Nicolas García Trillos, Marco Cuturi | - | 71 487 0 | 08 Sep 2015