On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference

23 June 2019

Pieter Abbeel

Papers citing "On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference"

Title | Authors | Tags | Counts | Date
ARMCHAIR: integrated inverse reinforcement learning and model predictive control for human-robot collaboration | Angelo Caregnato-Neto Luciano Cavalcante Siebert Arkady Zgonnikov Marcos Ricardo Omena de Albuquerque Máximo R. J. Afonso | | 37 2 0 | 29 Feb 2024
Towards Understanding Sycophancy in Language Models | Mrinank Sharma Meg Tong Tomasz Korbak David Duvenaud Amanda Askell ... Oliver Rausch Nicholas Schiefer Da Yan Miranda Zhang Ethan Perez | | 216 198 0 | 20 Oct 2023
Discovering User Types: Mapping User Traits by Task-Specific Behaviors in Reinforcement Learning | L. L. Ankile B. S. Ham K. Mao E. Shin S. Swaroop F. Doshi-Velez W. Pan | OffRL | 16 1 0 | 16 Jul 2023
Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased | Chao Yu Jiaxuan Gao Weiling Liu Bo Xu Hao Tang Jiaqi Yang Yu Wang Yi Wu | | 31 39 0 | 03 Feb 2023
On the Sensitivity of Reward Inference to Misspecified Human Models | Joey Hong Kush S. Bhatia Anca Dragan | | 19 24 0 | 09 Dec 2022
Misspecification in Inverse Reinforcement Learning | Joar Skalse Alessandro Abate | | 33 22 0 | 06 Dec 2022
Modeling Mobile Health Users as Reinforcement Learning Agents | Eura Shin S. Swaroop Weiwei Pan S. Murphy Finale Doshi-Velez | OffRL | 15 3 0 | 01 Dec 2022
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans | John J. Nay | ELM AILaw | 88 27 0 | 14 Sep 2022
Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning | David Lindner Mennatallah El-Assady | OffRL | 30 16 0 | 27 Jun 2022
How to talk so AI will learn: Instructions, descriptions, and autonomy | T. Sumers Robert D. Hawkins Mark K. Ho Thomas L. Griffiths Dylan Hadfield-Menell | LM&Ro | 32 20 0 | 16 Jun 2022
Human irrationality: both bad and good for reward inference | Lawrence Chan Andrew Critch Anca Dragan | | 12 25 0 | 12 Nov 2021
Uncertain Decisions Facilitate Better Preference Learning | Cassidy Laidlaw Stuart J. Russell | | 30 11 0 | 19 Jun 2021
Deep Interpretable Models of Theory of Mind | Ini Oguntola Dana Hughes Katia P. Sycara | HAI | 33 23 0 | 07 Apr 2021
Conservative AI and social inequality: Conceptualizing alternatives to bias through social theory | Mike Zajko | | 16 37 0 | 16 Jul 2020