Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction

7 July 2022

Papers citing "Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction"

25 / 25 papers shown

Title | Authors | Tag | Counts | Date
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning | Calarina Muslimani, Matthew E. Taylor | OffRL | 103 2 0 | 30 Apr 2024
VIEW: Visual Imitation Learning with Waypoints | Ananth Jonnavittula, Sagar Parekh, Dylan P. Losey | SSL | 143 11 0 | 27 Apr 2024
Inducing Structure in Reward Learning by Learning Features | Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca Dragan | | 68 31 0 | 18 Jan 2022
Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality | Songyuan Zhang, Zhangjie Cao, Dorsa Sadigh, Yanan Sui | | 44 54 0 | 27 Oct 2021
ThriftyDAgger: Budget-Aware Novelty and Risk Gating for Interactive Imitation Learning | Ryan Hoque, Ashwin Balakrishna, Ellen R. Novoseller, Albert Wilcox, Daniel S. Brown, Ken Goldberg | | 170 87 0 | 17 Sep 2021
Physical Interaction as Communication: Learning Robot Objectives Online from Human Corrections | Dylan P. Losey, Andrea V. Bajcsy, M. O'Malley, Anca Dragan | | 37 36 0 | 06 Jul 2021
PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training | Kimin Lee, Laura M. Smith, Pieter Abbeel | OffRL | 63 284 0 | 09 Jun 2021
Planning for Safe Abortable Overtaking Maneuvers in Autonomous Driving | Jiyo Palatti, Andrei Aksjonov, G. Alcan, Ville Kyrki | | 56 22 0 | 31 Mar 2021
Corrective Shared Autonomy for Addressing Task Variability | Michael Hagenow, Emmanuel Senft, R. Radwin, Michael Gleicher, Bilge Mutlu, Michael Zinn | | 45 38 0 | 14 Feb 2021
Learning from Suboptimal Demonstration via Self-Supervised Reward Regression | Letian Chen, Rohan R. Paleja, Matthew C. Gombolay | SSL | 68 111 0 | 17 Oct 2020
Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences | Erdem Biyik, Dylan P. Losey, Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh | | 57 118 0 | 24 Jun 2020
Reward-rational (implicit) choice: A unifying formalism for reward learning | Hong Jun Jeon, S. Milli, Anca Dragan | | 74 177 0 | 12 Feb 2020
Quantifying Hypothesis Space Misspecification in Learning from Human-Robot Demonstrations and Physical Corrections | Andreea Bobu, Andrea V. Bajcsy, J. F. Fisac, Sampada Deglurkar, Anca Dragan | | 75 41 0 | 03 Feb 2020
Four Years in Review: Statistical Practices of Likert Scales in Human-Robot Interaction Studies | Mariah L. Schrum, Michael Johnson, Muyleng Ghuy, Matthew C. Gombolay | | 28 68 0 | 09 Jan 2020
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations | Daniel S. Brown, Wonjoon Goo, P. Nagarajan, S. Niekum | | 73 357 0 | 12 Apr 2019
An Algorithmic Perspective on Imitation Learning | Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel, Jan Peters | | 88 845 0 | 16 Nov 2018
Reward learning from human preferences and demonstrations in Atari | Borja Ibarz, Jan Leike, Tobias Pohlen, G. Irving, Shane Legg, Dario Amodei | | 90 394 0 | 15 Nov 2018
HG-DAgger: Interactive Imitation Learning with Human Experts | Michael Kelly, Chelsea Sidrane, Katherine Driggs-Campbell, Mykel J. Kochenderfer | OffRL | 223 229 0 | 05 Oct 2018
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor | Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine | | 311 8,352 0 | 04 Jan 2018
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning | Justin Fu, Katie Z Luo, Sergey Levine | | 127 756 0 | 30 Oct 2017
Trajectory Deformations from Physical Human-Robot Interaction | Dylan P. Losey, M. O'Malley | AI4CE | 36 77 0 | 26 Oct 2017
Deep reinforcement learning from human preferences | Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei | | 190 3,318 0 | 12 Jun 2017
Learning Preferences for Manipulation Tasks from Online Coactive Feedback | Ashesh Jain, Shikhar Sharma, Thorsten Joachims, Ashutosh Saxena | | 79 117 0 | 05 Jan 2016
Adam: A Method for Stochastic Optimization | Diederik P. Kingma, Jimmy Ba | ODL | 1.8K 150,115 0 | 22 Dec 2014
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning | Stéphane Ross, Geoffrey J. Gordon, J. Andrew Bagnell | OffRL | 222 3,221 0 | 02 Nov 2010