Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models

19 May 2023

Benjamin Van Roy

Papers citing "Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models"

Title
RLHF and IIA: Perverse Incentives | Wanqiao Xu, Shi Dong, Xiuyuan Lu, Grace Lam, Zheng Wen, Benjamin Van Roy | - | 29 2 0 | 02 Dec 2023
A density estimation perspective on learning from pairwise human preferences | Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin | - | 34 12 0 | 23 Nov 2023
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | Nathan Lambert, Roberto Calandra | ALM | 29 32 0 | 31 Oct 2023
Contrastive Preference Learning: Learning from Human Feedback without RL | Joey Hejna, Rafael Rafailov, Harshit S. Sikchi, Chelsea Finn, S. Niekum, W. B. Knox, Dorsa Sadigh | OffRL | 27 50 0 | 20 Oct 2023
Reward Model Ensembles Help Mitigate Overoptimization | Thomas Coste, Usman Anwar, Robert Kirk, David M. Krueger | NoLa, ALM | 28 119 0 | 04 Oct 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback | Stephen Casper, Xander Davies, Claudia Shi, T. Gilbert, Jérémy Scheurer, ..., Erdem Biyik, Anca Dragan, David M. Krueger, Dorsa Sadigh, Dylan Hadfield-Menell | ALM, OffRL | 52 473 0 | 27 Jul 2023
Improving alignment of dialogue agents via targeted human judgements | Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving | ALM, AAML | 230 506 0 | 28 Sep 2022
Training language models to follow instructions with human feedback | Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe | OSLM, ALM | 354 12,003 0 | 04 Mar 2022
Fine-Tuning Language Models from Human Preferences | Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving | ALM | 301 1,610 0 | 18 Sep 2019
Efficient Estimation of Word Representations in Vector Space | Tomáš Mikolov, Kai Chen, G. Corrado, J. Dean | 3DV | 284 31,267 0 | 16 Jan 2013