ResearchTrend.AI
The Impact of Preference Agreement in Reinforcement Learning from Human Feedback: A Case Study in Summarization

2 November 2023
Sian Gooding
Hassan Mansoor

Papers citing "The Impact of Preference Agreement in Reinforcement Learning from Human Feedback: A Case Study in Summarization"

7 / 7 papers shown
Title
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
239
2,535
0
12 Apr 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
754
12,835
0
04 Mar 2022
Dealing with Disagreements: Looking Beyond the Majority Vote in Subjective Annotations
Aida Mostafazadeh Davani
Mark Díaz
Vinodkumar Prabhakaran
50
315
0
12 Oct 2021
Learning to summarize from human feedback
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
209
2,124
0
02 Sep 2020
SummEval: Re-evaluating Summarization Evaluation
Alexander R. Fabbri
Wojciech Kryściński
Bryan McCann
Caiming Xiong
R. Socher
Dragomir R. Radev
HILM
90
710
0
24 Jul 2020
Deep reinforcement learning from human preferences
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
126
3,282
0
12 Jun 2017
Statistical modality tagging from rule-based annotations and crowdsourcing
Vinodkumar Prabhakaran
Michael Bloodgood
Mona T. Diab
Bonnie J. Dorr
Lori S. Levin
C. Piatko
Owen Rambow
Benjamin Van Durme
32
28
0
04 Mar 2015