
Learning to summarize from human feedback
arXiv 2009.01325 · 2 September 2020
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM

Papers citing "Learning to summarize from human feedback"

50 / 1,442 papers shown
SummScore: A Comprehensive Evaluation Metric for Summary Quality Based on Cross-Encoder
Wuhang Lin
Shasha Li
Chen Zhang
Bing Ji
Jie Yu
Jun Ma
Zibo Yi
11 Jul 2022
Conditional Generation with a Question-Answering Blueprint
Shashi Narayan
Joshua Maynez
Reinald Kim Amplayo
Kuzman Ganchev
Annie Louis
Fantine Huot
Anders Sandholm
Dipanjan Das
Mirella Lapata
01 Jul 2022
Mapping the Design Space of Human-AI Interaction in Text Summarization
Ruijia Cheng
Alison Smith-Renner
Kecheng Zhang
Joel R. Tetreault
A. Jaimes
29 Jun 2022
Know your audience: specializing grounded language models with listener subtraction
Aaditya K. Singh
David Ding
Andrew M. Saxe
Felix Hill
Andrew Kyle Lampinen
16 Jun 2022
'John ate 5 apples' != 'John ate some apples': Self-Supervised Paraphrase Quality Detection for Algebraic Word Problems
Rishabh Gupta
Venktesh V
Mukesh Mohania
Vikram Goyal
AIMat
16 Jun 2022
An Exploration of Post-Editing Effectiveness in Text Summarization
Vivian Lai
Alison Smith-Renner
Ke Zhang
Ruijia Cheng
Wenjuan Zhang
Joel R. Tetreault
Alejandro Jaimes
13 Jun 2022
Human-AI Interaction Design in Machine Teaching
Karan Taneja
Harsh Sikka
Ashok K. Goel
10 Jun 2022
Offline RL for Natural Language Generation with Implicit Language Q Learning
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
05 Jun 2022
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
Tomasz Korbak
Hady ElSahar
Germán Kruszewski
Marc Dymetman
CLL
01 Jun 2022
Teaching Models to Express Their Uncertainty in Words
Stephanie C. Lin
Jacob Hilton
Owain Evans
OOD
28 May 2022
Quark: Controllable Text Generation with Reinforced Unlearning
Ximing Lu
Sean Welleck
Jack Hessel
Liwei Jiang
Lianhui Qin
Peter West
Prithviraj Ammanabrolu
Yejin Choi
MU
26 May 2022
Multimodal Knowledge Alignment with Reinforcement Learning
Youngjae Yu
Jiwan Chung
Heeseung Yun
Jack Hessel
Jinho Park
...
Prithviraj Ammanabrolu
Rowan Zellers
Ronan Le Bras
Gunhee Kim
Yejin Choi
VLM
25 May 2022
RL with KL penalties is better viewed as Bayesian inference
Tomasz Korbak
Ethan Perez
Christopher L. Buckley
OffRL
23 May 2022
Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning
D. Ghalandari
Chris Hokamp
Georgiana Ifrim
17 May 2022
Training Language Models with Language Feedback
Jérémy Scheurer
Jon Ander Campos
Jun Shern Chan
Angelica Chen
Kyunghyun Cho
Ethan Perez
ALM
29 Apr 2022
A Framework for Interactive Knowledge-Aided Machine Teaching
Karan Taneja
Harsh Sikka
Ashok K. Goel
HAI
21 Apr 2022
A Survey on Neural Abstractive Summarization Methods and Factual Consistency of Summarization
Meng Cao
20 Apr 2022
Text Revision by On-the-Fly Representation Optimization
Jingjing Li
Zichao Li
Tao Ge
Irwin King
M. Lyu
BDL
15 Apr 2022
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black
Stella Biderman
Eric Hallahan
Quentin G. Anthony
Leo Gao
...
Shivanshu Purohit
Laria Reynolds
J. Tow
Benqi Wang
Samuel Weinbach
14 Apr 2022
Causal Confusion and Reward Misidentification in Preference-Based Reward Learning
J. Tien
Jerry Zhi-Yang He
Zackory M. Erickson
Anca Dragan
Daniel S. Brown
CML
13 Apr 2022
ASQA: Factoid Questions Meet Long-Form Answers
Ivan Stelmakh
Yi Luan
Bhuwan Dhingra
Ming-Wei Chang
12 Apr 2022
Make The Most of Prior Data: A Solution for Interactive Text Summarization with Preference Feedback
Duy-Hung Nguyen
Nguyen-Viet-Dung Nghiem
Bao-Sinh Nguyen
Dung Tien Le
Shahab Sabahi
Minh Le Nguyen
Hung Le
12 Apr 2022
Active Learning with Label Comparisons
G. Yona
Shay Moran
G. Elidan
Amir Globerson
10 Apr 2022
Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment
Zichao Li
Prakhar Sharma
Xing Han Lù
Jackie C.K. Cheung
Siva Reddy
HAI
06 Apr 2022
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
21 Mar 2022
Simulating Bandit Learning from User Feedback for Extractive Question Answering
Ge Gao
Eunsol Choi
Yoav Artzi
18 Mar 2022
SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning
Jongjin Park
Younggyo Seo
Jinwoo Shin
Honglak Lee
Pieter Abbeel
Kimin Lee
18 Mar 2022
Invariance in Policy Optimisation and Partial Identifiability in Reward Learning
Joar Skalse
Matthew Farrugia-Roberts
Stuart J. Russell
Alessandro Abate
Adam Gleave
14 Mar 2022
Uncertainty Estimation for Language Reward Models
Adam Gleave
G. Irving
UQLM
14 Mar 2022
Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons
Akash Kumar Mohankumar
Mitesh M. Khapra
ELM
AAML
11 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
04 Mar 2022
Capturing Failures of Large Language Models via Human Cognitive Biases
Erik Jones
Jacob Steinhardt
24 Feb 2022
CAISE: Conversational Agent for Image Search and Editing
Hyounghun Kim
Doo Soon Kim
Seunghyun Yoon
Franck Dernoncourt
Trung Bui
Joey Tianyi Zhou
24 Feb 2022
Reward Modeling for Mitigating Toxicity in Transformer-based Language Models
Farshid Faal
K. Schmitt
Jia Yuan Yu
19 Feb 2022
A data-driven approach for learning to control computers
Peter C. Humphreys
David Raposo
Tobias Pohlen
Gregory Thornton
Rachita Chhaparia
...
Josh Abramson
Petko Georgiev
Alex Goldin
Adam Santoro
Timothy Lillicrap
16 Feb 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Sebastian Gehrmann
Elizabeth Clark
Thibault Sellam
ELM
AI4CE
14 Feb 2022
Red Teaming Language Models with Language Models
Ethan Perez
Saffron Huang
Francis Song
Trevor Cai
Roman Ring
John Aslanides
Amelia Glaese
Nat McAleese
G. Irving
AAML
07 Feb 2022
Safe Deep RL in 3D Environments using Human Feedback
Matthew Rahtz
Vikrant Varma
Ramana Kumar
Zachary Kenton
Shane Legg
Jan Leike
20 Jan 2022
A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models
Hanqing Zhang
Haolin Song
Shaoyu Li
Ming Zhou
Dawei Song
14 Jan 2022
The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
Alexander Pan
Kush S. Bhatia
Jacob Steinhardt
10 Jan 2022
Beyond modeling: NLP Pipeline for efficient environmental policy analysis
J. Planas
Daniel Firebanks-Quevedo
G. Naydenova
Ramansh Sharma
Cristina Taylor
Kathleen Buckingham
Rong Fang
08 Jan 2022
Integrating Human-in-the-loop into Swarm Learning for Decentralized Fake News Detection
Xishuang Dong
Lijun Qian
04 Jan 2022
WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano
Jacob Hilton
S. Balaji
Jeff Wu
Ouyang Long
...
Gretchen Krueger
Kevin Button
Matthew Knight
B. Chess
John Schulman
ALM
RALM
17 Dec 2021
Reframing Human-AI Collaboration for Generating Free-Text Explanations
Sarah Wiegreffe
Jack Hessel
Swabha Swayamdipta
Mark O. Riedl
Yejin Choi
16 Dec 2021
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
01 Dec 2021
Expressive Communication: A Common Framework for Evaluating Developments in Generative Models and Steering Interfaces
Ryan Louie
Jesse Engel
Cheng-Zhi Anna Huang
29 Nov 2021
Robust Deep Reinforcement Learning for Extractive Legal Summarization
Duy-Hung Nguyen
Bao-Sinh Nguyen
Nguyen-Viet-Dung Nghiem
Dung Tien Le
Mim Amina Khatun
Minh Le Nguyen
Hung Le
ELM
AILaw
AI4TS
13 Nov 2021
B-Pref: Benchmarking Preference-Based Reinforcement Learning
Kimin Lee
Laura M. Smith
Anca Dragan
Pieter Abbeel
OffRL
04 Nov 2021
Deep Transfer Learning & Beyond: Transformer Language Models in Information Systems Research
Ross Gruetzemacher
D. Paradice
18 Oct 2021
The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
Sam Bowman
OffRL
15 Oct 2021