Learning to summarize from human feedback
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
2 September 2020 · ALM
arXiv: 2009.01325

Papers citing "Learning to summarize from human feedback"

Showing 50 of 1,548 citing papers.

Axiomatic Preference Modeling for Longform Question Answering
Corby Rosset, Guoqing Zheng, Victor C. Dibia, Ahmed Hassan Awadallah, Paul Bennett
02 Dec 2023 · SyDa

RLHF and IIA: Perverse Incentives
Wanqiao Xu, Shi Dong, Xiuyuan Lu, Grace Lam, Zheng Wen, Benjamin Van Roy
02 Dec 2023

Nash Learning from Human Feedback
Rémi Munos, Michal Valko, Daniele Calandriello, M. G. Azar, Mark Rowland, ..., Nikola Momchev, Olivier Bachem, D. Mankowitz, Doina Precup, Bilal Piot
01 Dec 2023

SeaLLMs -- Large Language Models for Southeast Asia
Xuan-Phi Nguyen, Wenxuan Zhang, Xin Li, Mahani Aljunied, Zhiqiang Hu, ..., Yue Deng, Sen Yang, Chaoqun Liu, Hang Zhang, Li Bing
01 Dec 2023 · LRM

Sample Efficient Preference Alignment in LLMs via Active Exploration
Viraj Mehta, Vikramjeet Das, Ojash Neopane, Yijia Dai, Ilija Bogunovic, Willie Neiswanger, Stefano Ermon, Jeff Schneider
01 Dec 2023 · OffRL

AlignBench: Benchmarking Chinese Alignment of Large Language Models
Xiao Liu, Xuanyu Lei, Sheng-Ping Wang, Yue Huang, Zhuoer Feng, ..., Hongning Wang, Jing Zhang, Minlie Huang, Yuxiao Dong, Jie Tang
30 Nov 2023 · ELM, LM&MA, ALM

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai, Isadora White, Charles Burton Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine
30 Nov 2023 · LLMAG, OffRL, LRM

Unveiling the Implicit Toxicity in Large Language Models
Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Chengfei Li, Jinfeng Bai, Minlie Huang
29 Nov 2023

MoDS: Model-oriented Data Selection for Instruction Tuning
Qianlong Du, Chengqing Zong, Jiajun Zhang
27 Nov 2023 · ALM

Universal Jailbreak Backdoors from Poisoned Human Feedback
Javier Rando, Florian Tramèr
24 Nov 2023

Reinforcement Learning from Statistical Feedback: the Journey from AB Testing to ANT Testing
Feiyang Han, Yimin Wei, Zhaofeng Liu, Yanxing Qi
24 Nov 2023

A density estimation perspective on learning from pairwise human preferences
Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin
23 Nov 2023

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li
22 Nov 2023 · EGVM

A Baseline Analysis of Reward Models' Ability To Accurately Analyze Foundation Models Under Distribution Shift
Will LeVine, Benjamin Pikus, Tony Chen, Sean Hendryx
21 Nov 2023

Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik
21 Nov 2023 · EGVM

NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
Shachar Rosenman, Vasudev Lal, Phillip Howard
20 Nov 2023 · DiffM

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models
Zhaowei Zhu, Jialu Wang, Hao Cheng, Yang Liu
19 Nov 2023

Case Repositories: Towards Case-Based Reasoning for AI Alignment
K. J. Kevin Feng, Quan Ze Chen, Inyoung Cheong, King Xia, Amy X. Zhang
18 Nov 2023

FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models
Yimin Jing, Renren Jin, Jiahao Hu, Huishi Qiu, Xiaohua Wang, Peng Wang, Deyi Xiong
16 Nov 2023 · LRM, ELM

RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
Jiong Wang, Junlin Wu, Muhao Chen, Yevgeniy Vorobeychik, Chaowei Xiao
16 Nov 2023 · AAML

What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception
Chaitanya Malaviya, Subin Lee, Dan Roth, Mark Yatskar
16 Nov 2023

HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM
Zhilin Wang, Yi Dong, Jiaqi Zeng, Virginia Adams, Makesh Narsimhan Sreedhar, ..., Olivier Delalleau, Jane Polak Scowcroft, Neel Kant, Aidan Swope, Oleksii Kuchaiev
16 Nov 2023 · 3DV

Fusion-Eval: Integrating Assistant Evaluators with LLMs
Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, Jindong Chen, Lei Meng
15 Nov 2023 · ELM

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
Yixin Liu, Alexander R. Fabbri, Jiawen Chen, Yilun Zhao, Simeng Han, Shafiq Joty, Pengfei Liu, Dragomir R. Radev, Chien-Sheng Wu, Arman Cohan
15 Nov 2023 · ELM

Aligning Neural Machine Translation Models: Human Feedback in Training and Inference
Miguel Moura Ramos, Patrick Fernandes, António Farinhas, André F. T. Martins
15 Nov 2023 · ALM

Safer-Instruct: Aligning Language Models with Automated Preference Data
Taiwei Shi, Kai Chen, Jieyu Zhao
15 Nov 2023 · ALM, SyDa

Towards Evaluating AI Systems for Moral Status Using Self-Reports
Ethan Perez, Robert Long
14 Nov 2023 · ELM

Fine-tuning Language Models for Factuality
Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher D. Manning, Chelsea Finn
14 Nov 2023 · KELM, HILM, SyDa

Predicting Text Preference Via Structured Comparative Reasoning
Jing Nathan Yan, Tianqi Liu, Justin T Chiu, Jiaming Shen, Zhen Qin, ..., Charumathi Lakshmanan, Y. Kurzion, Alexander M. Rush, Jialu Liu, Michael Bendersky
14 Nov 2023 · LRM

A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily
Peng Ding, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, Shujian Huang
14 Nov 2023 · AAML

Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Tianhao Hu, Peixin Cao, Nan Du, Xiaolong Li
14 Nov 2023

AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Ke Li, Junteng Jia, Shangguan Yuan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Michael Seltzer
12 Nov 2023 · LM&MA, MLLM, AuLLM

Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
Bowen Tan, Yun Zhu, Lijuan Liu, Eric P. Xing, Zhiting Hu, Jindong Chen
12 Nov 2023 · ALM, LRM

Translating Legalese: Enhancing Public Understanding of Court Opinions with Legal Summarizers
Elliott Ash, Aniket Kesari, Suresh Naidu, Lena Song, Dominik Stammbach
11 Nov 2023 · ELM

Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
Joey Hong, Sergey Levine, Anca Dragan
09 Nov 2023 · OffRL, LLMAG

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, ..., Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu
09 Nov 2023 · LRM, HILM

First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
Naomi Saphra, Eve Fleisig, Kyunghyun Cho, Adam Lopez
08 Nov 2023 · LRM

Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Wei Ping, Jinyuan Jia, Bo Li, Radha Poovendran
07 Nov 2023 · AAML

Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
Jiale Cheng, Xiao Liu, Kehan Zheng, Pei Ke, Hongning Wang, Yuxiao Dong, Jie Tang, Minlie Huang
07 Nov 2023

Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features
Diogo Cruz, Edoardo Pona, Alex Holness-Tofts, Elias Schmied, Víctor Abia Alonso, Charlie Griffin, B. Cirstea
07 Nov 2023

Can LLMs Follow Simple Rules?
Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Basel Alomair, Dan Hendrycks, David Wagner
06 Nov 2023 · ALM

AI-TA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs
Yann Hicke, Anmol Agarwal, Qianou Ma, Paul Denny
05 Nov 2023 · AI4Ed

Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
Jonathan Colaco Carr, Prakash Panangaden, Doina Precup
03 Nov 2023

Implicit Chain of Thought Reasoning via Knowledge Distillation
Yuntian Deng, Kiran Prasad, Roland Fernandez, P. Smolensky, Vishrav Chaudhary, Stuart M. Shieber
02 Nov 2023 · ReLM, LRM

Learning Realistic Traffic Agents in Closed-loop
Chris Zhang, James Tu, Lunjun Zhang, Kelvin Wong, Simon Suo, R. Urtasun
02 Nov 2023

The Impact of Preference Agreement in Reinforcement Learning from Human Feedback: A Case Study in Summarization
Sian Gooding, Hassan Mansoor
02 Nov 2023

Blending Reward Functions via Few Expert Demonstrations for Faithful and Accurate Knowledge-Grounded Dialogue Generation
Wanyu Du, Yangfeng Ji
02 Nov 2023

The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
Nathan Lambert, Roberto Calandra
31 Oct 2023 · ALM

Vanishing Gradients in Reinforcement Finetuning of Language Models
Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Josh Susskind, Etai Littwin
31 Oct 2023

Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization
Prakamya Mishra, Zonghai Yao, Shuwei Chen, Beining Wang, Rohan Mittal, Hong-ye Yu
30 Oct 2023 · KELM, ALM, HILM