Learning to summarize from human feedback (arXiv:2009.01325)

Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
2 September 2020 · ALM

Papers citing "Learning to summarize from human feedback"

Showing 50 of 1,443 citing papers.

Universal Jailbreak Backdoors from Poisoned Human Feedback
Javier Rando, Florian Tramèr
24 Nov 2023

Reinforcement Learning from Statistical Feedback: the Journey from AB Testing to ANT Testing
Feiyang Han, Yimin Wei, Zhaofeng Liu, Yanxing Qi
24 Nov 2023

A density estimation perspective on learning from pairwise human preferences
Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin
23 Nov 2023

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li
22 Nov 2023 · EGVM

A Baseline Analysis of Reward Models' Ability To Accurately Analyze Foundation Models Under Distribution Shift
Will LeVine, Benjamin Pikus, Tony Chen, Sean Hendryx
21 Nov 2023

Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Chenyu You, Nikhil Naik
21 Nov 2023 · EGVM

NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
Shachar Rosenman, Vasudev Lal, Phillip Howard
20 Nov 2023 · DiffM

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models
Zhaowei Zhu, Jialu Wang, Hao Cheng, Yang Liu
19 Nov 2023

Case Repositories: Towards Case-Based Reasoning for AI Alignment
K. J. Kevin Feng, Quan Ze Chen, Inyoung Cheong, King Xia, Amy X. Zhang
18 Nov 2023

FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models
Yimin Jing, Renren Jin, Jiahao Hu, Huishi Qiu, Xiaohua Wang, Peng Wang, Deyi Xiong
16 Nov 2023 · LRM, ELM

RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
Jiong Wang, Junlin Wu, Muhao Chen, Yevgeniy Vorobeychik, Chaowei Xiao
16 Nov 2023 · AAML

What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception
Chaitanya Malaviya, Subin Lee, Dan Roth, Mark Yatskar
16 Nov 2023

HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM
Zhilin Wang, Yi Dong, Jiaqi Zeng, Virginia Adams, Makesh Narsimhan Sreedhar, ..., Olivier Delalleau, Jane Polak Scowcroft, Neel Kant, Aidan Swope, Oleksii Kuchaiev
16 Nov 2023 · 3DV

Fusion-Eval: Integrating Assistant Evaluators with LLMs
Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, Jindong Chen, Lei Meng
15 Nov 2023 · ELM

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
Yixin Liu, Alexander R. Fabbri, Jiawen Chen, Yilun Zhao, Simeng Han, Chenyu You, Pengfei Liu, Dragomir R. Radev, Chien-Sheng Wu, Arman Cohan
15 Nov 2023 · ELM

Aligning Neural Machine Translation Models: Human Feedback in Training and Inference
Miguel Moura Ramos, Patrick Fernandes, António Farinhas, André F. T. Martins
15 Nov 2023 · ALM

Safer-Instruct: Aligning Language Models with Automated Preference Data
Taiwei Shi, Kai Chen, Jieyu Zhao
15 Nov 2023 · ALM, SyDa

Towards Evaluating AI Systems for Moral Status Using Self-Reports
Ethan Perez, Robert Long
14 Nov 2023 · ELM

Fine-tuning Language Models for Factuality
Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher D. Manning, Chelsea Finn
14 Nov 2023 · KELM, HILM, SyDa

Predicting Text Preference Via Structured Comparative Reasoning
Jing Nathan Yan, Tianqi Liu, Justin T Chiu, Jiaming Shen, Zhen Qin, ..., Charumathi Lakshmanan, Y. Kurzion, Alexander M. Rush, Jialu Liu, Michael Bendersky
14 Nov 2023 · LRM

A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily
Peng Ding, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, Shujian Huang
14 Nov 2023 · AAML

Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game
Pengyu Cheng, Yifan Yang, Jian Li, Yong Dai, Tianhao Hu, Peixin Cao, Nan Du, Xiaolong Li
14 Nov 2023

AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Ke Li, Junteng Jia, Shangguan Yuan, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Michael Seltzer
12 Nov 2023 · LM&MA, MLLM, AuLLM

Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer
Bowen Tan, Yun Zhu, Lijuan Liu, Eric P. Xing, Zhiting Hu, Jindong Chen
12 Nov 2023 · ALM, LRM

Translating Legalese: Enhancing Public Understanding of Court Opinions with Legal Summarizers
Elliott Ash, Aniket Kesari, Suresh Naidu, Lena Song, Dominik Stammbach
11 Nov 2023 · ELM

Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
Joey Hong, Sergey Levine, Anca Dragan
09 Nov 2023 · OffRL, LLMAG

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, ..., Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu
09 Nov 2023 · LRM, HILM

First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
Naomi Saphra, Eve Fleisig, Kyunghyun Cho, Adam Lopez
08 Nov 2023 · LRM

Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Wei Ping, Jinyuan Jia, Bo Li, Radha Poovendran
07 Nov 2023 · AAML

Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
Jiale Cheng, Xiao Liu, Kehan Zheng, Pei Ke, Hongning Wang, Yuxiao Dong, Jie Tang, Minlie Huang
07 Nov 2023

Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features
Diogo Cruz, Edoardo Pona, Alex Holness-Tofts, Elias Schmied, Víctor Abia Alonso, Charlie Griffin, B. Cirstea
07 Nov 2023

Can LLMs Follow Simple Rules?
Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Basel Alomair, Dan Hendrycks, David Wagner
06 Nov 2023 · ALM

AI-TA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs
Yann Hicke, Anmol Agarwal, Qianou Ma, Paul Denny
05 Nov 2023 · AI4Ed

Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
Jonathan Colaco Carr, Prakash Panangaden, Doina Precup
03 Nov 2023

Implicit Chain of Thought Reasoning via Knowledge Distillation
Yuntian Deng, Kiran Prasad, Roland Fernandez, P. Smolensky, Vishrav Chaudhary, Stuart M. Shieber
02 Nov 2023 · ReLM, LRM

Learning Realistic Traffic Agents in Closed-loop
Chris Zhang, James Tu, Lunjun Zhang, Kelvin Wong, Simon Suo, R. Urtasun
02 Nov 2023

The Impact of Preference Agreement in Reinforcement Learning from Human Feedback: A Case Study in Summarization
Sian Gooding, Hassan Mansoor
02 Nov 2023

Blending Reward Functions via Few Expert Demonstrations for Faithful and Accurate Knowledge-Grounded Dialogue Generation
Wanyu Du, Yangfeng Ji
02 Nov 2023

The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
Nathan Lambert, Roberto Calandra
31 Oct 2023 · ALM

Vanishing Gradients in Reinforcement Finetuning of Language Models
Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Josh Susskind, Etai Littwin
31 Oct 2023

Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization
Prakamya Mishra, Zonghai Yao, Shuwei Chen, Beining Wang, Rohan Mittal, Hong-ye Yu
30 Oct 2023 · KELM, ALM, HILM

MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
Allen Nie, Yuhui Zhang, Atharva Amdekar, Chris Piech, Tatsunori Hashimoto, Tobias Gerstenberg
30 Oct 2023

Leveraging generative artificial intelligence to simulate student learning behavior
Songlin Xu, Xinyu Zhang
30 Oct 2023 · SyDa, AI4Ed

Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery
Katie Z Luo, Zhenzhen Liu, Xiangyu Chen, Yurong You, Sagie Benaim, Cheng Perng Phoo, Mark E. Campbell, Wen Sun, B. Hariharan, Kilian Q. Weinberger
29 Oct 2023 · OffRL

Ever Evolving Evaluator (EV3): Towards Flexible and Reliable Meta-Optimization for Knowledge Distillation
Li Ding, M. Zoghi, Guy Tennenholtz, Maryam Karimzadehgan
29 Oct 2023

Fine-Tuning Language Models Using Formal Methods Feedback
Yunhao Yang, N. Bhatt, Tyler Ingebrand, William Ward, Steven Carr, Zhangyang Wang, Ufuk Topcu
27 Oct 2023

Using State-of-the-Art Speech Models to Evaluate Oral Reading Fluency in Ghana
Owen Henkel, Hannah Horne-Robinson, Libby Hills, Bill Roberts, Joshua A. McGrane
26 Oct 2023

Can LLMs Grade Short-Answer Reading Comprehension Questions: An Empirical Study with a Novel Dataset
Owen Henkel, Libby Hills, Bill Roberts, Joshua A. McGrane
26 Oct 2023 · AI4Ed

Beyond MLE: Convex Learning for Text Generation
Chenze Shao, Zhengrui Ma, Min Zhang, Yang Feng
26 Oct 2023

Counterfactual-Augmented Importance Sampling for Semi-Offline Policy Evaluation
Shengpu Tang, Jenna Wiens
26 Oct 2023 · OffRL, CML