Learning to summarize from human feedback
2 September 2020 · arXiv:2009.01325 (abs / PDF / HTML)
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
ALM

Papers citing "Learning to summarize from human feedback"

Showing 50 of 1,548 citing papers.
• Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy
  Shuhai Zhang, Yiliao Song, Jiahao Yang, Yuanqing Li, Bo Han, Mingkui Tan
  DeLMO · 25 Feb 2024
• Don't Forget Your Reward Values: Language Model Alignment via Value-based Calibration
  Xin Mao, Fengming Li, Huimin Xu, Wei Zhang, Anh Tuan Luu
  ALM · 25 Feb 2024
• Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models
  Chaoya Jiang, Wei Ye, Mengfan Dong, Hongrui Jia, Haiyang Xu, Mingshi Yan, Ji Zhang, Shikun Zhang
  VLM, MLLM · 24 Feb 2024
• Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts
  Yuejiang Liu, Alexandre Alahi
  23 Feb 2024
• Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control
  Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, N. Diamant, Alex Tseng, Tommaso Biancalani, Sergey Levine
  23 Feb 2024
• Machine Unlearning of Pre-trained Large Language Models
  Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, Xiang Yue
  MU · 23 Feb 2024
• Bias and Volatility: A Statistical Framework for Evaluating Large Language Model's Stereotypes and the Associated Generation Inconsistency
  Yiran Liu, Ke Yang, Zehan Qi, Xiao-Yang Liu, Yang Yu, ChengXiang Zhai
  23 Feb 2024
• Optimizing Language Models for Human Preferences is a Causal Inference Problem
  Victoria Lin, Eli Ben-Michael, Louis-Philippe Morency
  22 Feb 2024
• CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
  Zicheng Lin, Zhibin Gou, Tian Liang, Ruilin Luo, Haowei Liu, Yujiu Yang
  LRM · 22 Feb 2024
• Generalizing Reward Modeling for Out-of-Distribution Preference Learning
  Chen Jia
  22 Feb 2024
• Chain-of-Thought Unfaithfulness as Disguised Accuracy
  Oliver Bentham, Nathan Stringham, Ana Marasović
  LRM, HILM · 22 Feb 2024
• COPR: Continual Human Preference Learning via Optimal Policy Regularization
  Han Zhang, Lin Gui, Yu Lei, Yuanzhao Zhai, Yehong Zhang, ..., Hui Wang, Yue Yu, Kam-Fai Wong, Bin Liang, Ruifeng Xu
  CLL · 22 Feb 2024
• SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization
  Prakamya Mishra, Zonghai Yao, Parth Vashisht, Feiyun Ouyang, Beining Wang, Vidhi Mody, Hong-ye Yu
  SyDa, MedIm · 21 Feb 2024
• CriticBench: Evaluating Large Language Models as Critic
  Tian Lan, Wenwei Zhang, Chen Xu, Heyan Huang, Dahua Lin, Kai-xiang Chen, Xian-Ling Mao
  ELM, AI4MH, LRM · 21 Feb 2024
• Privacy-Preserving Instructions for Aligning Large Language Models
  Da Yu, Peter Kairouz, Sewoong Oh, Zheng Xu
  21 Feb 2024
• The Lay Person's Guide to Biomedicine: Orchestrating Large Language Models
  Zheheng Luo, Qianqian Xie, Sophia Ananiadou
  21 Feb 2024
• How Important is Domain Specificity in Language Models and Instruction Finetuning for Biomedical Relation Extraction?
  Aviv Brokman, Ramakanth Kavuluru
  LM&MA, ALM · 21 Feb 2024
• Large Language Models for Data Annotation: A Survey
  Zhen Tan, Dawei Li, Song Wang, Alimohammad Beigi, Bohan Jiang, Amrita Bhattacharjee, Mansooreh Karami, Wenlin Yao, Lu Cheng, Huan Liu
  SyDa · 21 Feb 2024
• Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
  Arka Pal, Deep Karkhanis, Samuel Dooley, Manley Roberts, Siddartha Naidu, Colin White
  OSLM · 20 Feb 2024
• Bayesian Reward Models for LLM Alignment
  Adam X. Yang, Maxime Robeyns, Thomas Coste, Zhengyan Shi, Jun Wang, Haitham Bou-Ammar, Laurence Aitchison
  20 Feb 2024
• A Survey on Knowledge Distillation of Large Language Models
  Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Dinesh Manocha
  KELM, VLM · 20 Feb 2024
• Prompt Stealing Attacks Against Large Language Models
  Zeyang Sha, Yang Zhang
  SILM, AAML · 20 Feb 2024
• Me LLaMA: Foundation Large Language Models for Medical Applications
  Qianqian Xie, Qingyu Chen, Aokun Chen, C.A.I. Peng, Yan Hu, ..., Huan He, Lucila Ohno-Machado, Yonghui Wu, Hua Xu, Jiang Bian
  LM&MA, AI4MH · 20 Feb 2024
• Reflect-RL: Two-Player Online RL Fine-Tuning for LMs
  Runlong Zhou, Simon S. Du, Beibin Li
  OffRL · 20 Feb 2024
• Generative AI Security: Challenges and Countermeasures
  Banghua Zhu, Norman Mu, Jiantao Jiao, David Wagner
  AAML, SILM · 20 Feb 2024
• A Critical Evaluation of AI Feedback for Aligning Large Language Models
  Archit Sharma, Sedrick Scott Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar
  ALM, LLMAG · 19 Feb 2024
• Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
  Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao
  19 Feb 2024
• Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships
  Myung Gyo Oh, Hong Eun Ahn, L. Park, T.-H. Kwon
  MIALM, AAML · 19 Feb 2024
• BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence
  Jiajie Jin, Yutao Zhu, Yujia Zhou, Zhicheng Dou
  RALM · 19 Feb 2024
• Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution
  Nuo Xu, Jun Zhao, Can Zu, Sixian Li, Lu Chen, ..., Shihan Dou, Wenjuan Qin, Tao Gui, Qi Zhang, Xuanjing Huang
  18 Feb 2024
• EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models
  Jun Gao, Huan Zhao, Wei Wang, Changlong Yu, Ruifeng Xu
  OffRL · 18 Feb 2024
• Aligning Large Language Models by On-Policy Self-Judgment
  Sangkyu Lee, Sungdong Kim, Ashkan Yousefpour, Minjoon Seo, Kang Min Yoo, Youngjae Yu
  OSLM · 17 Feb 2024
• CoLLaVO: Crayon Large Language and Vision mOdel
  Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yonghyun Ro
  VLM, MLLM · 17 Feb 2024
• Humans or LLMs as the Judge? A Study on Judgement Biases
  Guiming Hardy Chen, Shunian Chen, Ziche Liu, Feng Jiang, Benyou Wang
  16 Feb 2024
• Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements
  Ming Li, Jiuhai Chen, Lichang Chen, Dinesh Manocha
  16 Feb 2024
• Direct Preference Optimization with an Offset
  Afra Amini, Tim Vieira, Ryan Cotterell
  16 Feb 2024
• DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
  Ajay Patel, Colin Raffel, Chris Callison-Burch
  SyDa, AI4CE · 16 Feb 2024
• Active Preference Optimization for Sample Efficient RLHF
  Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury
  16 Feb 2024
• Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
  Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen
  15 Feb 2024
• RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models
  Saeed Khaki, JinJin Li, Lan Ma, Liu Yang, Prathap Ramachandra
  15 Feb 2024
• InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling
  Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao
  14 Feb 2024
• ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
  Feifan Song, Yuxuan Fan, Xin Zhang, Peiyi Wang, Houfeng Wang
  14 Feb 2024
• Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
  Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao
  ELM · 14 Feb 2024
• A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
  Shentao Yang, Tianqi Chen, Mingyuan Zhou
  EGVM · 13 Feb 2024
• Active Preference Learning for Large Language Models
  William Muldrew, Peter Hayes, Mingtian Zhang, David Barber
  12 Feb 2024
• Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts
  Yueqin Yin, Zhendong Wang, Yi Gu, Hai Huang, Weizhu Chen, Mingyuan Zhou
  12 Feb 2024
• Suppressing Pink Elephants with Direct Principle Feedback
  Louis Castricato, Nathan Lile, Suraj Anand, Hailey Schoelkopf, Siddharth Verma, Stella Biderman
  12 Feb 2024
• Policy Improvement using Language Feedback Models
  Victor Zhong, Dipendra Kumar Misra, Xingdi Yuan, Marc-Alexandre Côté
  12 Feb 2024
• Mercury: A Code Efficiency Benchmark for Code Large Language Models
  Mingzhe Du, Anh Tuan Luu, Bin Ji, Qian Liu, See-Kiong Ng
  ALM, ELM, OffRL · 12 Feb 2024
• ODIN: Disentangled Reward Mitigates Hacking in RLHF
  Lichang Chen, Chen Zhu, Davit Soselia, Jiuhai Chen, Dinesh Manocha, Tom Goldstein, Heng-Chiao Huang, Mohammad Shoeybi, Bryan Catanzaro
  AAML · 11 Feb 2024