arXiv: 2009.01325 (v3)
Learning to summarize from human feedback
2 September 2020
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
Papers citing "Learning to summarize from human feedback" (50 of 1,548 papers shown)
Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy
Shuhai Zhang
Yiliao Song
Jiahao Yang
Yuanqing Li
Bo Han
Mingkui Tan
DeLMO
113
8
0
25 Feb 2024
Don't Forget Your Reward Values: Language Model Alignment via Value-based Calibration
Xin Mao
Fengming Li
Huimin Xu
Wei Zhang
Anh Tuan Luu
ALM
80
7
0
25 Feb 2024
Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models
Chaoya Jiang
Wei Ye
Mengfan Dong
Hongrui Jia
Haiyang Xu
Mingshi Yan
Ji Zhang
Shikun Zhang
VLM
MLLM
120
16
0
24 Feb 2024
Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts
Yuejiang Liu
Alexandre Alahi
96
25
0
23 Feb 2024
Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control
Masatoshi Uehara
Yulai Zhao
Kevin Black
Ehsan Hajiramezanali
Gabriele Scalia
N. Diamant
Alex Tseng
Tommaso Biancalani
Sergey Levine
94
52
0
23 Feb 2024
Machine Unlearning of Pre-trained Large Language Models
Jin Yao
Eli Chien
Minxin Du
Xinyao Niu
Tianhao Wang
Zezhou Cheng
Xiang Yue
MU
158
51
0
23 Feb 2024
Bias and Volatility: A Statistical Framework for Evaluating Large Language Model's Stereotypes and the Associated Generation Inconsistency
Yiran Liu
Ke Yang
Zehan Qi
Xiao-Yang Liu
Yang Yu
ChengXiang Zhai
120
4
0
23 Feb 2024
Optimizing Language Models for Human Preferences is a Causal Inference Problem
Victoria Lin
Eli Ben-Michael
Louis-Philippe Morency
108
5
0
22 Feb 2024
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Zicheng Lin
Zhibin Gou
Tian Liang
Ruilin Luo
Haowei Liu
Yujiu Yang
LRM
109
56
0
22 Feb 2024
Generalizing Reward Modeling for Out-of-Distribution Preference Learning
Chen Jia
83
2
0
22 Feb 2024
Chain-of-Thought Unfaithfulness as Disguised Accuracy
Oliver Bentham
Nathan Stringham
Ana Marasović
LRM
HILM
90
16
0
22 Feb 2024
COPR: Continual Human Preference Learning via Optimal Policy Regularization
Han Zhang
Lin Gui
Yu Lei
Yuanzhao Zhai
Yehong Zhang
...
Hui Wang
Yue Yu
Kam-Fai Wong
Bin Liang
Ruifeng Xu
CLL
110
5
0
22 Feb 2024
SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization
Prakamya Mishra
Zonghai Yao
Parth Vashisht
Feiyun Ouyang
Beining Wang
Vidhi Mody
Hong-ye Yu
SyDa
MedIm
87
6
0
21 Feb 2024
CriticBench: Evaluating Large Language Models as Critic
Tian Lan
Wenwei Zhang
Chen Xu
Heyan Huang
Dahua Lin
Kai-xiang Chen
Xian-Ling Mao
ELM
AI4MH
LRM
86
3
0
21 Feb 2024
Privacy-Preserving Instructions for Aligning Large Language Models
Da Yu
Peter Kairouz
Sewoong Oh
Zheng Xu
120
25
0
21 Feb 2024
The Lay Person's Guide to Biomedicine: Orchestrating Large Language Models
Zheheng Luo
Qianqian Xie
Sophia Ananiadou
83
0
0
21 Feb 2024
How Important is Domain Specificity in Language Models and Instruction Finetuning for Biomedical Relation Extraction?
Aviv Brokman
Ramakanth Kavuluru
LM&MA
ALM
64
3
0
21 Feb 2024
Large Language Models for Data Annotation: A Survey
Zhen Tan
Dawei Li
Song Wang
Alimohammad Beigi
Bohan Jiang
Amrita Bhattacharjee
Mansooreh Karami
Wenlin Yao
Lu Cheng
Huan Liu
SyDa
138
80
0
21 Feb 2024
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
Arka Pal
Deep Karkhanis
Samuel Dooley
Manley Roberts
Siddartha Naidu
Colin White
OSLM
124
155
0
20 Feb 2024
Bayesian Reward Models for LLM Alignment
Adam X. Yang
Maxime Robeyns
Thomas Coste
Zhengyan Shi
Jun Wang
Haitham Bou-Ammar
Laurence Aitchison
73
19
0
20 Feb 2024
A Survey on Knowledge Distillation of Large Language Models
Xiaohan Xu
Ming Li
Chongyang Tao
Tao Shen
Reynold Cheng
Jinyang Li
Can Xu
Dacheng Tao
Dinesh Manocha
KELM
VLM
175
135
0
20 Feb 2024
Prompt Stealing Attacks Against Large Language Models
Zeyang Sha
Yang Zhang
SILM
AAML
119
35
0
20 Feb 2024
Me LLaMA: Foundation Large Language Models for Medical Applications
Qianqian Xie
Qingyu Chen
Aokun Chen
C.A.I. Peng
Yan Hu
...
Huan He
Lucila Ohno-Machado
Yonghui Wu
Hua Xu
Jiang Bian
LM&MA
AI4MH
131
4
0
20 Feb 2024
Reflect-RL: Two-Player Online RL Fine-Tuning for LMs
Runlong Zhou
Simon S. Du
Beibin Li
OffRL
91
4
0
20 Feb 2024
Generative AI Security: Challenges and Countermeasures
Banghua Zhu
Norman Mu
Jiantao Jiao
David Wagner
AAML
SILM
107
10
0
20 Feb 2024
A Critical Evaluation of AI Feedback for Aligning Large Language Models
Archit Sharma
Sedrick Scott Keh
Eric Mitchell
Chelsea Finn
Kushal Arora
Thomas Kollar
ALM
LLMAG
109
27
0
19 Feb 2024
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
Zhanhui Zhou
Jie Liu
Zhichen Dong
Jiaheng Liu
Chao Yang
Wanli Ouyang
Yu Qiao
96
22
0
19 Feb 2024
Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships
Myung Gyo Oh
Hong Eun Ahn
L. Park
T.-H. Kwon
MIALM
AAML
97
0
0
19 Feb 2024
BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence
Jiajie Jin
Yutao Zhu
Yujia Zhou
Zhicheng Dou
RALM
104
23
0
19 Feb 2024
Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution
Nuo Xu
Jun Zhao
Can Zu
Sixian Li
Lu Chen
...
Shihan Dou
Wenjuan Qin
Tao Gui
Qi Zhang
Xuanjing Huang
90
7
0
18 Feb 2024
EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models
Jun Gao
Huan Zhao
Wei Wang
Changlong Yu
Ruifeng Xu
OffRL
67
5
0
18 Feb 2024
Aligning Large Language Models by On-Policy Self-Judgment
Sangkyu Lee
Sungdong Kim
Ashkan Yousefpour
Minjoon Seo
Kang Min Yoo
Youngjae Yu
OSLM
75
12
0
17 Feb 2024
CoLLaVO: Crayon Large Language and Vision mOdel
Byung-Kwan Lee
Beomchan Park
Chae Won Kim
Yonghyun Ro
VLM
MLLM
117
18
0
17 Feb 2024
Humans or LLMs as the Judge? A Study on Judgement Biases
Guiming Hardy Chen
Shunian Chen
Ziche Liu
Feng Jiang
Benyou Wang
208
113
0
16 Feb 2024
Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements
Ming Li
Jiuhai Chen
Lichang Chen
Dinesh Manocha
150
21
0
16 Feb 2024
Direct Preference Optimization with an Offset
Afra Amini
Tim Vieira
Ryan Cotterell
136
67
0
16 Feb 2024
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Ajay Patel
Colin Raffel
Chris Callison-Burch
SyDa
AI4CE
77
27
0
16 Feb 2024
Active Preference Optimization for Sample Efficient RLHF
Nirjhar Das
Souradip Chakraborty
Aldo Pacchiano
Sayak Ray Chowdhury
160
22
0
16 Feb 2024
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
Rui Yang
Xiaoman Pan
Feng Luo
Shuang Qiu
Han Zhong
Dong Yu
Jianshu Chen
233
83
0
15 Feb 2024
RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models
Saeed Khaki
JinJin Li
Lan Ma
Liu Yang
Prathap Ramachandra
89
24
0
15 Feb 2024
InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling
Yuchun Miao
Sen Zhang
Liang Ding
Rong Bao
Lefei Zhang
Dacheng Tao
96
21
0
14 Feb 2024
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Feifan Song
Yuxuan Fan
Xin Zhang
Peiyi Wang
Houfeng Wang
66
9
0
14 Feb 2024
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
Zhichen Dong
Zhanhui Zhou
Chao Yang
Jing Shao
Yu Qiao
ELM
146
68
0
14 Feb 2024
A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
Shentao Yang
Tianqi Chen
Mingyuan Zhou
EGVM
126
30
0
13 Feb 2024
Active Preference Learning for Large Language Models
William Muldrew
Peter Hayes
Mingtian Zhang
David Barber
92
24
0
12 Feb 2024
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts
Yueqin Yin
Zhendong Wang
Yi Gu
Hai Huang
Weizhu Chen
Mingyuan Zhou
75
19
0
12 Feb 2024
Suppressing Pink Elephants with Direct Principle Feedback
Louis Castricato
Nathan Lile
Suraj Anand
Hailey Schoelkopf
Siddharth Verma
Stella Biderman
106
12
0
12 Feb 2024
Policy Improvement using Language Feedback Models
Victor Zhong
Dipendra Kumar Misra
Xingdi Yuan
Marc-Alexandre Côté
87
11
0
12 Feb 2024
Mercury: A Code Efficiency Benchmark for Code Large Language Models
Mingzhe Du
Anh Tuan Luu
Bin Ji
Qian Liu
See-Kiong Ng
ALM
ELM
OffRL
96
13
0
12 Feb 2024
ODIN: Disentangled Reward Mitigates Hacking in RLHF
Lichang Chen
Chen Zhu
Davit Soselia
Jiuhai Chen
Dinesh Manocha
Tom Goldstein
Heng-Chiao Huang
Mohammad Shoeybi
Bryan Catanzaro
AAML
116
66
0
11 Feb 2024