Learning to summarize from human feedback
arXiv:2009.01325 (v3, latest) · 2 September 2020
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
ALM
Papers citing "Learning to summarize from human feedback" (50 of 1,548 papers shown)

Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models
Yuchong Sun, Che Liu, Kun Zhou, Jinwen Huang, Ruihua Song, Xin Zhao, Fuzheng Zhang, Di Zhang, Kun Gai
LRM · 11 Oct 2023

Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, Danqi Chen
AAML · 10 Oct 2023

Teaching Language Models to Hallucinate Less with Synthetic Tasks
Erik Jones, Hamid Palangi, Clarisse Simoes, Varun Chandrasekaran, Subhabrata Mukherjee, Arindam Mitra, Ahmed Hassan Awadallah, Ece Kamar
HILM · 10 Oct 2023

Understanding the Effects of RLHF on LLM Generalisation and Diversity
Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu
AI4CE, ALM · 10 Oct 2023

Constructive Large Language Models Alignment with Diverse Feedback
Tianshu Yu, Ting-En Lin, Yuchuan Wu, Min Yang, Fei Huang, Yongbin Li
ALM · 10 Oct 2023

Diversity from Human Feedback
Ren-Jian Wang, Ke Xue, Yutong Wang, Peng Yang, Haobo Fu, Qiang Fu, Chao Qian
10 Oct 2023

Factual and Personalized Recommendations using Language Models and Reinforcement Learning
Jihwan Jeong, Yinlam Chow, Guy Tennenholtz, Chih-Wei Hsu, Azamat Tulepbergenov, Mohammad Ghavamzadeh, Craig Boutilier
09 Oct 2023

SALMON: Self-Alignment with Instructable Reward Models
Zhiqing Sun, Songlin Yang, Hongxin Zhang, Qinhong Zhou, Zhenfang Chen, David D. Cox, Yiming Yang, Chuang Gan
ALM, SyDa · 09 Oct 2023

Improving Summarization with Human Edits
Zonghai Yao, Benjamin J Schloss, Sai P. Selvaraj
09 Oct 2023

Aligning Language Models with Human Preferences via a Bayesian Approach
Jiashuo Wang, Haozhao Wang, Shichao Sun, Wenjie Li
ALM · 09 Oct 2023

Regulation and NLP (RegNLP): Taming Large Language Models
Catalina Goanta, Nikolaos Aletras, Ilias Chalkidis, S. Ranchordas, Gerasimos Spanakis
AILaw · 09 Oct 2023

Generative Judge for Evaluating Alignment
Junlong Li, Shichao Sun, Weizhe Yuan, Run-Ze Fan, Hai Zhao, Pengfei Liu
ELM, ALM · 09 Oct 2023

SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
Yi Dong, Zhilin Wang, Makesh Narsimhan Sreedhar, Xianchao Wu, Oleksii Kuchaiev
ALM, LLMSV · 09 Oct 2023

Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models
Holy Lovenia, Wenliang Dai, Samuel Cahyawijaya, Ziwei Ji, Pascale Fung
MLLM · 09 Oct 2023

Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback
Wei Shen, Rui Zheng, Wenyu Zhan, Jun Zhao, Shihan Dou, Tao Gui, Qi Zhang, Xuanjing Huang
ALM · 08 Oct 2023

Crystal: Introspective Reasoners Reinforced with Self-Feedback
Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Yejin Choi, Asli Celikyilmaz
LRM, ReLM · 07 Oct 2023

Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen Marcus McAleer
06 Oct 2023

Reward Dropout Improves Control: Bi-objective Perspective on Reinforced LM
Changhun Lee, Chiehyeon Lim
06 Oct 2023

LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation
Zixi Zhang, Greg Chadwick, Hugo McNally, Yiren Zhao, Robert D. Mullins, Jianyi Cheng
06 Oct 2023

Aligning Text-to-Image Diffusion Models with Reward Backpropagation
Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki
05 Oct 2023

A Long Way to Go: Investigating Length Correlations in RLHF
Prasann Singhal, Tanya Goyal, Jiacheng Xu, Greg Durrett
05 Oct 2023

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao
05 Oct 2023

Misusing Tools in Large Language Models With Visual Adversarial Examples
Xiaohan Fu, Zihan Wang, Shuheng Li, Rajesh K. Gupta, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Earlence Fernandes
AAML · 04 Oct 2023

$\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis
Zishun Yu, Yunzhe Tao, Liyu Chen, Tao Sun, Hongxia Yang
04 Oct 2023

JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning
Chang Gao, Wenxuan Zhang, Guizhen Chen, Wai Lam
04 Oct 2023

Reward Model Ensembles Help Mitigate Overoptimization
Thomas Coste, Usman Anwar, Robert Kirk, David M. Krueger
NoLa, ALM · 04 Oct 2023

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale
03 Oct 2023

Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation
Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey Svyatkovskiy
OffRL, ALM · 03 Oct 2023

Automatic Pair Construction for Contrastive Post-training
Canwen Xu, Corby Rosset, Ethan C. Chau, Luciano Del Corro, Shweti Mahajan, Julian McAuley, Jennifer Neville, Ahmed Hassan Awadallah, Nikhil Rao
ALM · 03 Oct 2023

TWIZ-v2: The Wizard of Multimodal Conversational-Stimulus
Rafael Ferreira, Diogo Tavares, Diogo Glória-Silva, Rodrigo Valerio, João Bordalo, Ines Simoes, Vasco Ramos, David Semedo, João Magalhães
03 Oct 2023

AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model
Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yao Mu, Yan Zheng, Yujing Hu, Tangjie Lv, Changjie Fan, Zhipeng Hu
03 Oct 2023

Avalon's Game of Thoughts: Battle Against Deception through Recursive Contemplation
Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang
LLMAG · 02 Oct 2023

Tool-Augmented Reward Modeling
Lei Li, Yekun Chai, Shuohuan Wang, Yu Sun, Hao Tian, Ningyu Zhang, Hua Wu
OffRL · 02 Oct 2023

Enabling Language Models to Implicitly Learn Self-Improvement
Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, Heng Ji
ReLM, LRM · 02 Oct 2023

No Offense Taken: Eliciting Offensiveness from Language Models
Anugya Srivastava, Rahul Ahuja, Rohith Mukku
02 Oct 2023

Parameter-Efficient Tuning Helps Language Model Alignment
Tianci Xue, Ziqi Wang, Heng Ji
ALM · 01 Oct 2023

Adapting LLM Agents with Universal Feedback in Communication
Kuan-Chieh Wang, Yadong Lu, Michael Santacroce, Yeyun Gong, Chao Zhang, Yelong Shen
LLMAG · 01 Oct 2023

From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, Dong Yu
LRM · 30 Sep 2023

Consistent Aggregation of Objectives with Diverse Time Preferences Requires Non-Markovian Rewards
Silviu Pitis
30 Sep 2023

Directly Fine-Tuning Diffusion Models on Differentiable Rewards
Amita Gajewar, Paul Vicol, G. Bansal, David J Fleet
29 Sep 2023

Qwen Technical Report
Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, ..., Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
OSLM · 28 Sep 2023

Large Language Model Alignment: A Survey
Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong
LM&MA · 26 Sep 2023

Art or Artifice? Large Language Models and the False Promise of Creativity
Tuhin Chakrabarty, Philippe Laban, Divyansh Agarwal, Smaranda Muresan, Chien-Sheng Wu
25 Sep 2023

Aligning Large Multimodal Models with Factually Augmented RLHF
Zhiqing Sun, Sheng Shen, Shengcao Cao, Haotian Liu, Chunyuan Li, ..., Liangyan Gui, Yu-Xiong Wang, Yiming Yang, Kurt Keutzer, Trevor Darrell
VLM · 25 Sep 2023

MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models
Kailai Yang, Tianlin Zhang, Zi-Zhou Kuang, Qianqian Xie, Jimin Huang, Sophia Ananiadou
AI4MH · 24 Sep 2023

Frustrated with Code Quality Issues? LLMs can Help!
Nalin Wadhwa, Jui Pradhan, Atharv Sonwane, Surya Prakash Sahu, Nagarajan Natarajan, Aditya Kanade, Suresh Parthasarathy, S. Rajamani
22 Sep 2023

AceGPT, Localizing Large Language Models in Arabic
Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, ..., Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu
21 Sep 2023

Are Large Language Models Really Robust to Word-Level Perturbations?
Haoyu Wang, Guozheng Ma, Cong Yu, Ning Gui, Linrui Zhang, ..., Sen Zhang, Li Shen, Xueqian Wang, Peilin Zhao, Dacheng Tao
KELM · 20 Sep 2023

Toward Unified Controllable Text Generation via Regular Expression Instruction
Xin Zheng, Hongyu Lin, Xianpei Han, Le Sun
19 Sep 2023

Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles
Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Ziran Wang
19 Sep 2023