arXiv:1909.08593
Fine-Tuning Language Models from Human Preferences
18 September 2019
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
Papers citing "Fine-Tuning Language Models from Human Preferences" (50 of 1,265 shown)
ReMask: A Robust Information-Masking Approach for Domain Counterfactual Generation
Pengfei Hong
Rishabh Bhardwaj
Navonil Majumder
Somak Aditya
Soujanya Poria
AAML
47
0
0
04 May 2023
"Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process
Anna Glazkova
Zongjie Li
Michael Kadantsev
Maksim Glazkov
KELM
86
14
0
04 May 2023
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Patrick Fernandes
Aman Madaan
Emmy Liu
António Farinhas
Pedro Henrique Martins
...
José G. C. de Souza
Shuyan Zhou
Tongshuang Wu
Graham Neubig
André F. T. Martins
ALM
192
59
0
01 May 2023
CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants
A. Sun
Varun Nair
Elliot Schumacher
Anitha Kannan
80
3
0
27 Apr 2023
ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task
Roberto Martínez-Cruz
Alvaro J. López-López
J. Portela
106
22
0
27 Apr 2023
Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
Cheng Lu
Huayu Chen
Jianfei Chen
Hang Su
Chongxuan Li
Jun Zhu
DiffM
OffRL
143
75
0
25 Apr 2023
Joint Repetition Suppression and Content Moderation of Large Language Models
Minghui Zhang
Alex Sokolov
Weixin Cai
Si-Qing Chen
45
1
0
20 Apr 2023
Stochastic Parrots Looking for Stochastic Parrots: LLMs are Easy to Fine-Tune and Hard to Detect with other LLMs
Da Silva Gameiro Henrique
Andrei Kucharavy
R. Guerraoui
DeLMO
76
8
0
18 Apr 2023
Tool Learning with Foundation Models
Yujia Qin
Shengding Hu
Yankai Lin
Weize Chen
Ning Ding
...
Cheng Yang
Tongshuang Wu
Heng Ji
Zhiyuan Liu
Maosong Sun
146
222
0
17 Apr 2023
A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model
Xianghui Sun
Yunjie Ji
Baochang Ma
Xiangang Li
ALM
85
19
0
17 Apr 2023
Towards Better Instruction Following Language Models for Chinese: Investigating the Impact of Training Data and Evaluation
Yunjie Ji
Yan Gong
Yong Deng
Yiping Peng
Qiang Niu
Baochang Ma
Xiangang Li
ALM
ELM
102
25
0
16 Apr 2023
OpenAssistant Conversations -- Democratizing Large Language Model Alignment
Andreas Köpf
Yannic Kilcher
Dimitri von Rütte
Sotiris Anagnostidis
Zhi Rui Tam
...
Arnav Dantuluri
Andrew Maguire
Christoph Schuhmann
Huu Nguyen
A. Mattick
ALM
LM&MA
151
640
0
14 Apr 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Boyao Wang
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
153
470
0
13 Apr 2023
ChatGPT-4 Outperforms Experts and Crowd Workers in Annotating Political Twitter Messages with Zero-Shot Learning
Petter Törnberg
AI4MH
79
149
0
13 Apr 2023
Are LLMs All You Need for Task-Oriented Dialogue?
Vojtěch Hudeček
Ondřej Dušek
94
62
0
13 Apr 2023
Language Instructed Reinforcement Learning for Human-AI Coordination
Hengyuan Hu
Dorsa Sadigh
LM&Ro
96
64
0
13 Apr 2023
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Jiazheng Xu
Xiao Liu
Yuchen Wu
Yuxuan Tong
Qinkai Li
Ming Ding
Jie Tang
Yuxiao Dong
159
413
0
12 Apr 2023
Emergent autonomous scientific research capabilities of large language models
Daniil A. Boiko
R. MacKnight
Gabe Gomes
ELM
LM&Ro
AI4CE
LLMAG
164
128
0
11 Apr 2023
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Zheng Yuan
Hongyi Yuan
Chuanqi Tan
Wei Wang
Songfang Huang
Feiran Huang
ALM
183
385
0
11 Apr 2023
Learning a Universal Human Prior for Dexterous Manipulation from Human Preference
Zihan Ding
Yuanpei Chen
Allen Z. Ren
S. Gu
Qianxu Wang
Hao Dong
Chi Jin
84
10
0
10 Apr 2023
OpenAGI: When LLM Meets Domain Experts
Yingqiang Ge
Wenyue Hua
Kai Mei
Jianchao Ji
Juntao Tan
Shuyuan Xu
Zelong Li
Yongfeng Zhang
VLM
LRM
118
232
0
10 Apr 2023
Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models
Emilio Ferrara
SILM
121
264
0
07 Apr 2023
Beyond Summarization: Designing AI Support for Real-World Expository Writing Tasks
Zejiang Shen
Tal August
Pao Siangliulue
Kyle Lo
Jonathan Bragg
Jeff Hammerbacher
Doug Downey
Joseph Chee Chang
David Sontag
ELM
67
19
0
05 Apr 2023
REFINER: Reasoning Feedback on Intermediate Representations
Debjit Paul
Mete Ismayilzada
Maxime Peyrard
Beatriz Borges
Antoine Bosselut
Robert West
Boi Faltings
ReLM
LRM
134
182
0
04 Apr 2023
The Vector Grounding Problem
Dimitri Coelho Mollo
Raphael Milliere
146
28
0
04 Apr 2023
Eight Things to Know about Large Language Models
Sam Bowman
ALM
103
116
0
02 Apr 2023
Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT
Chun Xia
Lingming Zhang
KELM
LRM
118
121
0
01 Apr 2023
Evaluating Large Language Models on a Highly-specialized Topic, Radiation Oncology Physics
J. Holmes
Zheng Liu
Hua Zhou
Yuzhen Ding
Terence T. Sio
...
Jonathan B. Ashman
Xiang Li
Tianming Liu
Jiajian Shen
Wen Liu
LM&MA
AI4CE
ELM
94
124
0
01 Apr 2023
Aligning a medium-size GPT model in English to a small closed domain in Spanish
Oscar R. Navarrete-Parra
Víctor Uc Cetina
Jorge Reyes-Magaña
37
0
0
30 Mar 2023
Humans in Humans Out: On GPT Converging Toward Common Sense in both Success and Failure
Philipp E. Koralus
Vincent Wang-Maścianica
LRM
47
13
0
30 Mar 2023
Training Language Models with Language Feedback at Scale
Jérémy Scheurer
Jon Ander Campos
Tomasz Korbak
Jun Shern Chan
Angelica Chen
Kyunghyun Cho
Ethan Perez
ALM
107
107
0
28 Mar 2023
Improving Code Generation by Training with Natural Language Feedback
Angelica Chen
Jérémy Scheurer
Tomasz Korbak
Jon Ander Campos
Jun Shern Chan
Samuel R. Bowman
Kyunghyun Cho
Ethan Perez
SyDa
ALM
AI4CE
99
78
0
28 Mar 2023
Foundation Models and Fair Use
Peter Henderson
Xuechen Li
Dan Jurafsky
Tatsunori Hashimoto
Mark A. Lemley
Percy Liang
85
126
0
28 Mar 2023
On the Creativity of Large Language Models
Giorgio Franceschelli
Mirco Musolesi
196
60
0
27 Mar 2023
Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases
Yunjie Ji
Yong Deng
Yan Gong
Yiping Peng
Qiang Niu
Lefei Zhang
Baochang Ma
Xiangang Li
ALM
70
97
0
26 Mar 2023
SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization
Yi-Syuan Chen
Yun-Zhu Song
Hong-Han Shuai
64
6
0
24 Mar 2023
Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
François Ged
M. H. Veiga
81
0
0
22 Mar 2023
Large Language Models Can Be Used to Estimate the Latent Positions of Politicians
Patrick Y. Wu
Jonathan Nagler
Joshua A. Tucker
Solomon Messing
175
28
0
21 Mar 2023
Exploring ChatGPT's Ability to Rank Content: A Preliminary Study on Consistency with Human Preferences
Yunjie Ji
Yan Gong
Yiping Peng
Chao Ni
Peiyan Sun
Dongyu Pan
Baochang Ma
Xiangang Li
ELM
ALM
AI4MH
76
38
0
14 Mar 2023
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk
Bertie Vidgen
Paul Röttger
Scott A. Hale
106
107
0
09 Mar 2023
disco: a toolkit for Distributional Control of Generative Models
Germán Kruszewski
Jos Rozen
Marc Dymetman
59
4
0
08 Mar 2023
Perspectives on the Social Impacts of Reinforcement Learning with Human Feedback
Gabrielle K. Liu
OffRL
116
21
0
06 Mar 2023
Active Reward Learning from Multiple Teachers
Peter Barnett
Rachel Freedman
Justin Svegliato
Stuart J. Russell
66
15
0
02 Mar 2023
R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents
Daniel D. Johnson
Daniel Tarlow
Christian J. Walder
77
6
0
01 Mar 2023
Aligning Text-to-Image Models using Human Feedback
Kimin Lee
Hao Liu
Moonkyung Ryu
Olivia Watkins
Yuqing Du
Craig Boutilier
Pieter Abbeel
Mohammad Ghavamzadeh
S. Gu
EGVM
136
285
0
23 Feb 2023
Guiding Large Language Models via Directional Stimulus Prompting
Zekun Li
Baolin Peng
Pengcheng He
Michel Galley
Jianfeng Gao
Xi Yan
LLMAG
LRM
LM&Ro
132
101
0
22 Feb 2023
BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT
Jiawen Shi
Yixin Liu
Pan Zhou
Lichao Sun
SILM
66
83
0
21 Feb 2023
Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems
Yihao Feng
Shentao Yang
Shujian Zhang
Jianguo Zhang
Caiming Xiong
Mi Zhou
Haiquan Wang
OffRL
102
25
0
20 Feb 2023
Pretraining Language Models with Human Preferences
Tomasz Korbak
Kejian Shi
Angelica Chen
Rasika Bhalerao
C. L. Buckley
Jason Phang
Sam Bowman
Ethan Perez
ALM
SyDa
102
231
0
16 Feb 2023
Aligning Language Models with Preferences through f-divergence Minimization
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Nahyeon Ryu
Marc Dymetman
107
76
0
16 Feb 2023