arXiv:1909.08593 (v2, latest)
Fine-Tuning Language Models from Human Preferences
18 September 2019
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
Papers citing "Fine-Tuning Language Models from Human Preferences" (50 of 1,265 shown)
Getting aligned on representational alignment
Ilia Sucholutsky
Lukas Muttenthaler
Adrian Weller
Andi Peng
Andreea Bobu
...
Thomas Unterthiner
Andrew Kyle Lampinen
Klaus-Robert Müller
M. Toneva
Thomas Griffiths
158
93
0
18 Oct 2023
Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization
Lijie Ding
Jenny Zhang
Jeff Clune
Lee Spector
Joel Lehman
EGVM
112
9
0
18 Oct 2023
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
Rui Zheng
Wei Shen
Yuan Hua
Wenbin Lai
Shihan Dou
...
Xiao Wang
Haoran Huang
Tao Gui
Qi Zhang
Xuanjing Huang
109
17
0
18 Oct 2023
Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting
Guande He
Peng Cui
Jianfei Chen
Wenbo Hu
Jun Zhu
96
12
0
18 Oct 2023
Eliciting Human Preferences with Language Models
Belinda Z. Li
Alex Tamkin
Noah D. Goodman
Jacob Andreas
RALM
88
51
0
17 Oct 2023
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
Joel Jang
Seungone Kim
Bill Yuchen Lin
Yizhong Wang
Jack Hessel
Luke Zettlemoyer
Hannaneh Hajishirzi
Yejin Choi
Prithviraj Ammanabrolu
MoMe
131
153
0
17 Oct 2023
Group Preference Optimization: Few-Shot Alignment of Large Language Models
Siyan Zhao
John Dang
Aditya Grover
83
30
0
17 Oct 2023
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Akari Asai
Zeqiu Wu
Yizhong Wang
Avirup Sil
Hannaneh Hajishirzi
RALM
283
783
0
17 Oct 2023
Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament
P. Schoenegger
Peter S. Park
ELM
AI4TS
87
17
0
17 Oct 2023
H2O Open Ecosystem for State-of-the-art Large Language Models
Arno Candel
Jon McKinney
Philipp Singer
Pascal Pfeiffer
Maximilian Jeblick
Chun Ming Lee
Marcos V. Conde
VLM
55
4
0
17 Oct 2023
Compositional preference models for aligning LMs
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Marc Dymetman
95
20
0
17 Oct 2023
Llemma: An Open Language Model For Mathematics
Zhangir Azerbayev
Hailey Schoelkopf
Keiran Paster
Marco Dos Santos
Stephen Marcus McAleer
Albert Q. Jiang
Jia Deng
Stella Biderman
Sean Welleck
CLL
126
303
0
16 Oct 2023
Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li
Yulin Chen
Jinglong Luo
Yan Kang
Xiaojin Zhang
Qi Hu
Chunkit Chan
Yangqiu Song
PILM
118
45
0
16 Oct 2023
Configuration Validation with Large Language Models
Xinyu Lian
Yinfang Chen
Runxiang Cheng
Jie Huang
Parth Thakkar
Minjia Zhang
Tianyin Xu
73
11
0
15 Oct 2023
From Words and Exercises to Wellness: Farsi Chatbot for Self-Attachment Technique
Sina Elahimanesh
Shayan Salehi
Sara Zahedi Movahed
Lisa Alazraki
Ruoyu Hu
Abbas Edalat
54
0
0
13 Oct 2023
Is Certifying ℓ_p Robustness Still Worthwhile?
Ravi Mangal
Klas Leino
Zifan Wang
Kai Hu
Weicheng Yu
Corina S. Pasareanu
Anupam Datta
Matt Fredrikson
AAML
OOD
84
1
0
13 Oct 2023
Don't Add, don't Miss: Effective Content Preserving Generation from Pre-Selected Text Spans
Aviv Slobodkin
Avi Caciularu
Eran Hirsch
Ido Dagan
50
3
0
13 Oct 2023
Calibrating Likelihoods towards Consistency in Summarization Models
Polina Zablotskaia
Misha Khalman
Rishabh Joshi
Livio Baldini Soares
Shoshana Jakobovits
Joshua Maynez
Shashi Narayan
47
4
0
12 Oct 2023
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Jingkang Yang
Yuhao Dong
Shuai Liu
Yue Liu
Ziyue Wang
...
Haoran Tan
Jiamu Kang
Yuanhan Zhang
Kaiyang Zhou
Ziwei Liu
LM&Ro
89
49
0
12 Oct 2023
Improving Factual Consistency for Knowledge-Grounded Dialogue Systems via Knowledge Enhancement and Alignment
Boyang Xue
Weichao Wang
Hongru Wang
Fei Mi
Rui Wang
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
KELM
HILM
299
18
0
12 Oct 2023
GROOT: Learning to Follow Instructions by Watching Gameplay Videos
Shaofei Cai
Bowei Zhang
Zihao Wang
Xiaojian Ma
Hoang Trung-Dung
Yitao Liang
161
27
0
12 Oct 2023
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Hannah Rose Kirk
Andrew M. Bean
Bertie Vidgen
Paul Röttger
Scott A. Hale
ALM
113
50
0
11 Oct 2023
Off-Policy Evaluation for Human Feedback
Qitong Gao
Ge Gao
Juncheng Dong
Vahid Tarokh
Min Chi
Miroslav Pajic
OffRL
86
5
0
11 Oct 2023
Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach
Zhenlan Ji
Pingchuan Ma
Zongjie Li
Shuai Wang
80
23
0
10 Oct 2023
Understanding the Effects of RLHF on LLM Generalisation and Diversity
Robert Kirk
Ishita Mediratta
Christoforos Nalmpantis
Jelena Luketina
Eric Hambro
Edward Grefenstette
Roberta Raileanu
AI4CE
ALM
212
150
0
10 Oct 2023
Constructive Large Language Models Alignment with Diverse Feedback
Tianshu Yu
Ting-En Lin
Yuchuan Wu
Min Yang
Fei Huang
Yongbin Li
ALM
104
9
0
10 Oct 2023
SALMON: Self-Alignment with Instructable Reward Models
Zhiqing Sun
Songlin Yang
Hongxin Zhang
Qinhong Zhou
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
ALM
SyDa
127
41
0
09 Oct 2023
Improving Summarization with Human Edits
Zonghai Yao
Benjamin J Schloss
Sai P. Selvaraj
109
4
0
09 Oct 2023
Aligning Language Models with Human Preferences via a Bayesian Approach
Jiashuo Wang
Haozhao Wang
Shichao Sun
Wenjie Li
ALM
101
25
0
09 Oct 2023
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM
ALM
112
91
0
09 Oct 2023
Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback
Wei Shen
Rui Zheng
Wenyu Zhan
Jun Zhao
Shihan Dou
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
116
52
0
08 Oct 2023
How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts
Tharindu Kumarage
Paras Sheth
Raha Moraffah
Joshua Garland
Huan Liu
DeLMO
63
26
0
08 Oct 2023
The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations
Vipula Rawte
Swagata Chakraborty
Agnibh Pathak
Anubhav Sarkar
S.M. Towhidul Islam Tonmoy
Aman Chadha
Mikel Artetxe
Punit Daniel Simig
HILM
94
131
0
08 Oct 2023
EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling
Siyu Ren
Zhiyong Wu
Kenny Q. Zhu
72
4
0
07 Oct 2023
Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz
Aaditya K. Singh
DJ Strouse
Tuomas Sandholm
Ruslan Salakhutdinov
Anca D. Dragan
Stephen Marcus McAleer
103
55
0
06 Oct 2023
Reward Dropout Improves Control: Bi-objective Perspective on Reinforced LM
Changhun Lee
Chiehyeon Lim
78
0
0
06 Oct 2023
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
Mihir Prabhudesai
Anirudh Goyal
Deepak Pathak
Katerina Fragkiadaki
141
133
0
05 Oct 2023
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
Zhanhui Zhou
Jie Liu
Chao Yang
Jing Shao
Yu Liu
Xiangyu Yue
Wanli Ouyang
Yu Qiao
73
61
0
05 Oct 2023
ℬ-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis
Zishun Yu
Yunzhe Tao
Liyu Chen
Tao Sun
Hongxia Yang
85
13
0
04 Oct 2023
Reward Model Ensembles Help Mitigate Overoptimization
Thomas Coste
Usman Anwar
Robert Kirk
David M. Krueger
NoLa
ALM
111
139
0
04 Oct 2023
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models
Hannah Rose Kirk
Bertie Vidgen
Paul Röttger
Scott A. Hale
116
4
0
03 Oct 2023
Learning Optimal Advantage from Preferences and Mistaking it for Reward
W. B. Knox
Stephane Hatgis-Kessell
Sigurdur O. Adalgeirsson
Serena Booth
Anca D. Dragan
Peter Stone
S. Niekum
96
13
0
03 Oct 2023
Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation
Benjamin Steenhoek
Michele Tufano
Neel Sundaresan
Alexey Svyatkovskiy
OffRL
ALM
151
22
0
03 Oct 2023
Automatic Pair Construction for Contrastive Post-training
Canwen Xu
Corby Rosset
Ethan C. Chau
Luciano Del Corro
Shweti Mahajan
Julian McAuley
Jennifer Neville
Ahmed Hassan Awadallah
Nikhil Rao
ALM
65
4
0
03 Oct 2023
SmartPlay: A Benchmark for LLMs as Intelligent Agents
Yue Wu
Xuan Tang
Tom Michael Mitchell
Yuanzhi Li
ELM
LLMAG
133
73
0
02 Oct 2023
Language Model Decoding as Direct Metrics Optimization
Haozhe Ji
Pei Ke
Hongning Wang
Minlie Huang
61
7
0
02 Oct 2023
No Offense Taken: Eliciting Offensiveness from Language Models
Anugya Srivastava
Rahul Ahuja
Rohith Mukku
51
3
0
02 Oct 2023
A Brief History of Prompt: Leveraging Language Models (Through Advanced Prompting)
G. Muktadir
SILM
52
10
0
30 Sep 2023
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Tianhao Wu
Banghua Zhu
Ruoyu Zhang
Zhaojin Wen
Kannan Ramchandran
Jiantao Jiao
104
61
0
30 Sep 2023
Network Preference Dynamics using Lattice Theory
Hans Riess
Gregory Henselman-Petrusek
Michael C. Munger
Robert Ghrist
Zachary I. Bell
Michael M. Zavlanos
73
0
0
29 Sep 2023