Fine-Tuning Language Models from Human Preferences [ALM]
arXiv:1909.08593 · 18 September 2019
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving

Papers citing "Fine-Tuning Language Models from Human Preferences" (50 of 1,265 papers shown)

Aligning Crowd Feedback via Distributional Preference Reward Modeling
Dexun Li, Cong Zhang, Kuicai Dong, Derrick-Goh-Xin Deik, Ruiming Tang, Yong Liu · 15 Feb 2024

InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling
Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao · 14 Feb 2024

MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences [ALM]
Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang · 14 Feb 2024

Reinforcement Learning from Human Feedback with Active Queries
Kaixuan Ji, Jiafan He, Quanquan Gu · 14 Feb 2024

GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements [ReLM, LRM]
Alex Havrilla, Sharath Raparthy, Christoforus Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Roberta Raileanu · 13 Feb 2024

Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg · 13 Feb 2024

PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models [EGVM]
Fei Deng, Qifei Wang, Wei Wei, Matthias Grundmann, Tingbo Hou · 13 Feb 2024

A Dense Reward View on Aligning Text-to-Image Diffusion with Preference [EGVM]
Shentao Yang, Tianqi Chen, Mingyuan Zhou · 13 Feb 2024

Active Preference Learning for Large Language Models
William Muldrew, Peter Hayes, Mingtian Zhang, David Barber · 12 Feb 2024

Large Language Models as Agents in Two-Player Games [LLMAG]
Yang Liu, Peng Sun, Hang Li · 12 Feb 2024

Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs [SyDa]
Víctor Gallego · 12 Feb 2024

Policy Improvement using Language Feedback Models
Victor Zhong, Dipendra Kumar Misra, Xingdi Yuan, Marc-Alexandre Côté · 12 Feb 2024

AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy
P. Schoenegger, Peter S. Park, Ezra Karger, P. Tetlock · 12 Feb 2024

Mercury: A Code Efficiency Benchmark for Code Large Language Models [ALM, ELM, OffRL]
Mingzhe Du, Anh Tuan Luu, Bin Ji, Qian Liu, See-Kiong Ng · 12 Feb 2024

Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning
Zhicheng Liu, Jian Lou, Wenxuan Bao, Yihan Hu, Baochun Li, Zhan Qin, K. Ren · 12 Feb 2024

Show Me How It's Done: The Role of Explanations in Fine-Tuning Language Models [LRM]
Mohamad Ballout, U. Krumnack, Gunther Heidemann, Kai-Uwe Kuehnberger · 12 Feb 2024

ODIN: Disentangled Reward Mitigates Hacking in RLHF [AAML]
Lichang Chen, Chen Zhu, Davit Soselia, Jiuhai Chen, Dinesh Manocha, Tom Goldstein, Heng-Chiao Huang, Mohammad Shoeybi, Bryan Catanzaro · 11 Feb 2024

Online Iterative Reinforcement Learning from Human Feedback with General Preference Model [OffRL]
Chen Ye, Wei Xiong, Yuheng Zhang, Nan Jiang, Tong Zhang · 11 Feb 2024

LiFi: Lightweight Controlled Text Generation with Fine-Grained Control Codes
Chufan Shi, Deng Cai, Yujiu Yang · 10 Feb 2024

Corruption Robust Offline Reinforcement Learning with Human Feedback [OffRL]
Debmalya Mandal, Andi Nika, Parameswaran Kamalaruban, Adish Singla, Goran Radanović · 09 Feb 2024

V-STaR: Training Verifiers for Self-Taught Reasoners [ReLM, LRM]
Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Rameswar Panda, Alessandro Sordoni, Rishabh Agarwal · 09 Feb 2024

Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement
Muning Wen, Junwei Liao, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen · 09 Feb 2024

Noise Contrastive Alignment of Language Models with Explicit Rewards
Huayu Chen, Guande He, Lifan Yuan, Ganqu Cui, Hang Su, Jun Zhu · 08 Feb 2024

Implicit Diffusion: Efficient Optimization through Stochastic Sampling
Pierre Marion, Anna Korba, Peter Bartlett, Mathieu Blondel, Valentin De Bortoli, Arnaud Doucet, Felipe Llinares-López, Courtney Paquette, Quentin Berthet · 08 Feb 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications [AAML]
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson · 07 Feb 2024

Pedagogical Alignment of Large Language Models [AI4Ed]
Shashank Sonkar, Kangqi Ni, Sapana Chaudhary, Richard G. Baraniuk · 07 Feb 2024

Direct Language Model Alignment from Online AI Feedback [ALM]
Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi Liu, Misha Khalman, ..., Thomas Mesnard, Yao-Min Zhao, Bilal Piot, Johan Ferret, Mathieu Blondel · 07 Feb 2024

MusicRL: Aligning Music Generation to Human Preferences
Geoffrey Cideron, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, ..., Olivier Pietquin, Matthieu Geist, Léonard Hussenot, Neil Zeghidour, A. Agostinelli · 06 Feb 2024

Harnessing the Plug-and-Play Controller by Prompting
Hao Wang, Lei Sha · 06 Feb 2024

Personalized Language Modeling from Personalized Human Feedback [ALM]
Xinyu Li, Zachary C. Lipton, Liu Leqi · 06 Feb 2024

Psychological Assessments with Large Language Models: A Privacy-Focused and Cost-Effective Approach
Sergi Blanco-Cuaresma · 05 Feb 2024

MobilityGPT: Enhanced Human Mobility Modeling with a GPT model
Ammar Haydari, Dongjie Chen, Zhengfeng Lai, Michael Zhang, Chen-Nee Chuah · 05 Feb 2024

Best Practices for Text Annotation with Large Language Models
Petter Törnberg · 05 Feb 2024

Preference-Conditioned Language-Guided Abstraction
Andi Peng, Andreea Bobu, Belinda Z. Li, T. Sumers, Ilia Sucholutsky, Nishanth Kumar, Thomas Griffiths, Julie A. Shah · 05 Feb 2024

Decoding-time Realignment of Language Models [AI4CE]
Tianlin Liu, Shangmin Guo, Leonardo Bianco, Daniele Calandriello, Quentin Berthet, Felipe Llinares-López, Jessica Hoffmann, Lucas Dixon, Michal Valko, Mathieu Blondel · 05 Feb 2024

BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback [BDL]
Gaurav Pandey, Yatin Nandwani, Tahira Naseem, Mayank Mishra, Guangxuan Xu, Dinesh Raghu, Sachindra Joshi, Asim Munawar, Ramón Fernández Astudillo · 04 Feb 2024

Aligner: Efficient Alignment by Learning to Correct
Jiaming Ji, Boyuan Chen, Hantao Lou, Chongye Guo, Borong Zhang, Xuehai Pan, Juntao Dai, Tianyi Qiu, Yaodong Yang · 04 Feb 2024

Jailbreaking Attack against Multimodal Large Language Model
Zhenxing Niu, Haoxuan Ji, Xinbo Gao, Gang Hua, Rong Jin · 04 Feb 2024

Rethinking the Role of Proxy Rewards in Language Model Alignment [SyDa, ALM]
Sungdong Kim, Minjoon Seo · 02 Feb 2024

KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela · 02 Feb 2024

Efficient Prompt Caching via Embedding Similarity [RALM]
Hanlin Zhu, Banghua Zhu, Jiantao Jiao · 02 Feb 2024

DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models
Mohammadreza Pourreza, Davood Rafiei · 02 Feb 2024

Towards Efficient Exact Optimization of Language Model Alignment
Haozhe Ji, Cheng Lu, Yilin Niu, Pei Ke, Hongning Wang, Jun Zhu, Jie Tang, Minlie Huang · 01 Feb 2024

Transforming and Combining Rewards for Aligning Large Language Models
Zihao Wang, Chirag Nagpal, Jonathan Berant, Jacob Eisenstein, Alex D'Amour, Oluwasanmi Koyejo, Victor Veitch · 01 Feb 2024

Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks [AAML]
Andy Zhou, Bo Li, Haohan Wang · 30 Jan 2024

Security and Privacy Challenges of Large Language Models: A Survey [PILM, ELM]
B. Das, M. H. Amini, Yanzhao Wu · 30 Jan 2024

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu, Michael I. Jordan, Jiantao Jiao · 29 Jan 2024

KAUCUS: Knowledge Augmented User Simulators for Training Language Model Assistants
Kaustubh D. Dhole · 29 Jan 2024

Design Principles for Generative AI Applications [AI4CE]
Justin D. Weisz, Jessica He, Michael J. Muller, Gabriela Hoefer, Rachel Miles, Werner Geyer · 25 Jan 2024

DsDm: Model-Aware Dataset Selection with Datamodels [OODD]
Logan Engstrom, Axel Feldmann, Aleksander Madry · 23 Jan 2024