arXiv:1909.08593
Fine-Tuning Language Models from Human Preferences
18 September 2019
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
Papers citing "Fine-Tuning Language Models from Human Preferences" (showing 50 of 1,265 papers)
Dataset Reset Policy Optimization for RLHF
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Kianté Brantley
Dipendra Kumar Misra
Jason D. Lee
Wen Sun
OffRL
117
24
0
12 Apr 2024
Efficient Duple Perturbation Robustness in Low-rank MDPs
Yang Hu
Haitong Ma
Bo Dai
Na Li
51
0
0
11 Apr 2024
GoEX: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications
Shishir G. Patil
Tianjun Zhang
Vivian Fang
Noppapon C.
Roy Huang
Aaron Hao
Martin Casado
Joseph E. Gonzalez
Raluca Ada Popa
Ion Stoica
ALM
81
13
0
10 Apr 2024
Rethinking How to Evaluate Language Model Jailbreak
Hongyu Cai
Arjun Arunasalam
Leo Y. Lin
Antonio Bianchi
Z. Berkay Celik
ALM
65
8
0
09 Apr 2024
Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
Tim Baumgärtner
Yang Gao
Dana Alon
Donald Metzler
AAML
95
23
0
08 Apr 2024
Plug and Play with Prompts: A Prompt Tuning Approach for Controlling Text Generation
R. Ajwani
Zining Zhu
Jonathan Rose
Frank Rudzicz
46
1
0
08 Apr 2024
Towards Understanding the Influence of Reward Margin on Preference Model Performance
Bowen Qin
Duanyu Feng
Xi Yang
54
4
0
07 Apr 2024
EnQuery: Ensemble Policies for Diverse Query-Generation in Preference Alignment of Robot Navigation
Jorge de Heuvel
Florian Seiler
Maren Bennewitz
73
2
0
07 Apr 2024
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
Liqiang Jing
Xinya Du
185
18
0
07 Apr 2024
The Case for Developing a Foundation Model for Planning-like Tasks from Scratch
Biplav Srivastava
Vishal Pallagani
LRM
66
2
0
06 Apr 2024
Aligning Diffusion Models by Optimizing Human Utility
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Yusuke Kato
Kazuki Kozuka
157
34
0
06 Apr 2024
Binary Classifier Optimization for Large Language Model Alignment
Seungjae Jung
Gunsoo Han
D. W. Nam
Kyoung-Woon On
82
25
0
06 Apr 2024
Exploring Autonomous Agents through the Lens of Large Language Models: A Review
Saikat Barua
LM&MA
LLMAG
84
20
0
05 Apr 2024
ROPO: Robust Preference Optimization for Large Language Models
Xize Liang
Chao Chen
Shuang Qiu
Jie Wang
Yue Wu
Zhihang Fu
Zhihao Shi
Feng Wu
Jieping Ye
86
3
0
05 Apr 2024
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Jingyu Zhang
Marc Marone
Tianjian Li
Benjamin Van Durme
Daniel Khashabi
193
9
0
05 Apr 2024
Investigating Regularization of Self-Play Language Models
Réda Alami
Abdalgader Abubaker
Mastane Achab
M. Seddik
Salem Lahlou
75
3
0
04 Apr 2024
Risks from Language Models for Automated Mental Healthcare: Ethics and Structure for Implementation
Declan Grabb
Max Lamparth
N. Vasan
89
17
0
02 Apr 2024
Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models
Yi-Lin Tuan
Xilun Chen
Eric Michael Smith
Louis Martin
Soumya Batra
Asli Celikyilmaz
William Yang Wang
Daniel M. Bikel
96
11
0
01 Apr 2024
Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment
Yuu Jinnai
Tetsuro Morimura
Kaito Ariu
Kenshi Abe
135
8
0
01 Apr 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Lirui Zhao
Yue Yang
Kaipeng Zhang
Wenqi Shao
Yuxin Zhang
Yu Qiao
Ping Luo
Rongrong Ji
LM&Ro
LLMAG
VLM
70
3
0
31 Mar 2024
Dialectical Alignment: Resolving the Tension of 3H and Security Threats of LLMs
Shu Yang
Jiayuan Su
Han Jiang
Mengdi Li
Keyuan Cheng
Muhammad Asif Ali
Lijie Hu
Di Wang
100
6
0
30 Mar 2024
Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
Ang Lv
Yuhan Chen
Kaiyi Zhang
Yulong Wang
Lifeng Liu
Ji-Rong Wen
Jian Xie
Rui Yan
KELM
76
18
0
28 Mar 2024
Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model
Qi Gou
Cam-Tu Nguyen
127
14
0
28 Mar 2024
Fine-Tuning Language Models with Reward Learning on Policy
Hao Lang
Fei Huang
Yongbin Li
ALM
65
7
0
28 Mar 2024
sDPO: Don't Use Your Data All at Once
Dahyun Kim
Yungi Kim
Wonho Song
Hyeonwoo Kim
Yunsu Kim
Sanghoon Kim
Chanjun Park
76
35
0
28 Mar 2024
Disentangling Length from Quality in Direct Preference Optimization
Ryan Park
Rafael Rafailov
Stefano Ermon
Chelsea Finn
ALM
98
145
0
28 Mar 2024
FACTOID: FACtual enTailment fOr hallucInation Detection
Vipula Rawte
S. M. Towhidul
Krishnav Rajbangshi
Shravani Nag
Aman Chadha
Amit P. Sheth
Amitava Das
HILM
86
4
0
28 Mar 2024
CYCLE: Learning to Self-Refine the Code Generation
Yangruibo Ding
Marcus J. Min
Gail E. Kaiser
Baishakhi Ray
133
37
0
27 Mar 2024
Understanding the Learning Dynamics of Alignment with Human Feedback
Shawn Im
Yixuan Li
ALM
107
14
0
27 Mar 2024
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Elliot Bolton
Abhinav Venigalla
Michihiro Yasunaga
David Leo Wright Hall
Betty Xiong
...
R. Daneshjou
Jonathan Frankle
Percy Liang
Michael Carbin
Christopher D. Manning
LM&MA
MedIm
101
64
0
27 Mar 2024
Improving Attributed Text Generation of Large Language Models via Preference Learning
Dongfang Li
Zetian Sun
Baotian Hu
Zhenyu Liu
Xinshuo Hu
Xuebo Liu
Min Zhang
90
15
0
27 Mar 2024
MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models
Kailai Yang
Zhiwei Liu
Qianqian Xie
Jimin Huang
Tianlin Zhang
Sophia Ananiadou
86
18
0
25 Mar 2024
Learning To Guide Human Decision Makers With Vision-Language Models
Debodeep Banerjee
Stefano Teso
Burcu Sayin
Andrea Passerini
108
1
0
25 Mar 2024
Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling
Yida Mu
Chun Dong
Kalina Bontcheva
Xingyi Song
75
25
0
24 Mar 2024
Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
Minchan Kim
Minyeong Kim
Junik Bae
Suhwan Choi
Sungkyung Kim
Buru Chang
VLM
45
4
0
24 Mar 2024
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
Shengyi Huang
Michael Noukhovitch
Arian Hosseini
Kashif Rasul
Weixun Wang
Lewis Tunstall
VLM
112
38
0
24 Mar 2024
Risk and Response in Large Language Models: Evaluating Key Threat Categories
Bahareh Harandizadeh
A. Salinas
Fred Morstatter
98
4
0
22 Mar 2024
DreamReward: Text-to-3D Generation with Human Preference
Junliang Ye
Fangfu Liu
Qixiu Li
Zhengyi Wang
Yikai Wang
Xinzhou Wang
Yueqi Duan
Jun Zhu
107
29
0
21 Mar 2024
Locating and Mitigating Gender Bias in Large Language Models
Yuchen Cai
Ding Cao
Rongxi Guo
Yaqin Wen
Guiquan Liu
Enhong Chen
58
5
0
21 Mar 2024
Improving the Robustness of Large Language Models via Consistency Alignment
Zhao Yukun
Lingyong Yan
Weiwei Sun
Guoliang Xing
Shuaiqiang Wang
Meng Chong
Zhicong Cheng
Zhaochun Ren
Yin Dawei
88
22
0
21 Mar 2024
RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert
Valentina Pyatkin
Jacob Morrison
Lester James V. Miranda
Bill Yuchen Lin
...
Sachin Kumar
Tom Zick
Yejin Choi
Noah A. Smith
Hanna Hajishirzi
ALM
195
260
0
20 Mar 2024
Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices
Sara Abdali
Richard Anarfi
C. Barberan
Jia He
Erfan Shayegani
PILM
137
31
0
19 Mar 2024
Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback
Dong Won Lee
Hae Won Park
Yoon Kim
C. Breazeal
Louis-Philippe Morency
111
0
0
17 Mar 2024
Reward Guided Latent Consistency Distillation
Jiachen Li
Weixi Feng
Wenhu Chen
William Y. Wang
EGVM
82
15
0
16 Mar 2024
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
Zhiqing Sun
Longhui Yu
Yikang Shen
Weiyang Liu
Yiming Yang
Sean Welleck
Chuang Gan
93
69
0
14 Mar 2024
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
Renjie Pi
Tianyang Han
Wei Xiong
Jipeng Zhang
Runtao Liu
Boyao Wang
Tong Zhang
MLLM
137
48
0
13 Mar 2024
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents
Ruiyi Wang
Haofei Yu
W. Zhang
Zhengyang Qi
Maarten Sap
Graham Neubig
Yonatan Bisk
Hao Zhu
LLMAG
117
44
0
13 Mar 2024
Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
Wei Shen
Xiaoying Zhang
Yuanshun Yao
Rui Zheng
Hongyi Guo
Yang Liu
ALM
83
14
0
12 Mar 2024
ORPO: Monolithic Preference Optimization without Reference Model
Jiwoo Hong
Noah Lee
James Thorne
OSLM
113
268
0
12 Mar 2024
(N, K)-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model
Yufeng Zhang
Liyu Chen
Boyi Liu
Yingxiang Yang
Qiwen Cui
Yunzhe Tao
Hongxia Yang
227
0
0
11 Mar 2024