arXiv:1909.08593
Fine-Tuning Language Models from Human Preferences
18 September 2019
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
Geoffrey Irving
ALM
Papers citing "Fine-Tuning Language Models from Human Preferences" (50 of 1,265 shown)
Understanding the Logic of Direct Preference Alignment through Logic
Kyle Richardson
Vivek Srikumar
Ashish Sabharwal
224
2
0
23 Dec 2024
Lies, Damned Lies, and Distributional Language Statistics: Persuasion and Deception with Large Language Models
Cameron R. Jones
Benjamin Bergen
152
7
0
22 Dec 2024
Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
Alexander von Recum
Christoph Schnabl
Gabor Hollbeck
Silas Alberti
Philip Blinde
Marvin von Hagen
145
2
0
22 Dec 2024
LearnLM: Improving Gemini for Learning
LearnLM Team
Abhinit Modi
Aditya Srikanth Veerubhotla
Aliya Rysbek
Andrea Huber
...
Shaojian Zhu
Stephanie Chan
Steve Yadlowsky
Viknesh Sounderajah
Yannis Assael
145
8
0
21 Dec 2024
FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF
Flint Xiaofeng Fan
Cheston Tan
Yew-Soon Ong
Roger Wattenhofer
Wei Tsang Ooi
172
1
0
20 Dec 2024
REFA: Reference Free Alignment for multi-preference optimization
Taneesh Gupta
Rahul Madhavan
Xuchao Zhang
Chetan Bansal
Saravan Rajmohan
182
1
0
20 Dec 2024
Learning to Generate Research Idea with Dynamic Control
Ruochen Li
Liqiang Jing
Chi Han
Jiawei Zhou
Xinya Du
LRM
117
6
0
19 Dec 2024
Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
Yuzhong Hong
Hanshan Zhang
Junwei Bao
Hongfei Jiang
Yang Song
OffRL
117
4
0
18 Dec 2024
Fool Me, Fool Me: User Attitudes Toward LLM Falsehoods
Diana Bar-Or Nirman
Ariel Weizman
Amos Azaria
HILM
113
1
0
16 Dec 2024
Why Does ChatGPT "Delve" So Much? Exploring the Sources of Lexical Overrepresentation in Large Language Models
Tom S. Juzek
Zina B. Ward
124
2
0
16 Dec 2024
UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models
Boyang Xue
Fei Mi
Qi Zhu
Hongru Wang
Rui Wang
Sheng Wang
Erxin Yu
Xuming Hu
Kam-Fai Wong
HILM
218
2
0
16 Dec 2024
PickLLM: Context-Aware RL-Assisted Large Language Model Routing
Dimitrios Sikeridis
Dennis Ramdass
Pranay Pareek
152
3
0
12 Dec 2024
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu
Tim Z. Xiao
Weiyang Liu
Yoshua Bengio
Dinghuai Zhang
252
6
0
10 Dec 2024
ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
Yang Wu
Huayi Zhang
Yizheng Jiao
Lin Ma
Xiaozhong Liu
Jinhong Yu
Dongyu Zhang
Dezhi Yu
Wei Xu
144
2
0
01 Dec 2024
o1-Coder: an o1 Replication for Coding
Yuxiang Zhang
Shangxi Wu
Yuqi Yang
Jiangming Shu
Jinlin Xiao
Chao Kong
Jitao Sang
LRM
169
51
0
29 Nov 2024
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Zhen Huang
Haoyang Zou
Xuefeng Li
Yixiu Liu
Yuxiang Zheng
Ethan Chern
Shijie Xia
Yiwei Qin
Weizhe Yuan
Pengfei Liu
VLM
128
52
0
25 Nov 2024
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Ashmal Vayani
Dinura Dissanayake
Hasindri Watawana
Noor Ahsan
Nevasini Sasikumar
...
Monojit Choudhury
Ivan Laptev
Mubarak Shah
Salman Khan
Fahad Shahbaz Khan
256
16
0
25 Nov 2024
Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
Shang Liu
Yu Pan
Guanting Chen
Xiaocheng Li
122
3
0
19 Nov 2024
Script-Strategy Aligned Generation: Aligning LLMs with Expert-Crafted Dialogue Scripts and Therapeutic Strategies for Psychotherapy
Xin Sun
Jan de Wit
Zhuying Li
Jiahuan Pei
Abdallah El Ali
Jos A. Bosch
110
2
0
11 Nov 2024
Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries
Chaitanya Malaviya
Joseph Chee Chang
Dan Roth
Mohit Iyyer
Mark Yatskar
Kyle Lo
ELM
99
6
0
11 Nov 2024
Towards Improved Preference Optimization Pipeline: from Data Generation to Budget-Controlled Regularization
Zhuotong Chen
Fang Liu
Jennifer Zhu
Wanyu Du
Yanjun Qi
87
1
0
07 Nov 2024
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
234
6
0
07 Nov 2024
TODO: Enhancing LLM Alignment with Ternary Preferences
Yuxiang Guo
Lu Yin
Bo Jiang
Jiaqi Zhang
125
3
0
02 Nov 2024
Matryoshka: Learning to Drive Black-Box LLMs with LLMs
Changhao Li
Yuchen Zhuang
Rushi Qiang
Haotian Sun
H. Dai
Chao Zhang
Bo Dai
LRM
48
6
0
28 Oct 2024
2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Shilong Li
Yancheng He
Hui Huang
Xingyuan Bu
Qingbin Liu
Hangyu Guo
Weixun Wang
Jihao Gu
Wenbo Su
Bo Zheng
98
7
0
25 Oct 2024
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng
Hengquan Guo
Jiawei Zhang
Dongqing Zou
Ziyu Shao
Honghao Wei
Xin Liu
132
3
0
25 Oct 2024
RSA-Control: A Pragmatics-Grounded Lightweight Controllable Text Generation Framework
Yifan Wang
Vera Demberg
72
1
0
24 Oct 2024
Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks
Graziano A. Manduzio
Federico A. Galatolo
M. G. Cimino
Enzo Pasquale Scilingo
Lorenzo Cominelli
LRM
38
1
0
24 Oct 2024
From Efficiency to Equity: Measuring Fairness in Preference Learning
Shreeyash Gowaikar
Hugo Berard
Rashid Mushkani
Shin Koseki
65
0
0
24 Oct 2024
From Imitation to Introspection: Probing Self-Consciousness in Language Models
Sirui Chen
Shu Yu
Shengjie Zhao
Chaochao Lu
MILM
LRM
154
4
0
24 Oct 2024
Improving Model Factuality with Fine-grained Critique-based Evaluator
Yiqing Xie
Wenxuan Zhou
Pradyot Prakash
Di Jin
Yuning Mao
...
Sinong Wang
Han Fang
Carolyn Rose
Daniel Fried
Hejia Zhang
HILM
167
8
0
24 Oct 2024
End-to-end Training for Recommendation with Language-based User Profiles
Zhaolin Gao
Joyce Zhou
Yijia Dai
Thorsten Joachims
AI4Ed
153
4
0
24 Oct 2024
Cross-lingual Transfer of Reward Models in Multilingual Alignment
Jiwoo Hong
Noah Lee
Rodrigo Martínez-Castaño
César Rodríguez
James Thorne
137
6
0
23 Oct 2024
PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context
Maximilian Augustin
Syed Shakib Sarwar
Mostafa Elhoushi
Sai Qian Zhang
Yuecheng Li
B. De Salvo
66
1
0
23 Oct 2024
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Rameswar Panda
OffRL
183
11
0
23 Oct 2024
Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models
Muhan Lin
Shuyang Shi
Yue (Sophie) Guo
Behdad Chalaki
Vaishnav Tadiparthi
Ehsan Moradi-Pari
Simon Stepputtis
Joseph Campbell
Katia Sycara
66
2
0
22 Oct 2024
Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards
Alexander Padula
Dennis J. N. J. Soemers
OffRL
94
0
0
22 Oct 2024
Optimal Design for Reward Modeling in RLHF
Antoine Scheid
Etienne Boursier
Alain Durmus
Michael I. Jordan
Pierre Ménard
Eric Moulines
Michal Valko
OffRL
148
9
0
22 Oct 2024
Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning
Haining Wang
Jason Clark
Hannah McKelvey
Leila Sterman
Zheng Gao
Zuoyu Tian
Sandra Kübler
Xiaozhong Liu
110
1
0
22 Oct 2024
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
Mingzhi Wang
Chengdong Ma
Qizhi Chen
Linjian Meng
Yang Han
Jiancong Xiao
Zhaowei Zhang
Jing Huo
Weijie Su
Yaodong Yang
135
9
0
22 Oct 2024
ComPO: Community Preferences for Language Model Personalization
Sachin Kumar
Chan Young Park
Yulia Tsvetkov
Noah A. Smith
Hannaneh Hajishirzi
88
8
0
21 Oct 2024
On The Global Convergence Of Online RLHF With Neural Parametrization
Mudit Gaur
Amrit Singh Bedi
Raghu Pasupathy
Vaneet Aggarwal
77
1
0
21 Oct 2024
GUIDE: Real-Time Human-Shaped Agents
Lingyu Zhang
Zhengran Ji
Nicholas R. Waytowich
Boyuan Chen
70
2
0
19 Oct 2024
GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets
Oh Joon Kwon
Daiki E. Matsunaga
Kee-Eung Kim
AI4CE
55
1
0
19 Oct 2024
Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens
Zhepeng Cen
Yao Liu
Siliang Zeng
Pratik Chaudhari
Huzefa Rangwala
George Karypis
Rasool Fakoor
SyDa
AIFin
133
3
0
18 Oct 2024
Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
Chenhang Cui
An Zhang
Yiyang Zhou
Zhaorun Chen
Gelei Deng
Huaxiu Yao
Tat-Seng Chua
206
8
0
18 Oct 2024
γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Yaxin Luo
Gen Luo
Jiayi Ji
Yiyi Zhou
Xiaoshuai Sun
Zhiqiang Shen
Rongrong Ji
VLM
MoE
95
1
0
17 Oct 2024
SPIN: Self-Supervised Prompt INjection
Leon Zhou
Junfeng Yang
Chengzhi Mao
AAML
SILM
82
1
0
17 Oct 2024
Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design
Chenyu Wang
Masatoshi Uehara
Yichun He
Amy Wang
Tommaso Biancalani
Avantika Lal
Tommi Jaakkola
Sergey Levine
Hanchen Wang
Aviv Regev
124
17
0
17 Oct 2024
Reverse-Engineering the Reader
Samuel Kiegeland
Ethan Gotlieb Wilcox
Afra Amini
David Robert Reich
Ryan Cotterell
64
0
0
16 Oct 2024