Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
ALM
18 September 2019
arXiv: 1909.08593

Papers citing "Fine-Tuning Language Models from Human Preferences"

Showing 50 of 1,265 citing papers.

Understanding the Logic of Direct Preference Alignment through Logic
Kyle Richardson, Vivek Srikumar, Ashish Sabharwal
23 Dec 2024

Lies, Damned Lies, and Distributional Language Statistics: Persuasion and Deception with Large Language Models
Cameron R. Jones, Benjamin Bergen
22 Dec 2024

Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
Alexander von Recum, Christoph Schnabl, Gabor Hollbeck, Silas Alberti, Philip Blinde, Marvin von Hagen
22 Dec 2024

LearnLM: Improving Gemini for Learning
LearnLM Team, Abhinit Modi, Aditya Srikanth Veerubhotla, Aliya Rysbek, Andrea Huber, ..., Shaojian Zhu, Stephanie Chan, Steve Yadlowsky, Viknesh Sounderajah, Yannis Assael
21 Dec 2024

FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF
Flint Xiaofeng Fan, Cheston Tan, Yew-Soon Ong, Roger Wattenhofer, Wei Tsang Ooi
20 Dec 2024

REFA: Reference Free Alignment for multi-preference optimization
Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan
20 Dec 2024

Learning to Generate Research Idea with Dynamic Control
Ruochen Li, Liqiang Jing, Chi Han, Jiawei Zhou, Xinya Du
LRM
19 Dec 2024

Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
Yuzhong Hong, Hanshan Zhang, Junwei Bao, Hongfei Jiang, Yang Song
OffRL
18 Dec 2024

Fool Me, Fool Me: User Attitudes Toward LLM Falsehoods
Diana Bar-Or Nirman, Ariel Weizman, Amos Azaria
HILM
16 Dec 2024

Why Does ChatGPT "Delve" So Much? Exploring the Sources of Lexical Overrepresentation in Large Language Models
Tom S. Juzek, Zina B. Ward
16 Dec 2024

UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models
Boyang Xue, Fei Mi, Qi Zhu, Hongru Wang, Rui Wang, Sheng Wang, Erxin Yu, Xuming Hu, Kam-Fai Wong
HILM
16 Dec 2024

PickLLM: Context-Aware RL-Assisted Large Language Model Routing
Dimitrios Sikeridis, Dennis Ramdass, Pranay Pareek
12 Dec 2024

Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu, Tim Z. Xiao, Weiyang Liu, Yoshua Bengio, Dinghuai Zhang
10 Dec 2024

ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
Yang Wu, Huayi Zhang, Yizheng Jiao, Lin Ma, Xiaozhong Liu, Jinhong Yu, Dongyu Zhang, Dezhi Yu, Wei Xu
01 Dec 2024

o1-Coder: an o1 Replication for Coding
Yuxiang Zhang, Shangxi Wu, Yuqi Yang, Jiangming Shu, Jinlin Xiao, Chao Kong, Jitao Sang
LRM
29 Nov 2024

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Zhen Huang, Haoyang Zou, Xuefeng Li, Yixiu Liu, Yuxiang Zheng, Ethan Chern, Shijie Xia, Yiwei Qin, Weizhe Yuan, Pengfei Liu
VLM
25 Nov 2024

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana, Noor Ahsan, Nevasini Sasikumar, ..., Monojit Choudhury, Ivan Laptev, Mubarak Shah, Salman Khan, Fahad A Khan
25 Nov 2024

Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
Shang Liu, Yu Pan, Guanting Chen, Xiaocheng Li
19 Nov 2024

Script-Strategy Aligned Generation: Aligning LLMs with Expert-Crafted Dialogue Scripts and Therapeutic Strategies for Psychotherapy
Xin Sun, Jan de Wit, Zhuying Li, Jiahuan Pei, Abdallah El Ali, Jos A. Bosch
11 Nov 2024

Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries
Chaitanya Malaviya, Joseph Chee Chang, Dan Roth, Mohit Iyyer, Mark Yatskar, Kyle Lo
ELM
11 Nov 2024

Towards Improved Preference Optimization Pipeline: from Data Generation to Budget-Controlled Regularization
Zhuotong Chen, Fang Liu, Jennifer Zhu, Wanyu Du, Yanjun Qi
07 Nov 2024

Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao, Chenlu Ye, Quanquan Gu, Tong Zhang
OffRL
07 Nov 2024

TODO: Enhancing LLM Alignment with Ternary Preferences
Yuxiang Guo, Lu Yin, Bo Jiang, Jiaqi Zhang
02 Nov 2024

Matryoshka: Learning to Drive Black-Box LLMs with LLMs
Changhao Li, Yuchen Zhuang, Rushi Qiang, Haotian Sun, H. Dai, Chao Zhang, Bo Dai
LRM
28 Oct 2024

2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Shilong Li, Yancheng He, Hui Huang, Xingyuan Bu, Qingbin Liu, Hangyu Guo, Weixun Wang, Jihao Gu, Wenbo Su, Bo Zheng
25 Oct 2024

Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng, Hengquan Guo, Jiawei Zhang, Dongqing Zou, Ziyu Shao, Honghao Wei, Xin Liu
25 Oct 2024

RSA-Control: A Pragmatics-Grounded Lightweight Controllable Text Generation Framework
Yifan Wang, Vera Demberg
24 Oct 2024

Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks
Graziano A. Manduzio, Federico A. Galatolo, M. G. Cimino, Enzo Pasquale Scilingo, Lorenzo Cominelli
LRM
24 Oct 2024

From Efficiency to Equity: Measuring Fairness in Preference Learning
Shreeyash Gowaikar, Hugo Berard, Rashid Mushkani, Shin Koseki
24 Oct 2024

From Imitation to Introspection: Probing Self-Consciousness in Language Models
Sirui Chen, Shu Yu, Shengjie Zhao, Chaochao Lu
MILM, LRM
24 Oct 2024

Improving Model Factuality with Fine-grained Critique-based Evaluator
Yiqing Xie, Wenxuan Zhou, Pradyot Prakash, Di Jin, Yuning Mao, ..., Sinong Wang, Han Fang, Carolyn Rose, Daniel Fried, Hejia Zhang
HILM
24 Oct 2024

End-to-end Training for Recommendation with Language-based User Profiles
Zhaolin Gao, Joyce Zhou, Yijia Dai, Thorsten Joachims
AI4Ed
24 Oct 2024

Cross-lingual Transfer of Reward Models in Multilingual Alignment
Jiwoo Hong, Noah Lee, Rodrigo Martínez-Castaño, César Rodríguez, James Thorne
23 Oct 2024

PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context
Maximilian Augustin, Syed Shakib Sarwar, Mostafa Elhoushi, Sai Qian Zhang, Yuecheng Li, B. D. Salvo
23 Oct 2024

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux, Arian Hosseini, Rishabh Agarwal, Rameswar Panda
OffRL
23 Oct 2024

Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models
Muhan Lin, Shuyang Shi, Yue (Sophie) Guo, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi-Pari, Simon Stepputtis, Joseph Campbell, Katia Sycara
22 Oct 2024

Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards
Alexander Padula, Dennis J. N. J. Soemers
OffRL
22 Oct 2024

Optimal Design for Reward Modeling in RLHF
Antoine Scheid, Etienne Boursier, Alain Durmus, Michael I. Jordan, Pierre Ménard, Eric Moulines, Michal Valko
OffRL
22 Oct 2024

Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning
Haining Wang, Jason Clark, Hannah McKelvey, Leila Sterman, Zheng Gao, Zuoyu Tian, Sandra Kübler, Xiaozhong Liu
22 Oct 2024

Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
Mingzhi Wang, Chengdong Ma, Qizhi Chen, Linjian Meng, Yang Han, Jiancong Xiao, Zhaowei Zhang, Jing Huo, Weijie Su, Yaodong Yang
22 Oct 2024

ComPO: Community Preferences for Language Model Personalization
Sachin Kumar, Chan Young Park, Yulia Tsvetkov, Noah A. Smith, Hannaneh Hajishirzi
21 Oct 2024

On The Global Convergence Of Online RLHF With Neural Parametrization
Mudit Gaur, Amrit Singh Bedi, Raghu Pasupathy, Vaneet Aggarwal
21 Oct 2024

GUIDE: Real-Time Human-Shaped Agents
Lingyu Zhang, Zhengran Ji, Nicholas R Waytowich, Boyuan Chen
19 Oct 2024

GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets
Oh Joon Kwon, Daiki E. Matsunaga, Kee-Eung Kim
AI4CE
19 Oct 2024

Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens
Zhepeng Cen, Yao Liu, Siliang Zeng, Pratik Chaudhar, Huzefa Rangwala, George Karypis, Rasool Fakoor
SyDa, AIFin
18 Oct 2024

Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment
Chenhang Cui, An Zhang, Yiyang Zhou, Zhaorun Chen, Gelei Deng, Huaxiu Yao, Tat-Seng Chua
18 Oct 2024

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Yaxin Luo, Gen Luo, Jiayi Ji, Yiyi Zhou, Xiaoshuai Sun, Zhiqiang Shen, Rongrong Ji
VLM, MoE
17 Oct 2024

SPIN: Self-Supervised Prompt INjection
Leon Zhou, Junfeng Yang, Chengzhi Mao
AAML, SILM
17 Oct 2024

Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design
Chenyu Wang, Masatoshi Uehara, Yichun He, Amy Wang, Tommaso Biancalani, Avantika Lal, Tommi Jaakkola, Sergey Levine, Hanchen Wang, Aviv Regev
17 Oct 2024

Reverse-Engineering the Reader
Samuel Kiegeland, Ethan Gotlieb Wilcox, Afra Amini, David Robert Reich, Ryan Cotterell
16 Oct 2024