Fine-Tuning Language Models from Human Preferences

18 September 2019
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
Tags: ALM
Links: arXiv (abs), PDF, HTML

Papers citing "Fine-Tuning Language Models from Human Preferences"

50 / 1,265 papers shown
Multi-objective Large Language Model Alignment with Hierarchical Experts
Zhuo Li, Guodong DU, Weiyang Guo, Yigeng Zhou, Xiucheng Li, ..., Fangming Liu, Yequan Wang, Deheng Ye, Min Zhang, Jing Li
Tags: ALM, MoE
27 May 2025

SquareχPO: Differentially Private and Robust χ²-Preference Optimization in Offline Direct Alignment
Xingyu Zhou, Yulian Wu, Wenqian Weng, Francesco Orabona
27 May 2025

Reinforcing General Reasoning without Verifiers
Xiangxin Zhou, Zichen Liu, Anya Sims, Haonan Wang, Tianyu Pang, Chongxuan Li, Liang Wang, Min Lin, C. Du
Tags: OffRL, LRM
27 May 2025

Learning to Select In-Context Demonstration Preferred by Large Language Model
Zheng Zhang, Shaocheng Lan, Lei Song, Jiang Bian, Yexin Li, Kan Ren
26 May 2025

SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
Geon-hyeong Kim, Youngsoo Jang, Yu Jin Kim, Byoungjip Kim, Honglak Lee, Kyunghoon Bae, Moontae Lee
26 May 2025

Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models
Y. Zhang, Yu Yu, Bo Tang, Yu Zhu, Chuxiong Sun, ..., Jie Hu, Zipeng Xie, Zhiyu Li, Feiyu Xiong, Edward Chung
26 May 2025

Proxy-Free GFlowNet
Ruishuo Chen, Xun Wang, Rui Hu, Zhuoran Li, Longbo Huang
26 May 2025

Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models
Yi Liu, Dianqing Liu, Mingye Zhu, Junbo Guo, Yongdong Zhang, Zhendong Mao
26 May 2025

Accelerating Nash Learning from Human Feedback via Mirror Prox
D. Tiapkin, Daniele Calandriello, Denis Belomestny, Eric Moulines, Alexey Naumov, Kashif Rasul, Michal Valko, Pierre Ménard
26 May 2025

Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
Ruizhe Shi, Minhak Song, Runlong Zhou, Zihan Zhang, Maryam Fazel, S. S. Du
26 May 2025

Deep Actor-Critics with Tight Risk Certificates
Bahareh Tasdighi, Manuel Haussmann, Yi-Shan Wu, A. Masegosa, M. Kandemir
Tags: UQCV
26 May 2025

Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers
Rihui Xin, Han Liu, Zecheng Wang, Yupeng Zhang, Dianbo Sui, Xiaolin Hu, Bingning Wang
Tags: SyDa
26 May 2025

Learning to Reason without External Rewards
Xuandong Zhao, Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song
Tags: OffRL, ReLM, LRM
26 May 2025

Incentivizing High-Quality Human Annotations with Golden Questions
Shang Liu, Zhongze Cai, Hanzhao Wang, Zhongyao Ma, Xiaocheng Li
25 May 2025

LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
Fengqi Zhu, Rongzhen Wang, Shen Nie, Xiaolu Zhang, Chunwei Wu, ..., Jun Zhou, Jianfei Chen, Yankai Lin, Ji-Rong Wen, Chongxuan Li
25 May 2025

Flex-Judge: Think Once, Judge Anywhere
Jongwoo Ko, S. Kim, Sungwoo Cho, Se-Young Yun
Tags: ELM, LRM
24 May 2025

GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains
C. Wang, Xiaoran Pan, Zihao Pan, Haofan Wang, Yiren Song
Tags: LRM
24 May 2025

KL-regularization Itself is Differentially Private in Bandits and RLHF
Yizhou Zhang, Kishan Panaganti, Laixi Shi, Juba Ziani, Adam Wierman
23 May 2025

Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator
Beier Luo, Shuoyuan Wang, Yixuan Li, Hongxin Wei
22 May 2025

MPO: Multilingual Safety Alignment via Reward Gap Optimization
Weixiang Zhao, Yulin Hu, Yang Deng, Tongtong Wu, Wenxuan Zhang, ..., An Zhang, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu
22 May 2025

Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision
Eric Hanchen Jiang, Haozheng Luo, Shengyuan Pang, Xiaomin Li, Zhenting Qi, ..., Zongyu Lin, Xinfeng Li, Hao Xu, Kai-Wei Chang, Ying Nian Wu
Tags: LRM
21 May 2025

A Unified Theoretical Analysis of Private and Robust Offline Alignment: from RLHF to DPO
Xingyu Zhou, Yulian Wu, Francesco Orabona
Tags: OffRL
21 May 2025

AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum
Jian Xiong, Jingbo Zhou, Jingyong Ye, Dejing Dou
Tags: LRM
20 May 2025

Self-Evolving Curriculum for LLM Reasoning
Xiaoyin Chen, Jiarui Lu, Minsu Kim, Dinghuai Zhang, Jian Tang, Alexandre Piché, Nicolas Angelard-Gontier, Yoshua Bengio, Ehsan Kamalloo
Tags: ReLM, LRM
20 May 2025

DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li, Ming Lin, Tomer Galanti, Zhengzhong Tu, Tianbao Yang
18 May 2025

UFO-RL: Uncertainty-Focused Optimization for Efficient Reinforcement Learning Data Selection
Yang Zhao, Kai Xiong, Xiao Ding, Li Du, Yangou Ouyang, ..., Wentao Zhang, Bin Liu, Dong Hu, Bing Qin, Ting Liu
Tags: OffRL
18 May 2025

ExpertSteer: Intervening in LLMs through Expert Knowledge
Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch
Tags: LLMSV
18 May 2025

OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning
Fanqi Lin, Ruiqian Nai, Yingdong Hu, Jiacheng You, Junming Zhao, Yang Gao
Tags: LRM
17 May 2025

Fair-PP: A Synthetic Dataset for Aligning LLM with Personalized Preferences of Social Equity
Qi Zhou, Jie Zhang, Dongxia Wang, Qiang Liu, Tianlin Li, Jin Song Dong, Wenhai Wang, Qing Guo
Tags: SyDa
17 May 2025

SafeVid: Toward Safety Aligned Video Large Multimodal Models
Yixu Wang, Jiaxin Song, Yifeng Gao, Xin Wang, Yang Yao, Yan Teng, Xingjun Ma, Yingchun Wang, Yu-Gang Jiang
17 May 2025

Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment
Siliang Zeng, Quan Wei, William Brown, Oana Frunza, Yuriy Nevmyvaka, Mingyi Hong
Tags: LRM
17 May 2025

Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen, Xiaopeng Li, Zhiyu Li, Xi Chen, Tianyi Lin
16 May 2025

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tur, Hao Peng
16 May 2025

Can Global XAI Methods Reveal Injected Bias in LLMs? SHAP vs Rule Extraction vs RuleSHAP
Francesco Sovrano
16 May 2025

A Systematic Analysis of Base Model Choice for Reward Modeling
Kian Ahrabian, Pegah Jandaghi, Negar Mokhberian, Sai Praneeth Karimireddy, Jay Pujara
16 May 2025

ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization
Wenhao Shen, Wanqi Yin, Xiaofeng Yang, Cheng Chen, Chaoyue Song, Zhongang Cai, Lei Yang, Hao Wang, Guosheng Lin
15 May 2025

Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach
Shannon Lodoen, Alexi Orchard
14 May 2025

RT-cache: Efficient Robot Trajectory Retrieval System
Owen Kwon, Abraham George, Alison Bartsch, A. Farimani
14 May 2025

Improved Algorithms for Differentially Private Language Model Alignment
Keyu Chen, Hao Tang, Qinglin Liu, Yizhao Xu
13 May 2025

Fast Text-to-Audio Generation with Adversarial Post-Training
Cheng-i Wang, Zach Evans, Zack Zukowski, Josiah Taylor, CJ Carr, ..., Adnan Al-Sinan, Gian Marco Iodice, Julian McAuley, Taylor Berg-Kirkpatrick, Jordi Pons
13 May 2025

You Only Look One Step: Accelerating Backpropagation in Diffusion Sampling with Gradient Shortcuts
Hongkun Dou, Zeyu Li, Xingyu Jiang, Haoyang Li, Lijun Yang, Wen Yao, Yue Deng
Tags: DiffM
12 May 2025

On the Robustness of Reward Models for Language Model Alignment
Jiwoo Hong, Noah Lee, Eunki Kim, Guijin Son, Woojin Chung, Aman Gupta, Shao Tang, James Thorne
12 May 2025

Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
Rei Higuchi, Taiji Suzuki
12 May 2025

Evolutionary thoughts: integration of large language models and evolutionary algorithms
Antonio Jimeno Yepes, Pieter Barnard
09 May 2025

Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
Chetan Pathade
Tags: AAML, SILM
07 May 2025

Policy-labeled Preference Learning: Is Preference Enough for RLHF?
Taehyun Cho, Seokhun Ju, Seungyub Han, Dohyeong Kim, Kyungjae Lee, Jungwoo Lee
Tags: OffRL
06 May 2025

Geospatial Mechanistic Interpretability of Large Language Models
Stef De Sabbata, Stefano Mizzaro, Kevin Roitero
Tags: AI4CE
06 May 2025

FairPO: Robust Preference Optimization for Fair Multi-Label Learning
Soumen Kumar Mondal, Akshit Varmora, Prateek Chanda, Ganesh Ramakrishnan
05 May 2025

Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
Xiaobao Wu
Tags: LRM
05 May 2025

RM-R1: Reward Modeling as Reasoning
Xiusi Chen, Gaotang Li, Zehua Wang, Bowen Jin, Cheng Qian, ..., Yu Zhang, D. Zhang, Tong Zhang, Hanghang Tong, Heng Ji
Tags: ReLM, OffRL, LRM
05 May 2025