Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving
arXiv:1909.08593 · 18 September 2019 · [ALM]
Papers citing "Fine-Tuning Language Models from Human Preferences" (50 of 1,265 shown):
- Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models · M. Wong, C. Tan · [ALM] · 19 Mar 2025
- SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks · Yifei Zhou, Song Jiang, Yuandong Tian, Jason Weston, Sergey Levine, Sainbayar Sukhbaatar, Xian Li · [LLMAG, LRM] · 19 Mar 2025
- Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs · Nicolas Le Roux, Marc G. Bellemare, Jonathan Lebensold, Arnaud Bergeron, Joshua Greaves, Alex Fréchette, Carolyne Pelletier, Eric Thibodeau-Laufer, Sándor Toth, Sam Work · [OffRL] · 18 Mar 2025
- MagicID: Hybrid Preference Optimization for ID-Consistent and Dynamic-Preserved Video Customization · Hengjia Li, Lifan Jiang, Xi Xiao, Tianyang Wang, Hongwei Yi, Boxi Wu, D. Cai · [VGen] · 16 Mar 2025
- Augmented Adversarial Trigger Learning · Zhe Wang, Yanjun Qi · 16 Mar 2025
- From Demonstrations to Rewards: Alignment Without Explicit Human Preferences · Siliang Zeng, Yao Liu, Huzefa Rangwala, George Karypis, Mingyi Hong, Rasool Fakoor · 15 Mar 2025
- Implicit Bias-Like Patterns in Reasoning Models · Messi H.J. Lee, Calvin K. Lai · [LRM] · 14 Mar 2025
- Residual Policy Gradient: A Reward View of KL-regularized Objective · Pengcheng Wang, Xinghao Zhu, Yuxin Chen, Chenfeng Xu, Masayoshi Tomizuka, Chenran Li · 14 Mar 2025
- RankPO: Preference Optimization for Job-Talent Matching · Yize Zhang, Ming Wang, Yu Wang, Xiaohui Wang · 13 Mar 2025
- Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving · Sara Rajaee, Kumar Pratik, Gabriele Cesa, Arash Behboodi · [OffRL, LRM] · 12 Mar 2025
- Aligning to What? Limits to RLHF Based Alignment · Logan Barnhart, Reza Akbarian Bafghi, Stephen Becker, M. Raissi · 12 Mar 2025
- Preference-Based Alignment of Discrete Diffusion Models · Umberto Borso, Davide Paglieri, Jude Wells, Tim Rocktäschel · 11 Mar 2025
- GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training · Tong Wei, Yijun Yang, Junliang Xing, Yuanchun Shi, Zongqing Lu, Deheng Ye · [OffRL, LRM] · 11 Mar 2025
- Large Language Model as Meta-Surrogate for Data-Driven Many-Task Optimization: A Proof-of-Principle Study · Wei Wei, Yue-Jiao Gong, Jun Zhang · 11 Mar 2025
- The Impact of VR and 2D Interfaces on Human Feedback in Preference-Based Robot Learning · Jorge de Heuvel, Daniel Marta, Simon Holk, Iolanda Leite, Maren Bennewitz · 11 Mar 2025
- Beyond One-Size-Fits-All Summarization: Customizing Summaries for Diverse Users · Mehmet Samet Duran, Tevfik Aytekin · 10 Mar 2025
- Detection Avoidance Techniques for Large Language Models · Sinclair Schneider, Florian Steuber, João A. G. Schneider, Gabi Dreo Rodosek · [DeLMO] · 10 Mar 2025
- RePO: ReLU-based Preference Optimization · Junkang Wu, Kexin Huang, Xue Wang, Jinyang Gao, Bolin Ding, Jiancan Wu, Xiangnan He, Xiang Wang · 10 Mar 2025
- Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning · Huilin Deng, Ding Zou, Rui Ma, Hongchen Luo, Yang Cao, Yu Kang · [LRM, VLM] · 10 Mar 2025
- ROCM: RLHF on consistency models · Shivanshu Shekhar, Tong Zhang · 08 Mar 2025
- Language Model Personalization via Reward Factorization · Idan Shenfeld, Felix Faltings, Pulkit Agrawal, Aldo Pacchiano · 08 Mar 2025
- Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning · Hyungkyu Kang, Min-hwan Oh · [OffRL] · 07 Mar 2025
- Shifting Perspectives: Steering Vector Ensembles for Robust Bias Mitigation in LLMs · Zara Siddique, Irtaza Khalid, Liam D. Turner, Luis Espinosa-Anke · [LLMSV] · 07 Mar 2025
- Soft Policy Optimization: Online Off-Policy RL for Sequence Models · Taco Cohen, David W. Zhang, Kunhao Zheng, Yunhao Tang, Rémi Munos, Gabriel Synnaeve · [OffRL] · 07 Mar 2025
- Adding Alignment Control to Language Models · Wenhong Zhu, Weinan Zhang, Rui Wang · 06 Mar 2025
- Towards Autonomous Reinforcement Learning for Real-World Robotic Manipulation with Large Language Models · Niccolò Turcato, Matteo Iovino, Aris Synodinos, Alberto Dalla Libera, R. Carli, Pietro Falco · [LM&Ro] · 06 Mar 2025
- DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models · Ruizhe Chen, Wenhao Chai, Zhifei Yang, Xiaotian Zhang, Qiufeng Wang, Tony Q.S. Quek, Soujanya Poria, Zuozhu Liu · 06 Mar 2025
- Visualising Policy-Reward Interplay to Inform Zeroth-Order Preference Optimisation of Large Language Models · Alessio Galatolo, Zhenbang Dai, Katie Winkle, Meriem Beloucif · 05 Mar 2025
- Effectively Steer LLM To Follow Preference via Building Confident Directions · Bingqing Song, Boran Han, Shuai Zhang, Hao Wang, Haoyang Fang, Bonan Min, Yuyang Wang, Mingyi Hong · [LLMSV] · 04 Mar 2025
- LoRA-Null: Low-Rank Adaptation via Null Space for Large Language Models · Pengwei Tang, Yebin Liu, Dongjie Zhang, Xing Wu, Debing Zhang · 04 Mar 2025
- What do Large Language Models Say About Animals? Investigating Risks of Animal Harm in Generated Text · Arturs Kanepajs, Aditi Basu, Sankalpa Ghose, Constance Li, Akshat Mehta, Ronak Mehta, Samuel David Tucker-Davis, Eric Zhou, Bob Fischer, Jacy Reese Anthis · [ELM, ALM] · 03 Mar 2025
- Alchemist: Towards the Design of Efficient Online Continual Learning System · Yuyang Huang, Yuhan Liu, Haryadi S. Gunawi, Beibin Li, Changho Hwang · [CLL, OnRL] · 03 Mar 2025
- Visual-RFT: Visual Reinforcement Fine-Tuning · Ziyu Liu, Zeyi Sun, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Haodong Duan, Dahua Lin, Jiaqi Wang · [ObjD, VLM, LRM] · 03 Mar 2025
- Robust Multi-Objective Preference Alignment with Online DPO · Raghav Gupta, Ryan Sullivan, Yunxuan Li, Samrat Phatale, Abhinav Rastogi · 01 Mar 2025
- Reducing Large Language Model Safety Risks in Women's Health using Semantic Entropy · Jahan C. Penny-Dimri, Magdalena Bachmann, William Cooke, Sam Mathewlynn, Samuel Dockree, John Tolladay, Jannik Kossen, Lin Li, Y. Gal, Gabriel Davis Jones · 01 Mar 2025
- Plan2Align: Predictive Planning Based Test-Time Preference Alignment for Large Language Models · Kuang-Da Wang, Teng-Ruei Chen, Yu-Heng Hung, Shuoyang Ding, Yueh-Hua Wu, Yu-Chun Wang, Chao-Han Huck Yang, Wen-Chih Peng, Ping-Chun Hsieh · 28 Feb 2025
- Conformal Tail Risk Control for Large Language Model Alignment · Catherine Yu-Chi Chen, Jingyan Shen, Zhun Deng, Lihua Lei · 27 Feb 2025
- Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning · Shangding Gu, Laixi Shi, Muning Wen, Ming Jin, Eric Mazumdar, Yuejie Chi, Adam Wierman, C. Spanos · [OOD, OffRL] · 27 Feb 2025
- Societal Alignment Frameworks Can Improve LLM Alignment · Karolina Stańczak, Nicholas Meade, Mehar Bhatia, Hattie Zhou, Konstantin Böttinger, ..., Timothy P. Lillicrap, Ana Marasović, Sylvie Delacroix, Gillian K. Hadfield, Siva Reddy · 27 Feb 2025
- General Intelligence Requires Reward-based Pretraining · Seungwook Han, Jyothish Pari, Samuel J. Gershman, Pulkit Agrawal · [LRM] · 26 Feb 2025
- Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? · Yudi Zhang, Lu Wang, Meng Fang, Yali Du, Chenghua Huang, ..., Qingwei Lin, Mykola Pechenizkiy, Dongmei Zhang, Saravan Rajmohan, Qi Zhang · [ALM] · 26 Feb 2025
- Controlled Diversity: Length-optimized Natural Language Generation · Diana Marie Schenke, Timo Baumann · 26 Feb 2025
- MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning · Jiazhen Pan, Che Liu, Junde Wu, Fenglin Liu, Jiayuan Zhu, Hongwei Bran Li, Chen Chen, Cheng Ouyang, Daniel Rueckert · [LRM, LM&MA, VLM] · 26 Feb 2025
- MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment · Tianze Wang, Dongnan Gui, Yifan Hu, Shuhang Lin, Linjun Zhang · 25 Feb 2025
- Stackelberg Game Preference Optimization for Data-Efficient Alignment of Language Models · Xu Chu, Zhixin Zhang, Tianyu Jia, Yujie Jin · 25 Feb 2025
- Larger or Smaller Reward Margins to Select Preferences for Alignment? · Kexin Huang, Junkang Wu, Ziqian Chen, Xue Wang, Jinyang Gao, Bolin Ding, Jiancan Wu, Xiangnan He, Xiang Wang · 25 Feb 2025
- Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data · Siqi Guo, Ilgee Hong, Vicente Balmaseda, Changlong Yu, Liang Qiu, Xin Liu, Haoming Jiang, Tuo Zhao, Tianbao Yang · 25 Feb 2025
- AMPO: Active Multi-Preference Optimization for Self-play Preference Selection · Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan · 25 Feb 2025
- CuDIP: Enhancing Theorem Proving in LLMs via Curriculum Learning-based Direct Preference Optimization · Shuming Shi, Ruobing Zuo, Gaolei He, Jianlin Wang, Chenyang Xu, Zhengfeng Yang · 25 Feb 2025
- DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents · Taiyi Wang, Zhihao Wu, Jianheng Liu, Jianye Hao, Jun Wang, Kun Shao · [OffRL] · 24 Feb 2025