Fine-Tuning Language Models from Human Preferences [ALM]
arXiv:1909.08593 · 18 September 2019
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving
Papers citing "Fine-Tuning Language Models from Human Preferences" (showing 50 of 1,265)
- Relic: Enhancing Reward Model Generalization for Low-Resource Indic Languages with Few-Shot Examples. Soumya Suvra Ghosal, Vaibhav Singh, Akash Ghosh, Soumyabrata Pal, Subhadip Baidya, Sriparna Saha, Dinesh Manocha. 19 Jun 2025.
- Intelligent Assistants for the Semiconductor Failure Analysis with LLM-Based Planning Agents [LLMAG]. Aline Dobrovsky, Konstantin Schekotihin, Christian Burmer. 18 Jun 2025.
- Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability [LRM]. Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe. 18 Jun 2025.
- HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges. Xianliang Yang, Ling Zhang, Haolong Qian, Lei Song, Jiang Bian. 18 Jun 2025.
- TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization. Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia. 17 Jun 2025.
- Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence. Yibo Yang, Sihao Liu, Chuan Rao, Bang An, Tiancheng Shen, Philip Torr, Ming-Hsuan Yang, Bernard Ghanem. 16 Jun 2025.
- From Outcomes to Processes: Guiding PRM Learning from ORM for Inference-Time Alignment. Bin Xie, Bingbing Xu, Yige Yuan, Shengmao Zhu, Huawei Shen. 14 Jun 2025.
- Similarity as Reward Alignment: Robust and Versatile Preference-based Reinforcement Learning. Sara Rajaram, R. J. Cotton, Fabian H. Sinz. 14 Jun 2025.
- Configurable Preference Tuning with Rubric-Guided Synthetic Data. Víctor Gallego. 13 Jun 2025.
- EQA-RM: A Generative Embodied Reward Model with Test-time Scaling. Yuhang Chen, Zhen Tan, Tianlong Chen. 12 Jun 2025.
- On a few pitfalls in KL divergence gradient estimation for RL. Yunhao Tang, Rémi Munos. 11 Jun 2025.
- Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design. Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour. 11 Jun 2025.
- EnerBridge-DPO: Energy-Guided Protein Inverse Folding with Markov Bridges and Direct Preference Optimization. Dingyi Rong, Haotian Lu, Wenzhuo Zheng, Fan Zhang, Shuangjia Zheng, Ning Liu. 11 Jun 2025.
- Application-Driven Value Alignment in Agentic AI Systems: Survey and Perspectives [AI4TS]. Wei Zeng, Hengshu Zhu, Chuan Qin, Han Wu, Yihang Cheng, ..., Xiaowei Jin, Yinuo Shen, Zhenxing Wang, Feimin Zhong, Hui Xiong. 11 Jun 2025.
- A Survey on Large Language Models for Mathematical Reasoning [LRM]. Peng-Yuan Wang, Tian-Shuo Liu, Chenyang Wang, Yi-Di Wang, Shu Yan, ..., Xu-Hui Liu, Xin-Wei Chen, Jia-Cheng Xu, Ziniu Li, Yang Yu. 10 Jun 2025.
- Explicit Preference Optimization: No Need for an Implicit Reward Model. Xiangkun Hu, Lemin Kong, Tong He, David Wipf. 09 Jun 2025.
- Beyond Jailbreaks: Revealing Stealthier and Broader LLM Security Risks Stemming from Alignment Failures [AAML]. Yukai Zhou, Sibei Yang, Wenjie Wang. 09 Jun 2025.
- History-Aware Cross-Attention Reinforcement: Self-Supervised Multi Turn and Chain-of-Thought Fine-Tuning with vLLM [LRM]. Andrew Kiruluta, Andreas Lemos, Priscilla Burity. 08 Jun 2025.
- Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification [OffRL, LRM]. Tianyi Bai, Zengjie Hu, Fupeng Sun, Jiantao Qiu, Yizhen Jiang, Guangxin He, Bohan Zeng, Conghui He, Binhang Yuan, Wentao Zhang. 08 Jun 2025.
- Dual-Priv Pruning: Efficient Differential Private Fine-Tuning in Multimodal Large Language Models [AAML, VLM]. Qianshan Wei, Jiaqi Li, Zihan You, Yi Zhan, Kecen Li, ..., Yi Yu, Bin Cao, Yiwen Xu, Yang Liu, Guilin Qi. 08 Jun 2025.
- Robotic Policy Learning via Human-assisted Action Preference Optimization. Wenke Xia, Yichu Yang, Hongtao Wu, Xiao Ma, Tao Kong, Di Hu. 08 Jun 2025.
- AnnoDPO: Protein Functional Annotation Learning with Direct Preference Optimization. Zixuan Jiang, Renjing Xu. 08 Jun 2025.
- Debiasing Online Preference Learning via Preference Feature Preservation. Dongyoung Kim, Jinsung Yoon, Jinwoo Shin, Jaehyung Kim. 06 Jun 2025.
- Population-Proportional Preference Learning from Human Feedback: An Axiomatic Approach. Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, P. Parrilo. 05 Jun 2025.
- SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat. Yuru Jiang, Wenxuan Ding, Shangbin Feng, Greg Durrett, Yulia Tsvetkov. 05 Jun 2025.
- MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Kurt Micallef, Claudia Borg. 04 Jun 2025.
- Misalignment or misuse? The AGI alignment tradeoff. Max Hellrigel-Holderbaum, Leonard Dung. 04 Jun 2025.
- RewardAnything: Generalizable Principle-Following Reward Models [LRM]. Zhuohao Yu, Jiali Zeng, Weizheng Gu, Yidong Wang, Jindong Wang, Fandong Meng, Jie Zhou, Yue Zhang, Shikun Zhang, Wei Ye. 04 Jun 2025.
- RedDebate: Safer Responses through Multi-Agent Red Teaming Debates [AAML, LLMAG]. Ali Asad, Stephen Obadinma, Radin Shayanfar, Xiaodan Zhu. 04 Jun 2025.
- Leveraging Reward Models for Guiding Code Review Comment Generation. Oussama Ben Sghaier, Rosalia Tufano, Gabriele Bavota, Houari Sahraoui. 04 Jun 2025.
- Selective Matching Losses -- Not All Scores Are Created Equal. Gil I. Shamir, Manfred K. Warmuth. 04 Jun 2025.
- Robust Preference Optimization via Dynamic Target Margins. Jie Sun, Junkang Wu, Jiancan Wu, Zhibo Zhu, Xingyu Lu, Jun Zhou, Lintao Ma, Xiang Wang. 04 Jun 2025.
- Aligning Large Language Models with Implicit Preferences from User-Generated Content. Zhaoxuan Tan, Zheng Li, Tianyi Liu, Haodong Wang, Hyokun Yun, ..., Yifan Gao, Ruijie Wang, Priyanka Nigam, Bing Yin, Meng Jiang. 04 Jun 2025.
- DPO Learning with LLMs-Judge Signal for Computer Use Agents. Man Luo, David Cobbley, Xin Su, Shachar Rosenman, Vasudev Lal, Shao-Yen Tseng, Phillip Howard. 03 Jun 2025.
- Comprehensive Vulnerability Analysis is Necessary for Trustworthy LLM-MAS [LLMAG]. Pengfei He, Yue Xing, Shen Dong, Juanhui Li, Zhenwei Dai, ..., Hui Liu, Han Xu, Zhen Xiang, Charu C. Aggarwal, Hui Liu. 02 Jun 2025.
- Towards Human-like Preference Profiling in Sequential Recommendation [HAI]. Z. Ouyang, Qianlong Wen, Chunhui Zhang, Yanfang Ye, Soroush Vosoughi. 02 Jun 2025.
- A Descriptive and Normative Theory of Human Beliefs in RLHF. Sylee Dandekar, Shripad Deshmukh, Frank Chiu, W. B. Knox, S. Niekum. 02 Jun 2025.
- Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification. Zehao Wu, Yanjie Zhao, Haoyu Wang. 02 Jun 2025.
- SuperRL: Reinforcement Learning with Supervision to Boost Language Model Reasoning [OffRL, ReLM, LRM]. Yihao Liu, Shuocheng Li, Lang Cao, Yuhang Xie, Mengyu Zhou, Haoyu Dong, Xiaojun Ma, Shi Han, Dongmei Zhang. 01 Jun 2025.
- Doubly Robust Alignment for Large Language Models. Erhan Xu, Kai Ye, Hongyi Zhou, Luhan Zhu, Francesco Quinzan, Chengchun Shi. 01 Jun 2025.
- Adversarial Preference Learning for Robust LLM Alignment [AAML]. Yuanfu Wang, Pengyu Wang, Chenyang Xi, Bo Tang, Junyi Zhu, ..., Keming Mao, Zhiyu Li, Feiyu Xiong, Jie Hu, Mingchuan Yang. 30 May 2025.
- A Reward-driven Automated Webshell Malicious-code Generator for Red-teaming [AAML]. Yizhong Ding. 30 May 2025.
- Aligning Language Models with Observational Data: Opportunities and Risks from a Causal Perspective. Erfan Loghmani. 30 May 2025.
- MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning [OffRL]. Yiqing Liang, Jielin Qiu, Wenhao Ding, Zuxin Liu, James Tompkin, Mengdi Xu, Mengzhou Xia, Zhengzhong Tu, Laixi Shi, Jiacheng Zhu. 30 May 2025.
- MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning. Jingyan Shen, Jiarui Yao, Rui Yang, Yifan Sun, Feng Luo, Boyao Wang, Tong Zhang, Han Zhao. 30 May 2025.
- Accelerating RLHF Training with Reward Variance Increase. Zonglin Yang, Zhexuan Gu, Houduo Qi, Yancheng Yuan. 29 May 2025.
- Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences? Paul Gölz, Nika Haghtalab, Kunhe Yang. 29 May 2025.
- Learning Parametric Distributions from Samples and Preferences. Marc Jourdan, Gizem Yüce, Nicolas Flammarion. 29 May 2025.
- Beyond path selection: Better LLMs for Scientific Information Extraction with MimicSFT and Relevance and Rule-induced (R²)GRPO [LRM]. Ran Li, Shimin Di, Yuchen Liu, Chen Jing, Yu Qiu, Lei Chen. 28 May 2025.
- Modeling and Optimizing User Preferences in AI Copilots: A Comprehensive Survey and Taxonomy. Saleh Afzoon, Zahra Jahanandish, Phuong Thao Huynh, Amin Beheshti, Usman Naseem. 28 May 2025.