Papers › 2305.18290 › Cited By
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
29 May 2023 · Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn (ALM)
Papers citing "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (50 of 2,637 shown)
- Sell It Before You Make It: Revolutionizing E-Commerce with Personalized AI-Generated Items — Jianghao Lin, Peng Du, Jiaqi Liu, Wuyang Li, Yong Yu, Weinan Zhang, Yang Cao (DiffM) · 28 Mar 2025
- Preference-based Learning with Retrieval Augmented Generation for Conversational Question Answering — Magdalena Kaiser, Gerhard Weikum · 28 Mar 2025
- Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF — Syrine Belakaria, Joshua Kazdan, Charles Marx, Chris Cundy, Willie Neiswanger, Sanmi Koyejo, Barbara Engelhardt, Stefano Ermon · 28 Mar 2025
- Learning to Reason for Long-Form Story Generation — Alexander Gurung, Mirella Lapata (ReLM, OffRL, LRM) · 28 Mar 2025
- Exploring the Evolution of Physics Cognition in Video Generation: A Survey — Minghui Lin, Xiang Wang, Yansen Wang, Shu Wang, Fengqi Dai, ..., Cunxiang Wang, Zhengrong Zuo, Nong Sang, Siteng Huang, Donglin Wang (EGVM, VGen) · 27 Mar 2025
- M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization? — Haolong Yan, Kaijun Tan, Yeqing Shen, Xin Huang, Zheng Ge, Xiangyu Zhang, Si Li, Daxin Jiang (VLM) · 27 Mar 2025
- Collab: Controlled Decoding using Mixture of Agents for LLM Alignment — Souradip Chakraborty, Sujay Bhatt, Udari Madhushani Sehwag, Soumya Suvra Ghosal, Jiahao Qiu, Mengdi Wang, Dinesh Manocha, Furong Huang, Alec Koppel, Sumitra Ganesh · 27 Mar 2025
- SWI: Speaking with Intent in Large Language Models — Yuwei Yin, EunJeong Hwang, Giuseppe Carenini (LRM) · 27 Mar 2025
- ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models — Chung-En Sun, Ge Yan, Tsui-Wei Weng (KELM, LRM) · 27 Mar 2025
- LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis — Jike Zhong, Qilong Wu, Xinyue Li, Bo Zhang, Ming Li, ..., Haoyang Li, Yu Qiao, Peng Gao, Bin Fu, Zhen Li (EGVM) · 27 Mar 2025
- Controlling Large Language Model with Latent Actions — Chengxing Jia, Ziniu Li, Pengyuan Wang, Yi-Chen Li, Zhenyu Hou, Yuxiao Dong, Y. Yu · 27 Mar 2025
- R-PRM: Reasoning-Driven Process Reward Modeling — Shuaijie She, Junxiao Liu, Yifeng Liu, Jiajun Chen, Xin Huang, Shujian Huang (LRM) · 27 Mar 2025
- ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging — Haoming Xu, Shuxun Wang, Yanqiu Zhao, Yi Zhong, Ziyan Jiang, Ningyuan Zhao, Shumin Deng, Hongyu Chen, N. Zhang (MoMe, MU) · 27 Mar 2025
- Local Normalization Distortion and the Thermodynamic Formalism of Decoding Strategies for Large Language Models — Tom Kempton, Stuart Burrell · 27 Mar 2025
- Boosting Large Language Models with Mask Fine-Tuning — M. Zhang, Yue Bai, Huan Wang, Yizhou Wang, Qihua Dong, Y. Fu (CLL) · 27 Mar 2025
- Optimizing Case-Based Reasoning System for Functional Test Script Generation with Large Language Models — Siyuan Guo, Huiwu Liu, Xiaolong Chen, Yuming Xie, Liang Zhang, Tao Han, Hechang Chen, Yi Chang, Jun Wang · 26 Mar 2025
- Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs — Zitian Wang, Yue Liao, Kang Rong, Fengyun Rao, Yibo Yang, Si Liu · 26 Mar 2025
- Reasoning Beyond Limits: Advances and Open Problems for LLMs — M. Ferrag, Norbert Tihanyi, Merouane Debbah (ELM, OffRL, LRM, AI4CE) · 26 Mar 2025
- Qwen2.5-Omni Technical Report — Jin Xu, Zhifang Guo, Jinzheng He, Hangrui Hu, Ting He, ..., K. Dang, Bin Zhang, Xinyu Wang, Yunfei Chu, Junyang Lin (VGen, AuLLM) · 26 Mar 2025
- Multi-head Reward Aggregation Guided by Entropy — Xiaomin Li, Xupeng Chen, Jingxuan Fan, Eric Hanchen Jiang, Mingye Gao (AAML) · 26 Mar 2025
- Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning — Huajie Tan, Yuheng Ji, Xiaoshuai Hao, Minglan Lin, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang (ReLM, OffRL, LRM) · 26 Mar 2025
- GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization — Zhouhong Gu, Xingzhou Chen, Xiaoran Shi, Tao Wang, Suhang Zheng, Tianyu Li, Hongwei Feng, Yanghua Xiao · 26 Mar 2025
- Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy — Yinan Sun, Xiongkuo Min, Zicheng Zhang, Yixuan Gao, Yuhang Cao, Guangtao Zhai (VLM) · 26 Mar 2025
- Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation — Hongcheng Gao, Jiashu Qu, Jingyi Tang, Baolong Bi, Yi Liu, Hongyu Chen, Li Liang, Li Su, Qingming Huang (MLLM, VLM, LRM) · 25 Mar 2025
- Direct Post-Training Preference Alignment for Multi-Agent Motion Generation Models Using Implicit Feedback from Pre-training Demonstrations — Ran Tian, Kratarth Goel · 25 Mar 2025
- One Framework to Rule Them All: Unifying RL-Based and RL-Free Methods in RLHF — Xin Cai · 25 Mar 2025
- Efficient Model Development through Fine-tuning Transfer — Pin-Jie Lin, Rishab Balasubramanian, Fengyuan Liu, Nikhil Kandpal, Tu Vu · 25 Mar 2025
- RL-finetuning LLMs from on- and off-policy data with a single algorithm — Yunhao Tang, Taco Cohen, David W. Zhang, Michal Valko, Rémi Munos (OffRL) · 25 Mar 2025
- Generative Linguistics, Large Language Models, and the Social Nature of Scientific Success — Sophie Hao (ELM, AI4CE) · 25 Mar 2025
- OAEI-LLM-T: A TBox Benchmark Dataset for Understanding Large Language Model Hallucinations in Ontology Matching — Zhangcheng Qiang, Kerry Taylor, Weiqing Wang, Jing Jiang · 25 Mar 2025
- Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing — Jaihoon Kim, Taehoon Yoon, Jisung Hwang, Minhyuk Sung (DiffM) · 25 Mar 2025
- VectorFit: Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models — Suhas G Hegde, S. K, Aruna Tiwari · 25 Mar 2025
- InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment — Yaojie Lu, Qichao Wang, H. Cao, Xierui Wang, Xiaoyin Xu, Min Zhang · 24 Mar 2025
- Sun-Shine: A Large Language Model for Tibetan Culture — Cheng Huang, Fan Gao, Nyima Tashi, Yutong Liu, Xiangxiang Wang, ..., Gadeng Luosang, Rinchen Dongrub, Dorje Tashi, Xiao Feng, Yongbin Yu (ALM) · 24 Mar 2025
- Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark — Bingchen Miao, Y. Wu, Minghe Gao, Qifan Yu, Wendong Bu, Wenqiao Zhang, Yunfei Li, Siliang Tang, Tat-Seng Chua, Juncheng Billy Li (LLMAG, LRM) · 24 Mar 2025
- Latent Embedding Adaptation for Human Preference Alignment in Diffusion Planners — Wen Zheng Terence Ng, Jianda Chen, Yuan Xu, Tianwei Zhang · 24 Mar 2025
- Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning — Jia-Nan Li, Jie Zhou, Yutao Yang, Bihao Zhan, Qianjun Pan, Yuyang Ding, Qin Chen, Jiang Bo, Xin Lin, Liang He (LRM) · 24 Mar 2025
- A Survey of Large Language Model Agents for Question Answering — Murong Yue (LLMAG, LM&MA, ELM) · 24 Mar 2025
- Won: Establishing Best Practices for Korean Financial NLP — Guijin Son, Hyunwoo Ko, Haneral Jung, Chami Hwang · 23 Mar 2025
- Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning — Yufei Zhan, Yousong Zhu, Shurong Zheng, Hongyin Zhao, Fan Yang, Ming Tang, Jinqiao Wang (VLM) · 23 Mar 2025
- AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs — Diwei Wang, Cédric Bobenrieth, Hyewon Seo (LRM) · 23 Mar 2025
- Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization — Zefeng Zhang, Hengzhu Tang, Shuaiyi Nie, Zhenyu Zhang, Yiming Ren, Zhenyang Li, Dawei Yin, Duohe Ma, Tingwen Liu · 23 Mar 2025
- Safe RLHF-V: Safe Reinforcement Learning from Multi-modal Human Feedback — Yalan Qin, Xiuying Chen, Rui Pan, Han Zhu, Chong Zhang, ..., Chi-Min Chan, Sirui Han, Yike Guo, Yiran Yang, Yaodong Yang (OffRL) · 22 Mar 2025
- A Survey on Mathematical Reasoning and Optimization with Large Language Models — Ali Forootani (OffRL, LRM, AI4CE) · 22 Mar 2025
- Enhancing Persona Consistency for LLMs' Role-Playing using Persona-Aware Contrastive Learning — Ke Ji, Yixin Lian, Linxu Li, Jingsheng Gao, Weiyuan Li, Bin Dai · 22 Mar 2025
- Think Before Refusal: Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior — Shri Kiran Srinivasan, Xinpeng Wang, Guangyao Zhai, Nassir Navab, Yun Xue (LLMAG) · 22 Mar 2025
- Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM — Codefuse, Ling Team, Wenting Cai, Yuchen Cao, C. Chen, ..., Wei Zhang, Zhenru Zhang, Hailin Zhao, Xunjin Zheng, Jun Zhou (ALM, MoE) · 22 Mar 2025
- Capturing Individual Human Preferences with Reward Features — André Barreto, Vincent Dumoulin, Yiran Mao, Nicolas Perez-Nieves, Bobak Shahriari, Yann Dauphin, Doina Precup, Hugo Larochelle (ALM) · 21 Mar 2025
- Modifying Large Language Model Post-Training for Diverse Creative Writing — John Joon Young Chung, Vishakh Padmakumar, Melissa Roemmele, Yuqian Sun, Max Kreminski (MoMe) · 21 Mar 2025
- When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO — Lefei Zhang, Chen Liu, C. Xu, Kai Hu, Donghao Luo, Chengjie Wang, Yanwei Fu, Yuan Yao · 21 Mar 2025