Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
arXiv:2305.18290 · ALM · 29 May 2023

Papers citing "Direct Preference Optimization: Your Language Model is Secretly a Reward Model"

50 / 2,637 papers shown

Cancer Type, Stage and Prognosis Assessment from Pathology Reports using LLMs
Rachit Saluja, Jacob Rosenthal, Yoav Artzi, David J. Pisapia, B. Liechty, M. Sabuncu
LM&MA, ELM · 70 · 1 · 0 · 03 Mar 2025

CE-U: Cross Entropy Unlearning
Bo Yang
MU · 58 · 0 · 0 · 03 Mar 2025

Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, Mohammad Sorower
AAML · 278 · 0 · 0 · 03 Mar 2025

All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
Gokul Swamy, Sanjiban Choudhury, Wen Sun, Zhiwei Steven Wu, J. Andrew Bagnell
OffRL · 52 · 8 · 0 · 03 Mar 2025

PABBO: Preferential Amortized Black-Box Optimization
Xinyu Zhang, Daolang Huang, Samuel Kaski, Julien Martinelli
36 · 0 · 0 · 02 Mar 2025

Behavior Preference Regression for Offline Reinforcement Learning
Padmanaba Srinivasan, William J. Knottenbelt
OffRL · 38 · 0 · 0 · 02 Mar 2025

Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners
Miao Peng, Nuo Chen, Zongrui Suo, Jia Li
LRM · 43 · 1 · 0 · 02 Mar 2025

SFO: Piloting VLM Feedback for Offline RL
Jacob Beck
OffRL · 44 · 0 · 0 · 02 Mar 2025

A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning
Shashank Gupta, Chaitanya Ahuja, Tsung-Yu Lin, Sreya Dutta Roy, Harrie Oosterhuis, Maarten de Rijke, Satya Narayan Shukla
59 · 1 · 0 · 02 Mar 2025

Personalize Your LLM: Fake it then Align it
Yijing Zhang, Dyah Adila, Changho Shin, Frederic Sala
91 · 0 · 0 · 02 Mar 2025

Distributionally Robust Reinforcement Learning with Human Feedback
Debmalya Mandal, Paulius Sasnauskas, Goran Radanović
44 · 1 · 0 · 01 Mar 2025

AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models
Sohan Patnaik, Rishabh Jain, Balaji Krishnamurthy, Mausoom Sarkar
41 · 0 · 0 · 01 Mar 2025

Robust Multi-Objective Preference Alignment with Online DPO
Raghav Gupta, Ryan Sullivan, Yunxuan Li, Samrat Phatale, Abhinav Rastogi
50 · 0 · 0 · 01 Mar 2025

Sentence-level Reward Model can Generalize Better for Aligning LLM from Human Preference
Wenjie Qiu, Yi-Chen Li, Xuqin Zhang, Tianyi Zhang, Yiming Zhang, Zongzhang Zhang, Yang Yu
ALM · 56 · 0 · 0 · 01 Mar 2025

Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable
Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Zachary Yahn, Yichang Xu, Ling Liu
66 · 13 · 0 · 01 Mar 2025

Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo, Lijun Zhang, Mengyang Sun, Lin Yuanbo Wu, Peng Wang, Yuyao Zhang
MLLM, VLM · 52 · 1 · 0 · 01 Mar 2025

Beware of Your Po! Measuring and Mitigating AI Safety Risks in Role-Play Fine-Tuning of LLMs
Weixiang Zhao, Yulin Hu, Yang Deng, Jiahe Guo, Xingyu Sui, ..., An Zhang, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu
71 · 2 · 0 · 28 Feb 2025

The Rise of Darkness: Safety-Utility Trade-Offs in Role-Playing Dialogue Agents
Yihong Tang, Kehai Chen, X. Bai, Zhengyu Niu, Binghui Wang, Jie Liu, Min Zhang
LLMAG · 58 · 0 · 0 · 28 Feb 2025

SuperRAG: Beyond RAG with Layout-Aware Graph Modeling
Jeff Yang, Duy-Khanh Vu, Minh-Tien Nguyen, Xuan-Quang Nguyen, Linh Nguyen, H. Le
3DV · 73 · 0 · 0 · 28 Feb 2025

Societal Alignment Frameworks Can Improve LLM Alignment
Karolina Stańczak, Nicholas Meade, Mehar Bhatia, Hattie Zhou, Konstantin Böttinger, ..., Timothy P. Lillicrap, Ana Marasović, Sylvie Delacroix, Gillian K. Hadfield, Siva Reddy
254 · 0 · 0 · 27 Feb 2025

Preference Learning Unlocks LLMs' Psycho-Counseling Skills
Mian Zhang, S. Eack, Zhiyu Zoey Chen
84 · 1 · 0 · 27 Feb 2025

SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning
Zexiong Ma, Chao Peng, Pengfei Gao, Xiangxin Meng, Yanzhen Zou, Bing Xie
MoMe, OffRL, LRM · 73 · 4 · 0 · 27 Feb 2025

LIVS: A Pluralistic Alignment Dataset for Inclusive Public Spaces
Rashid Mushkani, Shravan Nayak, Hugo Berard, Allison Cohen, Shin Koseki, Hadrien Bertrand
54 · 2 · 0 · 27 Feb 2025

Bridging the Creativity Understanding Gap: Small-Scale Human Alignment Enables Expert-Level Humor Ranking in LLMs
Kuan Lok Zhou, Jiayi Chen, Siddharth Suresh, Reuben Narad, Timothy T. Rogers, Lalit K Jain, R. Nowak, Bob Mankoff, Jifan Zhang
57 · 1 · 0 · 27 Feb 2025

Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models
Huazheng Wang, Yongcheng Jing, Haifeng Sun, Yingjie Wang, Jingchao Wang, Jianxin Liao, Dacheng Tao
KELM, MU · 57 · 0 · 0 · 27 Feb 2025

Foot-In-The-Door: A Multi-turn Jailbreak for LLMs
Zixuan Weng, Xiaolong Jin, Jinyuan Jia, Xinsong Zhang
AAML · 207 · 0 · 0 · 27 Feb 2025

Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang, Bingcong Li, Christoph Dann, Niao He
OffRL · 85 · 0 · 0 · 26 Feb 2025

Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng, Yunjia Qi, Xiaozhi Wang, Zijun Yao, Bin Xu, Lei Hou, Juanzi Li
ALM, LRM · 64 · 4 · 0 · 26 Feb 2025

ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions
Gyeongje Cho, Yeonkyoung So, Jaejin Lee
ELM · 64 · 0 · 0 · 26 Feb 2025

FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users
Anikait Singh, Sheryl Hsu, Kyle Hsu, E. Mitchell, Stefano Ermon, Tatsunori Hashimoto, Archit Sharma, Chelsea Finn
SyDa, OffRL · 63 · 1 · 0 · 26 Feb 2025

Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang
78 · 3 · 0 · 26 Feb 2025

Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework
Kaishuai Xu, Tiezheng Yu, Wenjun Hou, Yi Cheng, Liangyou Li, Xin Jiang, Lifeng Shang, Qiang Liu, Wenjie Li
ELM · 71 · 0 · 0 · 26 Feb 2025

Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time
Jiazheng Li, Yuxiang Zhou, Junru Lu, Gladys Tyen, Lin Gui, Cesare Aloisi, Yulan He
LRM · 44 · 2 · 0 · 26 Feb 2025

Reward Shaping to Mitigate Reward Hacking in RLHF
Jiayi Fu, Xuandong Zhao, Chengyuan Yao, Han Wang, Qi Han, Yanghua Xiao
88 · 7 · 0 · 26 Feb 2025

Kanana: Compute-efficient Bilingual Language Models
Kanana LLM Team, Yunju Bak, Hojin Lee, Minho Ryu, Jiyeon Ham, ..., Daniel Lee, Minchul Lee, MinHyung Lee, Shinbok Lee, Gaeun Seo
98 · 1 · 0 · 26 Feb 2025

Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
Shuliang Liu, Xinze Li, Zhenghao Liu, Yukun Yan, Cheng Yang, Zheni Zeng, Zhiyuan Liu, Maosong Sun, Ge Yu
RALM · 113 · 3 · 0 · 26 Feb 2025

Shh, don't say that! Domain Certification in LLMs
Cornelius Emde, Alasdair Paren, Preetham Arvind, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Guohao Li, Philip Torr, Adel Bibi
66 · 1 · 0 · 26 Feb 2025

Self-Memory Alignment: Mitigating Factual Hallucinations with Generalized Improvement
Siyuan Zhang, Yuanhang Zhang, Yinpeng Dong, Hang Su
HILM, KELM · 299 · 0 · 0 · 26 Feb 2025

OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment
Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, Guorui Zhou
84 · 3 · 0 · 26 Feb 2025

Conformal Linguistic Calibration: Trading-off between Factuality and Specificity
Zhengping Jiang, Anqi Liu, Benjamin Van Durme
92 · 1 · 0 · 26 Feb 2025

Bián: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation
Zhouyu Jiang, Mengshu Sun, Qing Cui, Lei Liang
RALM, 3DV · 283 · 0 · 0 · 26 Feb 2025

DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
Lei Zhao, Sizhou Chen, Linfeng Feng, Ju Liu, Xuelong Li
DiffM, MDE · 76 · 1 · 0 · 26 Feb 2025

Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond
Qizhou Wang, Jin Peng Zhou, Zhanke Zhou, Saebyeol Shin, Bo Han, Kilian Q. Weinberger
AILaw, ELM, MU · 73 · 4 · 0 · 26 Feb 2025

ZEBRA: Leveraging Model-Behavioral Knowledge for Zero-Annotation Preference Dataset Construction
Jeesu Jung, Chanjun Park, Sangkeun Jung
81 · 0 · 0 · 26 Feb 2025

FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge
Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, Kyomin Jung
KELM, MU · 67 · 0 · 0 · 26 Feb 2025

Controlled Diversity: Length-optimized Natural Language Generation
Diana Marie Schenke, Timo Baumann
49 · 0 · 0 · 26 Feb 2025

M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo, Kaiyou Song, Zipeng Feng, Ziping Ma, Qinglong Zhang, ..., Yunxiao Sun, Tai-Wei Chang, Jingdong Chen, Ming Yang, Jun Zhou
MLLM, VLM · 92 · 3 · 0 · 26 Feb 2025

Preference-Based Gradient Estimation for ML-Based Approximate Combinatorial Optimization
Arman Mielke, Uwe Bauknecht, Thilo Strauss, Mathias Niepert
79 · 0 · 0 · 26 Feb 2025

Self-rewarding correction for mathematical reasoning
Wei Xiong, Hanning Zhang, Chenlu Ye, Lichang Chen, Nan Jiang, Tong Zhang
ReLM, KELM, LRM · 80 · 10 · 0 · 26 Feb 2025

PEToolLLM: Towards Personalized Tool Learning in Large Language Models
Qiancheng Xu, Ying Li, Heming Xia, Fan Liu, Min Yang, Wenjie Li
77 · 0 · 0 · 26 Feb 2025