Direct Preference Optimization: Your Language Model is Secretly a Reward Model
arXiv:2305.18290 · 29 May 2023
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
ALM

Papers citing "Direct Preference Optimization: Your Language Model is Secretly a Reward Model"

50 / 2,663 papers shown

Exploring Safety-Utility Trade-Offs in Personalized Language Models
Anvesh Rao Vijjini, Somnath Basu Roy Chowdhury, Snigdha Chaturvedi
63 · 7 · 0 · 17 Jun 2024

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, ..., Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao
VLM
90 · 27 · 0 · 17 Jun 2024

Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Wenkai Yang, Shiqi Shen, Guangyao Shen, Zhi Gong, Yankai Lin, Ji-Rong Wen
66 · 14 · 0 · 17 Jun 2024

Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers
Tianhua Zhang, Kun Li, Hongyin Luo, Xixin Wu, James Glass, Helen Meng
52 · 4 · 0 · 16 Jun 2024

Toward Optimal LLM Alignments Using Two-Player Games
Rui Zheng, Hongyi Guo, Zhihan Liu, Xiaoying Zhang, Yuanshun Yao, ..., Tao Gui, Qi Zhang, Xuanjing Huang, Hang Li, Yang Liu
78 · 5 · 0 · 16 Jun 2024

Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence
Junru Lu, Jiazheng Li, Siyu An, Meng Zhao, Yulan He, Di Yin, Xing Sun
52 · 16 · 0 · 16 Jun 2024

RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models
Zhuoran Jin, Pengfei Cao, Chenhao Wang, Zhitao He, Hongbang Yuan, Jiachun Li, Yubo Chen, Kang Liu, Jun Zhao
KELM, MU
69 · 15 · 0 · 16 Jun 2024

Step-level Value Preference Optimization for Mathematical Reasoning
Guoxin Chen, Minpeng Liao, Chengxi Li, Kai Fan
LRM
40 · 34 · 0 · 16 Jun 2024

CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph
Haitao Lin, Guojiang Zhao, Odin Zhang, Yufei Huang, Lirong Wu, Zicheng Liu, Siyuan Li, Cheng Tan, Zhifeng Gao, Stan Z. Li
46 · 3 · 0 · 16 Jun 2024

Self-Evolution Fine-Tuning for Policy Optimization
Ruijun Chen, Jiehao Liang, Shiping Gao, Fanqi Wan, Xiaojun Quan
54 · 0 · 0 · 16 Jun 2024

Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions
Kai Xu, Farid Tajaddodianfar, Ben Allison
26 · 0 · 0 · 16 Jun 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis
Yuping Lin, Pengfei He, Han Xu, Yue Xing, Makoto Yamada, Hui Liu, Jiliang Tang
46 · 12 · 0 · 16 Jun 2024

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Jifan Zhang, Lalit P. Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, ..., Scott Sievert, Timothy T. Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak
67 · 5 · 0 · 15 Jun 2024

CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen, Lin Li, Yongqi Yang, Bin Wen, Fan Yang, Tingting Gao, Yu Wu, Long Chen
VLM, VGen
52 · 7 · 0 · 15 Jun 2024

Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang
68 · 43 · 0 · 14 Jun 2024

Off-Policy Evaluation from Logged Human Feedback
Aniruddha Bhargava, Lalit P. Jain, Branislav Kveton, Ge Liu, Subhojyoti Mukherjee
OffRL
34 · 2 · 0 · 14 Jun 2024

Deep Bayesian Active Learning for Preference Modeling in Large Language Models
Luckeciano C. Melo, P. Tigas, Alessandro Abate, Yarin Gal
61 · 8 · 0 · 14 Jun 2024

BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval
Imanol Miranda, Ander Salaberria, Eneko Agirre, Gorka Azkune
CoGe
51 · 1 · 0 · 14 Jun 2024

Knowledge Editing in Language Models via Adapted Direct Preference Optimization
Amit Rozner, Barak Battash, Lior Wolf, Ofir Lindenbaum
KELM
60 · 9 · 0 · 14 Jun 2024

GEB-1.3B: Open Lightweight Large Language Model
Jie Wu, Yufeng Zhu, Lei Shen, Xuqing Lu
ALM
37 · 0 · 0 · 14 Jun 2024

A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations
Jinqiang Wang, Huansheng Ning, Yi Peng, Qikai Wei, Daniel Tesfai, Wenwei Mao, Tao Zhu, Runhe Huang
LM&MA, AI4MH, ELM
62 · 5 · 0 · 14 Jun 2024

Bootstrapping Language Models with DPO Implicit Rewards
Changyu Chen, Zichen Liu, Chao Du, Tianyu Pang, Qian Liu, Arunesh Sinha, Pradeep Varakantham, Min Lin
SyDa, ALM
77 · 23 · 0 · 14 Jun 2024

Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
Miaosen Zhang, Yixuan Wei, Zhen Xing, Yifei Ma, Zuxuan Wu, ..., Zheng Zhang, Qi Dai, Chong Luo, Xin Geng, Baining Guo
VLM
51 · 1 · 0 · 13 Jun 2024

Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models
Sarah Ball, Frauke Kreuter, Nina Rimsky
45 · 15 · 0 · 13 Jun 2024

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
59 · 45 · 0 · 13 Jun 2024

On Softmax Direct Preference Optimization for Recommendation
Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, Tat-Seng Chua
51 · 27 · 0 · 13 Jun 2024

Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin
LRM, AI4CE
49 · 44 · 0 · 13 Jun 2024

ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions
Xu Zhang, Xunjian Yin, Xiaojun Wan
55 · 3 · 0 · 13 Jun 2024

Online Bandit Learning with Offline Preference Data for Improved RLHF
Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen
OffRL
63 · 2 · 0 · 13 Jun 2024

HelpSteer2: Open-source dataset for training top-performing reward models
Zhilin Wang, Yi Dong, Olivier Delalleau, Jiaqi Zeng, Gerald Shen, Daniel Egert, Jimmy J. Zhang, Makesh Narsimhan Sreedhar, Oleksii Kuchaiev
AI4TS
57 · 90 · 0 · 12 Jun 2024

Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs
Chen Zheng, Ke Sun, Xun Zhou
MoE
56 · 0 · 0 · 12 Jun 2024

Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference
Jiabao Ji, Yujian Liu, Yang Zhang, Gaowen Liu, Ramana Rao Kompella, Sijia Liu, Shiyu Chang
KELM, MU
56 · 24 · 0 · 12 Jun 2024

PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences
Daiwei Chen, Yi Chen, Aniket Rege, Ramya Korlakai Vinayak
61 · 18 · 0 · 12 Jun 2024

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, Bill Yuchen Lin
SyDa
64 · 133 · 0 · 12 Jun 2024

Discovering Preference Optimization Algorithms with and for Large Language Models
Chris Xiaoxuan Lu, Samuel Holt, Claudio Fanconi, Alex J. Chan, Jakob Foerster, M. Schaar, R. T. Lange
OffRL
45 · 16 · 0 · 12 Jun 2024

Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
Duanyu Feng, Bowen Qin, Chen Huang, Youcheng Huang, Zheng Zhang, Wenqiang Lei
49 · 3 · 0 · 12 Jun 2024

Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling
Zile Qiao, Wei Ye, Yong Jiang, Tong Mo, Pengjun Xie, Weiping Li, Fei Huang, Shikun Zhang
KELM
46 · 4 · 0 · 12 Jun 2024

It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
Taiming Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao
65 · 2 · 0 · 12 Jun 2024

Asymptotically Optimal Regret for Black-Box Predict-then-Optimize
Samuel Tan, P. Frazier
46 · 0 · 0 · 12 Jun 2024

Collective Constitutional AI: Aligning a Language Model with Public Input
Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli
ELM
86 · 72 · 0 · 12 Jun 2024

Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL
Zijin Hong, Zheng Yuan, Qinggang Zhang, Hao Chen, Junnan Dong, Feiran Huang, Xiao Huang
77 · 58 · 0 · 12 Jun 2024

OPTune: Efficient Online Preference Tuning
Lichang Chen, Jiuhai Chen, Chenxi Liu, John Kirchenbauer, Davit Soselia, Chen Zhu, Tom Goldstein, Dinesh Manocha, Heng Huang
47 · 4 · 0 · 11 Jun 2024

THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report
KBTG Labs, Danupat Khamnuansin, Atthakorn Petchsod, Anuruth Lertpiya, Pornchanan Balee, Thanawat Lodkaew, Tawunrat Chalothorn, Thadpong Pongthawornkamol, Monchai Lertsutthiwong
21 · 0 · 0 · 11 Jun 2024

Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang, Honghao Wei, Lei Ying
OffRL
67 · 1 · 0 · 11 Jun 2024

Merging Improves Self-Critique Against Jailbreak Attacks
Victor Gallego
AAML, MoMe
62 · 3 · 0 · 11 Jun 2024

Teaching Language Models to Self-Improve by Learning from Language Feedback
Chi Hu, Yimin Hu, Hang Cao, Tong Xiao, Jingbo Zhu
LRM, VLM
40 · 4 · 0 · 11 Jun 2024

Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
Sijia Chen, Yibo Wang, Yi-Feng Wu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang
LLMAG, LRM
55 · 13 · 0 · 11 Jun 2024

3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Yuzi Yan, Yibo Miao, J. Li, Yipin Zhang, Jian Xie, Zhijie Deng, Dong Yan
62 · 11 · 0 · 11 Jun 2024

Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation
Oishi Banerjee, Hong-Yu Zhou, Subathra Adithan, Stephen Kwak, Kay Wu, Pranav Rajpurkar
MedIm
57 · 3 · 0 · 10 Jun 2024

Survey for Landing Generative AI in Social and E-commerce Recsys -- the Industry Perspectives
Da Xu, Danqing Zhang, Guangyu Yang, Bo Yang, Shuyuan Xu, Lingling Zheng, Cindy Liang
32 · 2 · 0 · 10 Jun 2024