ResearchTrend.AI

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
arXiv: 2305.18290

29 May 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
    ALM
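For context, the headline claim of the cited paper is that the RLHF reward-maximization objective can be optimized directly with a simple classification-style loss over preference pairs, using the policy itself as an implicit reward model. Below is a minimal sketch of that per-pair loss, assuming the total sequence log-probabilities under the policy and the frozen reference model have already been computed elsewhere; `beta` is the KL-strength hyperparameter from the paper.

```python
import math

def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * implicit reward margin).

    Inputs are total log-probabilities of the chosen and rejected
    responses under the policy (pi) and the frozen reference (ref).
    """
    # Implicit reward of each response is beta * log(pi / pi_ref);
    # the loss depends only on the margin between chosen and rejected.
    margin = (pi_chosen_logp - ref_chosen_logp) - (pi_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # Numerically stable -log(sigmoid(x))
    return math.log1p(math.exp(-x)) if x >= 0 else -x + math.log1p(math.exp(x))

# When the policy equals the reference, the margin is 0 and the loss is log(2)
print(round(dpo_loss(-5.0, -7.0, -5.0, -7.0), 4))  # → 0.6931
```

Raising the policy's log-probability of the chosen response relative to the rejected one shrinks this loss, which is what ties the preference classification back to reward maximization.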

Papers citing "Direct Preference Optimization: Your Language Model is Secretly a Reward Model"

50 / 2,637 papers shown
Understanding the Logic of Direct Preference Alignment through Logic
Kyle Richardson
Vivek Srikumar
Ashish Sabharwal
90
2
0
23 Dec 2024
Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
Alexander von Recum
Christoph Schnabl
Gabor Hollbeck
Silas Alberti
Philip Blinde
Marvin von Hagen
94
2
0
22 Dec 2024
Online Learning from Strategic Human Feedback in LLM Fine-Tuning
Shugang Hao
Lingjie Duan
97
3
0
22 Dec 2024
When Can Proxies Improve the Sample Complexity of Preference Learning?
Yuchen Zhu
Daniel Augusto de Souza
Zhengyan Shi
Mengyue Yang
Pasquale Minervini
Alexander D'Amour
Matt J. Kusner
90
0
0
21 Dec 2024
Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval
Luo Ji
Feixiang Guo
Teng Chen
Qingqing Gu
Xiaoyu Wang
...
Peng Yu
Yue Zhao
Hongyang Lei
Zhonglin Jiang
Yong Chen
RALM
LRM
102
0
0
21 Dec 2024
JailPO: A Novel Black-box Jailbreak Framework via Preference Optimization against Aligned LLMs
Haoyang Li
Jiawei Ye
Jie Wu
Tianjie Yan
Chu Wang
Zhixin Li
AAML
85
0
0
20 Dec 2024
REFA: Reference Free Alignment for multi-preference optimization
Taneesh Gupta
Rahul Madhavan
Xuchao Zhang
Chetan Bansal
Saravan Rajmohan
99
1
0
20 Dec 2024
Northeastern Uni at Multilingual Counterspeech Generation: Enhancing Counter Speech Generation with LLM Alignment through Direct Preference Optimization
Sahil Wadhwa
Chengtian Xu
Haoming Chen
Aakash Mahalingam
Akankshya Kar
Divya Chaudhary
83
0
0
19 Dec 2024
Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling
Junyi Li
Hwee Tou Ng
LRM
105
1
0
19 Dec 2024
Learning to Generate Research Idea with Dynamic Control
Ruochen Li
Liqiang Jing
Chi Han
Jiawei Zhou
Xinya Du
LRM
87
3
0
19 Dec 2024
PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization
Jiayi Wu
Hengyi Cai
Lingyong Yan
Hao Sun
Xiang Li
Shuaiqiang Wang
Dawei Yin
Ming Gao
132
0
0
19 Dec 2024
SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage
Xiaoning Dong
Wenbo Hu
Wei Xu
Tianxing He
85
0
0
19 Dec 2024
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
Runtao Liu
Haoyu Wu
Zheng Ziqiang
Chen Wei
Yingqing He
Renjie Pi
Qifeng Chen
VGen
90
14
0
18 Dec 2024
Hansel: Output Length Controlling Framework for Large Language Models
Seoha Song
Junhyun Lee
Hyeonmok Ko
80
0
0
18 Dec 2024
Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
Yuzhong Hong
Hanshan Zhang
Junwei Bao
Hongfei Jiang
Yang Song
OffRL
85
2
0
18 Dec 2024
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
Zhuoran Jin
Hongbang Yuan
Tianyi Men
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
ALM
99
7
0
18 Dec 2024
Context-DPO: Aligning Language Models for Context-Faithfulness
Baolong Bi
Shaohan Huang
Yansen Wang
Tianchi Yang
Zihan Zhang
...
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
Shenghua Liu
116
11
0
18 Dec 2024
Gradual Vigilance and Interval Communication: Enhancing Value Alignment in Multi-Agent Debates
Rui Zou
Mengqi Wei
Jintian Feng
Qian Wan
Jianwen Sun
Sannyuya Liu
82
0
0
18 Dec 2024
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Le Yang
Ziwei Zheng
Boxu Chen
Zhengyu Zhao
Chenhao Lin
Chao Shen
VLM
148
3
0
18 Dec 2024
An Automated Explainable Educational Assessment System Built on LLMs
Jiazheng Li
Artem Bobrov
David West
Cesare Aloisi
Yulan He
90
2
0
17 Dec 2024
Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models
Yuchen Fan
Yuzhong Hong
Qiushi Wang
Junwei Bao
Hongfei Jiang
Yang Song
88
1
0
17 Dec 2024
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
Xin Yi
Shunfan Zheng
Linlin Wang
Gerard de Melo
Xiaoling Wang
Liang He
98
7
0
17 Dec 2024
DateLogicQA: Benchmarking Temporal Biases in Large Language Models
Gagan Bhatia
MingZe Tang
Cristina Mahanta
Madiha Kazi
84
0
0
17 Dec 2024
Context Filtering with Reward Modeling in Question Answering
Sangryul Kim
James Thorne
78
0
0
16 Dec 2024
Self-Adaptive Paraphrasing and Preference Learning for Improved Claim Verifiability
Amelie Wuhrl
Roman Klinger
90
0
0
16 Dec 2024
ACE-$M^3$: Automatic Capability Evaluator for Multimodal Medical Models
Xiechi Zhang
Shunfan Zheng
Linlin Wang
Gerard de Melo
Zhu Cao
Xiaoling Wang
Liang He
ELM
133
0
0
16 Dec 2024
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Jiale Cheng
Xiao-Chang Liu
C. Wang
Xiaotao Gu
Yaojie Lu
Dan Zhang
Yuxiao Dong
J. Tang
Hongning Wang
Minlie Huang
LRM
134
3
0
16 Dec 2024
UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models
Boyang Xue
Fei Mi
Qi Zhu
Hongru Wang
Rui Wang
Sheng Wang
Erxin Yu
Xuming Hu
Kam-Fai Wong
HILM
95
1
0
16 Dec 2024
The Superalignment of Superhuman Intelligence with Large Language Models
Minlie Huang
Yingkang Wang
Shiyao Cui
Pei Ke
J. Tang
129
1
0
15 Dec 2024
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
Juntao Dai
Yaodong Yang
Qian Zheng
Gang Pan
OffRL
91
2
0
15 Dec 2024
Dual Traits in Probabilistic Reasoning of Large Language Models
Shenxiong Li
Huaxia Rui
85
0
0
15 Dec 2024
Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration
Avinandan Bose
Zhihan Xiong
Aadirupa Saha
S. Du
Maryam Fazel
86
1
0
13 Dec 2024
WHAT-IF: Exploring Branching Narratives by Meta-Prompting Large Language Models
Runsheng Huang
Lara J. Martin
Chris Callison-Burch
LRM
AI4CE
LLMAG
79
0
0
13 Dec 2024
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation
Runtao Liu
Chen I Chieh
Jindong Gu
Jipeng Zhang
Renjie Pi
Qifeng Chen
Philip Torr
Ashkan Khakzar
Fabio Pizzati
EGVM
114
0
0
13 Dec 2024
MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples
Shuo Xie
Fangzhi Zhu
Jiahui Wang
Lulu Wen
Wei Dai
Xiaowei Chen
Junxiong Zhu
Kai Zhou
Bo Zheng
79
0
0
13 Dec 2024
ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL
Yang Qin
Chao Chen
Z. Fu
Ze Chen
Dezhong Peng
Peng Hu
Jieping Ye
110
3
0
13 Dec 2024
DECOR: Decomposition and Projection of Text Embeddings for Text-to-Image Customization
Geonhui Jang
Jin-Hwa Kim
Yong-Hyun Park
Junho Kim
Gayoung Lee
Yonghyun Jeong
DiffM
100
0
0
12 Dec 2024
Phi-4 Technical Report
Marah Abdin
J. Aneja
Harkirat Singh Behl
Sébastien Bubeck
Ronen Eldan
...
Rachel A. Ward
Yue Wu
Dingli Yu
Cyril Zhang
Yi Zhang
ALM
SyDa
121
98
0
12 Dec 2024
Test-Time Alignment via Hypothesis Reweighting
Yoonho Lee
Jonathan Williams
Henrik Marklund
Archit Sharma
E. Mitchell
Anikait Singh
Chelsea Finn
101
4
0
11 Dec 2024
Learning to Reason via Self-Iterative Process Feedback for Small Language Models
Kaiyuan Chen
Jin Wang
Xuejie Zhang
LRM
ReLM
90
2
0
11 Dec 2024
PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips
Zachary Coalson
Jeonghyun Woo
Shiyang Chen
Yu Sun
Lishan Yang
Prashant J. Nair
Bo Fang
Sanghyun Hong
AAML
92
2
0
10 Dec 2024
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu
Tim Z. Xiao
Weiyang Liu
Yoshua Bengio
Dinghuai Zhang
123
4
0
10 Dec 2024
MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
Kangyu Zhu
Peng Xia
Yun Li
Hongtu Zhu
Sheng Wang
Huaxiu Yao
111
1
0
09 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jingyang Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Fei Wu
G. Wang
Eduard H. Hovy
OffRL
136
7
0
05 Dec 2024
Time-Reversal Provides Unsupervised Feedback to LLMs
Yerram Varun
Rahul Madhavan
Sravanti Addepalli
A. Suggala
Karthikeyan Shanmugam
Prateek Jain
LRM
SyDa
79
0
0
03 Dec 2024
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
117
1
0
03 Dec 2024
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
Meng Cao
Haoran Tang
Haoze Zhao
Hangyu Guo
Jing Liu
Ge Zhang
Ruyang Liu
Qiang Sun
Ian Reid
Xiaodan Liang
115
2
0
02 Dec 2024
Harnessing Preference Optimisation in Protein LMs for Hit Maturation in Cell Therapy
Katarzyna Janocha
Annabel Ling
Alice Godson
Yulia Lampi
Simon Bornschein
Nils Y. Hammerla
84
2
0
02 Dec 2024
Yi-Lightning Technical Report
01.AI
Alan Wake
Albert Wang
Bei Chen
...
Yuxuan Sha
Zhaodong Yan
Zhiyuan Liu
Zirui Zhang
Zonghong Dai
OSLM
102
3
0
02 Dec 2024
Towards Adaptive Mechanism Activation in Language Agent
Ziyang Huang
Jun Zhao
Kang Liu
LLMAG
AI4CE
85
0
0
01 Dec 2024