ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

29 May 2023
Rafael Rafailov
Archit Sharma
Eric Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
    ALM

Papers citing "Direct Preference Optimization: Your Language Model is Secretly a Reward Model"

50 / 2,637 papers shown
Reciprocity-Aware Convolutional Neural Networks for Map-Based Path Loss Prediction
Ryan Dempsey
Jonathan Ethier
Halim Yanikomeroglu
44
0
0
04 Apr 2025
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria
Noah A. Smith
36
1
0
04 Apr 2025
AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset
Bingxiang He
Wenbin Zhang
Jiaxi Song
Cheng Qian
Z. Fu
...
Hui Xue
Ganqu Cui
Wanxiang Che
Zhiyuan Liu
Maosong Sun
39
0
0
04 Apr 2025
Language Models Are Implicitly Continuous
Samuele Marro
Davide Evangelista
X. A. Huang
Emanuele La Malfa
M. Lombardi
Michael Wooldridge
35
1
0
04 Apr 2025
On the Connection Between Diffusion Models and Molecular Dynamics
Liam Harcombe
Timothy T. Duignan
DiffM
59
0
0
04 Apr 2025
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Yan Ma
Steffi Chern
Xuyang Shen
Yiran Zhong
Pengfei Liu
OffRL
LRM
71
2
0
03 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
71
3
0
03 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
Ran Xu
Shirong Ma
Chong Ruan
Ziwei Sun
Yang Liu
Y. Wu
OffRL
LRM
54
21
0
03 Apr 2025
Measurement of LLM's Philosophies of Human Nature
Minheng Ni
Ennan Wu
Zidong Gong
Zheng Yang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Lijuan Wang
Wangmeng Zuo
42
0
0
03 Apr 2025
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
Xiaohui Sun
Ruitong Xiao
Jianye Mo
Bowen Wu
Qun Yu
Baoxun Wang
51
1
0
03 Apr 2025
LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models
Weibin Liao
Xin Gao
Tianyu Jia
Rihong Qiu
Yifan Zhu
Yang Lin
Xu Chu
Junfeng Zhao
Yasha Wang
40
0
0
03 Apr 2025
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye
Hongyi Zhou
Jin Zhu
Francesco Quinzan
C. Shi
37
1
0
03 Apr 2025
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
Hongzhe Du
Weikai Li
Min Cai
Karim Saraipour
Zimin Zhang
Himabindu Lakkaraju
Yizhou Sun
Shichang Zhang
KELM
56
0
0
03 Apr 2025
More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment
Yifan Wang
Runjin Chen
Bolian Li
David Cho
Yihe Deng
Ruqi Zhang
Tianlong Chen
Zhangyang Wang
A. Grama
Junyuan Hong
SyDa
53
0
0
03 Apr 2025
Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning
Zhihan Zhang
Yixin Cao
Lizi Liao
28
0
0
03 Apr 2025
HALO: Human-Aligned End-to-end Image Retargeting with Layered Transformations
Yiran Xu
Siqi Xie
Zhuofang Li
Harris Shadmany
Yinxiao Li
...
Jesse Berent
Ming-Hsuan Yang
Irfan Essa
Jia-Bin Huang
Feng Yang
VOS
65
0
0
03 Apr 2025
The Hidden Space of Safety: Understanding Preference-Tuned LLMs in Multilingual context
Nikhil Verma
Manasa Bharadwaj
41
0
0
03 Apr 2025
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models
S. Jung
Donghun Lee
Shinbok Lee
Gaeun Seo
Daniel Lee
Byeongil Ko
Junrae Cho
Kihyun Kim
EungGyun Kim
M. Shin
43
0
0
02 Apr 2025
Representation Bending for Large Language Model Safety
Ashkan Yousefpour
Taeheon Kim
Ryan S. Kwon
Seungbeen Lee
Wonje Jeung
Seungju Han
Alvin Wan
Harrison Ngan
Youngjae Yu
Jonghyun Choi
AAML
ALM
KELM
59
2
0
02 Apr 2025
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Chaohu Liu
Tianyi Gui
Yu Liu
Linli Xu
VLM
AAML
75
1
0
02 Apr 2025
A Survey of Scaling in Large Language Model Reasoning
Zihan Chen
Song Wang
Zhen Tan
Xingbo Fu
Zhenyu Lei
Peng Wang
Huan Liu
Cong Shen
Jundong Li
LRM
93
0
0
02 Apr 2025
Generative Retrieval and Alignment Model: A New Paradigm for E-commerce Retrieval
Ming Pang
Chunyuan Yuan
Xiaoyu He
Zheng Fang
Donghao Xie
...
Xue Jiang
Changping Peng
Zhangang Lin
Zheng Luo
Jingping Shao
RALM
41
0
0
02 Apr 2025
Adaptive Rectification Sampling for Test-Time Compute Scaling
Zhendong Tan
Xingjun Zhang
Chaoyi Hu
Yancheng Pan
Shaoxun Wang
LRM
46
0
0
02 Apr 2025
Hawkeye: Efficient Reasoning with Model Collaboration
Jianshu She
Z. Li
Zhemin Huang
Qi Li
Peiran Xu
Haonan Li
Qirong Ho
LRM
60
3
0
01 Apr 2025
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
Lanyun Zhu
Tianrun Chen
Qianxiong Xu
Xuanyi Liu
Deyi Ji
Haiyang Wu
De Wen Soh
Jing Liu
VLM
LRM
50
0
0
01 Apr 2025
Improved Visual-Spatial Reasoning via R1-Zero-Like Training
Zhenyi Liao
Qingsong Xie
Yanhao Zhang
Zijian Kong
Haonan Lu
Zhenyu Yang
Zhijie Deng
ReLM
VLM
LRM
107
1
1
01 Apr 2025
Efficient Construction of Model Family through Progressive Training Using Model Expansion
Kazuki Yano
Sho Takase
Sosuke Kobayashi
Shun Kiyono
Jun Suzuki
58
0
0
01 Apr 2025
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani
G M Shahariar
Sara Abdali
Lei Yu
Nael B. Abu-Ghazaleh
Yue Dong
AAML
78
0
0
01 Apr 2025
Self-Routing RAG: Binding Selective Retrieval with Knowledge Verbalization
Di Wu
Jia-Chen Gu
Kai-Wei Chang
Nanyun Peng
41
0
0
01 Apr 2025
Multi-Agent LLM Judge: automatic personalized LLM judge design for evaluating natural language generation applications
Hongliu Cao
Ilias Driouich
Robin Singh
Eoin Thomas
ELM
45
0
0
01 Apr 2025
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
Qihan Huang
Long Chan
Jinlong Liu
Wanggui He
Hao Jiang
Mingli Song
Jingyuan Chen
Chang Yao
Jie Song
LRM
37
0
0
31 Mar 2025
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute
Yingwei Ma
Binhua Li
Yihong Dong
Xue Jiang
Rongyu Cao
Jingshu Chen
Fei Huang
Yongqian Li
LLMAG
LRM
62
0
0
31 Mar 2025
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Shuo Ren
Pu Jian
Zhenjiang Ren
Chunlin Leng
Can Xie
Jiajun Zhang
LLMAG
AI4CE
69
1
0
31 Mar 2025
Entropy-Based Adaptive Weighting for Self-Training
Xiaoxuan Wang
Yihe Deng
Mingyu Derek Ma
Wei Wang
LRM
52
0
0
31 Mar 2025
Learning a Canonical Basis of Human Preferences from Binary Ratings
Kailas Vodrahalli
Wei Wei
James Zou
49
0
0
31 Mar 2025
CONGRAD: Conflicting Gradient Filtering for Multilingual Preference Alignment
Jiangnan Li
Thuy-Trang Vu
Christian Herold
Amirhossein Tebbifakhr
Shahram Khadivi
Gholamreza Haffari
42
0
0
31 Mar 2025
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance
Junjie Zheng
Zihao Chen
Chaofan Ding
Xinhan Di
VGen
75
1
0
31 Mar 2025
UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation
Yuxuan Chen
D. Guo
Sen Mei
Xinze Li
Hao Chen
...
Yukun Yan
Zhenghao Liu
S. Yu
Zhiyuan Liu
Maosong Sun
VLM
37
0
0
31 Mar 2025
CoRanking: Collaborative Ranking with Small and Large Ranking Agents
Wenhan Liu
Xinyu Ma
Bo Li
Lixin Su
Shuaiqiang Wang
Dawei Yin
Zhicheng Dou
ALM
44
0
0
30 Mar 2025
FeRG-LLM: Feature Engineering by Reason Generation Large Language Models
Jeonghyun Ko
Gyeongyun Park
Donghoon Lee
Kyunam Lee
LRM
57
0
0
30 Mar 2025
Distill-C: Enhanced NL2SQL via Distilled Customization with LLMs
Cong Duy Vu Hoang
Gioacchino Tangari
Clemence Lanfranchi
Dalu Guo
Paul Cayet
...
Long Duong
Damien Hilloulin
Rhicheek Patra
Sungpack Hong
Hassan Chafi
38
0
0
30 Mar 2025
A Framework for Lightweight Responsible Prompting Recommendation
Tiago Machado
Sara E. Berger
Cassia Sanctos
Vagner Figueiredo de Santana
Lemara Williams
Zhaoqing Wu
33
0
0
29 Mar 2025
SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
Tianyang Xu
Xiaoze Liu
Feijie Wu
Xiaoqian Wang
Jing Gao
MU
66
0
0
29 Mar 2025
Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization
Haomin Zhang
Shri Kiran Srinivasan
Haoyu Wang
Zihao Chen
Xianglong Liu
Chaofan Ding
Xinhan Di
41
0
0
28 Mar 2025
EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing
Yizhang Zhu
Runzhi Jiang
Boyan Li
Nan Tang
Yuyu Luo
39
2
0
28 Mar 2025
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
Yunhong Min
Daehyeon Choi
Kyeongmin Yeo
Jihyun Lee
Minhyuk Sung
59
0
0
28 Mar 2025
RLDBF: Enhancing LLMs Via Reinforcement Learning With DataBase FeedBack
Weichen Dai
Zijie Dai
Zhijie Huang
Yixuan Pan
Xinhe Li
Xi Li
Yi Zhou
Ji Qi
Wu Jiang
29
0
0
28 Mar 2025
Learning to Reason for Long-Form Story Generation
Alexander Gurung
Mirella Lapata
ReLM
OffRL
LRM
68
1
0
28 Mar 2025
Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF
Syrine Belakaria
Joshua Kazdan
Charles Marx
Chris Cundy
Willie Neiswanger
Sanmi Koyejo
Barbara Engelhardt
Stefano Ermon
41
0
0
28 Mar 2025
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
Ruining Li
Chuanxia Zheng
Christian Rupprecht
Andrea Vedaldi
47
1
0
28 Mar 2025