ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

29 May 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
    ALM
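The paper above shows that the RLHF preference objective can be optimized directly, treating the policy's log-probability ratio against a frozen reference model as an implicit reward. A minimal per-pair sketch of that loss (illustrative code, not the authors' implementation; plain-float inputs and the `beta=0.1` default are simplifying assumptions — real training uses batched tensors):

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy being trained and under a frozen reference model.
    """
    # Implicit reward of a response: beta * log(pi(y|x) / pi_ref(y|x))
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    # Bradley-Terry negative log-likelihood that chosen beats rejected
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
```

When the policy has not yet moved from the reference, both implicit rewards are zero and the loss is log 2; it decreases as the policy assigns relatively more probability to the chosen response.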

Papers citing "Direct Preference Optimization: Your Language Model is Secretly a Reward Model"

50 / 2,611 papers shown
Title
AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum
Jian Xiong
Jingbo Zhou
Jingyong Ye
Dejing Dou
LRM
10
0
0
20 May 2025
ThinkSwitcher: When to Think Hard, When to Think Fast
Guosheng Liang
Longguang Zhong
Ziyi Yang
Xiaojun Quan
LRM
2
0
0
20 May 2025
SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment
Wonje Jeung
Sangyeon Yoon
Minsuk Kahng
Albert No
LRM
LLMSV
7
0
0
20 May 2025
NExT-Search: Rebuilding User Feedback Ecosystem for Generative AI Search
Sunhao Dai
Wenjie Wang
Liang Pang
Jun Xu
See-Kiong Ng
Ji-Rong Wen
Tat-Seng Chua
7
0
0
20 May 2025
sudoLLM: On Multi-role Alignment of Language Models
Soumadeep Saha
Akshay Chaturvedi
Joy Mahapatra
Utpal Garain
2
0
0
20 May 2025
Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
Agam Goyal
Vedant Rathi
William Yeh
Yian Wang
Yuen Chen
Hari Sundaram
2
0
0
20 May 2025
Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang
Xun Wu
Shaohan Huang
Qingxiu Dong
Zewen Chi
Li Dong
Xingxing Zhang
Tengchao Lv
Lei Cui
Furu Wei
OffRL
LRM
7
0
0
20 May 2025
Safety Subspaces are Not Distinct: A Fine-Tuning Case Study
Kaustubh Ponkshe
Shaan Shah
Raghav Singhal
Praneeth Vepakomma
2
0
0
20 May 2025
Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey
Seunghyuk Cho
Zhenyue Qin
Yang Liu
Youngbin Choi
Seungbeom Lee
Dongwoo Kim
LRM
2
0
0
20 May 2025
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning
Zhaohui Yang
Shilei Jiang
Chen Hu
Linjing Li
Shihong Deng
D. Jiang
OffRL
15
0
0
20 May 2025
Towards eliciting latent knowledge from LLMs with mechanistic interpretability
Bartosz Cywiński
Emil Ryd
Senthooran Rajamanoharan
Neel Nanda
2
0
0
20 May 2025
Preference Learning with Lie Detectors can Induce Honesty or Evasion
Chris Cundy
Adam Gleave
2
0
0
20 May 2025
Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models
Wenhui Zhu
Xuanzhao Dong
Xin Li
Peijie Qiu
Xiwen Chen
Abolfazl Razi
Aris Sotiras
Yi Su
Yalin Wang
OffRL
LM&MA
14
0
0
20 May 2025
Investigating and Enhancing the Robustness of Large Multimodal Models Against Temporal Inconsistency
Jiafeng Liang
Shixin Jiang
Xuan Dong
Ning Wang
Zheng Chu
Hui Su
Jinlan Fu
Ming-Yu Liu
See-Kiong Ng
Bing Qin
2
0
0
20 May 2025
Think-J: Learning to Think for Generative LLM-as-a-Judge
Hui Huang
Yancheng He
Hongli Zhou
Rui Zhang
Wei Liu
Weixun Wang
Wenbo Su
Bo Zheng
Jiaheng Liu
LLMAG
AILaw
ELM
LRM
4
0
0
20 May 2025
InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models
Yanggan Gu
Zhaoyi Yan
Yuanyi Wang
Yiming Zhang
Qi Zhou
Fei Wu
Hongxia Yang
2
0
0
20 May 2025
Cross-Lingual Optimization for Language Transfer in Large Language Models
Jungseob Lee
Seongtae Hong
Hyeonseok Moon
Heuiseok Lim
9
0
0
20 May 2025
Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It)
Rafael A. Rivera Soto
Barry Chen
Nicholas Andrews
7
0
0
20 May 2025
YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering
Jennifer D'Souza
Hamed Babaei Giglou
Quentin Münch
ELM
7
0
0
20 May 2025
R3: Robust Rubric-Agnostic Reward Models
David Anugraha
Zilu Tang
Lester James V. Miranda
Hanyang Zhao
Mohammad Rifqi Farhansyah
Garry Kuwanto
Derry Wijaya
Genta Indra Winata
9
0
0
19 May 2025
Understanding Complexity in VideoQA via Visual Program Generation
Cristobal Eyzaguirre
Igor Vasiljevic
Achal Dave
Jiajun Wu
Rares Andrei Ambrus
Thomas Kollar
Juan Carlos Niebles
P. Tokmakov
7
0
0
19 May 2025
R1dacted: Investigating Local Censorship in DeepSeek's R1 Language Model
Ali Naseh
Harsh Chaudhari
Jaechul Roh
Mingshi Wu
Alina Oprea
Amir Houmansadr
AAML
ELM
12
0
0
19 May 2025
ReEx-SQL: Reasoning with Execution-Aware Reinforcement Learning for Text-to-SQL
Yaxun Dai
Wenxuan Xie
Xialie Zhuang
Tianyu Yang
Yiying Yang
Haiqin Yang
Yuhang Zhao
Pingfu Chao
Wenhao Jiang
ReLM
LRM
27
0
0
19 May 2025
ProDS: Preference-oriented Data Selection for Instruction Tuning
Wenya Guo
Zhengkun Zhang
Xumeng Liu
Ying Zhang
Ziyu Lu
Haoze Zhu
Xubo Liu
Ruxue Yan
7
0
0
19 May 2025
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
Hengli Li
Chenxi Li
Tong Wu
Xuekai Zhu
Yuxuan Wang
...
Eric Hanchen Jiang
Song-Chun Zhu
Zixia Jia
Ying Nian Wu
Zilong Zheng
LRM
7
0
0
19 May 2025
Krikri: Advancing Open Large Language Models for Greek
Dimitris Roussis
Leon Voukoutis
Georgios Paraskevopoulos
Sokratis Sofianopoulos
Prokopis Prokopidis
Vassilis Papavasileiou
Athanasios Katsamanis
Stelios Piperidis
V. Katsouros
ALM
20
0
0
19 May 2025
AdaptThink: Reasoning Models Can Learn When to Think
J. Zhang
Nianyi Lin
Lei Hou
Ling Feng
Juanzi Li
OffRL
LRM
2
0
0
19 May 2025
CoIn: Counting the Invisible Reasoning Tokens in Commercial Opaque LLM APIs
Guoheng Sun
Ziyao Wang
Bowei Tian
Meng Liu
Zheyu Shen
Shwai He
Yexiao He
Wanghao Ye
Yiting Wang
Ang Li
LRM
2
0
0
19 May 2025
Incentivizing Truthful Language Models via Peer Elicitation Games
Baiting Chen
Tong Zhu
Jiale Han
Lexin Li
Gang Li
Xiaowu Dai
2
0
0
19 May 2025
Shadow-FT: Tuning Instruct via Base
Taiqiang Wu
Runming Yang
Jiayi Li
Pengfei Hu
Ngai Wong
Yujiu Yang
12
0
0
19 May 2025
Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities
Lili Zhang
Haomiaomiao Wang
Long Cheng
Libao Deng
Tomas E. Ward
AAML
2
0
0
19 May 2025
On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding
Haoyuan Wu
Rui Ming
Jilong Gao
Hangyu Zhao
Xueyi Chen
Yikai Yang
Haisheng Zheng
Zhuolun He
Bei Yu
16
0
0
19 May 2025
WikiPersonas: What Can We Learn From Personalized Alignment to Famous People?
Zilu Tang
Afra Feyza Akyürek
Ekin Akyürek
Derry Wijaya
7
0
0
19 May 2025
Walking the Tightrope: Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning
Xiaoyu Yang
Jie Lu
En Yu
2
0
0
19 May 2025
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Xiaoyuan Liu
Tian Liang
Zhiwei He
Jiahao Xu
Wenxuan Wang
Pinjia He
Zhaopeng Tu
Haitao Mi
Dong Yu
OffRL
ReLM
LRM
9
0
0
19 May 2025
Investigating the Vulnerability of LLM-as-a-Judge Architectures to Prompt-Injection Attacks
Narek Maloyan
Bislan Ashinov
Dmitry Namiot
AAML
ELM
9
0
0
19 May 2025
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning
Xinbin Yuan
Jian Zhang
K. Li
Zhuoxuan Cai
Lujian Yao
...
Enguang Wang
Qibin Hou
Jinwei Chen
Peng-Tao Jiang
Bo Li
7
0
0
18 May 2025
LAMeTA: Intent-Aware Agentic Network Optimization via a Large AI Model-Empowered Two-Stage Approach
Yinqiu Liu
Guangyuan Liu
Jiacheng Wang
Ruichen Zhang
Dusit Niyato
Geng Sun
Zehui Xiong
Zhu Han
4
0
0
18 May 2025
Enriching Patent Claim Generation with European Patent Dataset
Lekang Jiang
Chengzu Li
Stephan Goetz
7
0
0
18 May 2025
SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment
Wenqiao Zhu
Ji Liu
Lulu Wang
Jun Wu
Yulun Zhang
9
0
0
18 May 2025
SPIRIT: Patching Speech Language Models against Jailbreak Attacks
Amirbek Djanibekov
Nurdaulet Mukhituly
Kentaro Inui
Hanan Aldarmaki
Nils Lukas
AAML
2
0
0
18 May 2025
Reward Inside the Model: A Lightweight Hidden-State Reward Model for LLM's Best-of-N sampling
Jizhou Guo
Zhaomin Wu
Philip S. Yu
4
0
0
18 May 2025
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li
Ming Lin
Tomer Galanti
Zhengzhong Tu
Tianbao Yang
9
0
0
18 May 2025
AbFlowNet: Optimizing Antibody-Antigen Binding Energy via Diffusion-GFlowNet Fusion
Abrar Rahman Abir
Haz Sameen Shahgir
Md Rownok Zahan Ratul
Md Toki Tahmid
Greg Ver Steeg
Yue Dong
2
0
0
18 May 2025
MARGE: Improving Math Reasoning for LLMs with Guided Exploration
Jingyue Gao
Runji Lin
Keming Lu
Bowen Yu
Junyang Lin
Jianyu Chen
LRM
9
0
0
18 May 2025
RLAP: A Reinforcement Learning Enhanced Adaptive Planning Framework for Multi-step NLP Task Solving
Zepeng Ding
Dixuan Wang
Ziqin Luo
Guochao Jiang
Deqing Yang
Jiaqing Liang
2
0
0
17 May 2025
Fair-PP: A Synthetic Dataset for Aligning LLM with Personalized Preferences of Social Equity
Qi Zhou
Jie Zhang
Dongxia Wang
Qiang Liu
Tianlin Li
Jin Song Dong
Wenhai Wang
Qing Guo
SyDa
4
0
0
17 May 2025
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM
LRM
25
0
0
17 May 2025
Telco-oRAG: Optimizing Retrieval-augmented Generation for Telecom Queries via Hybrid Retrieval and Neural Routing
Andrei-Laurentiu Bornea
Fadhel Ayed
Antonio De Domenico
Nicola Piovesan
Tareq Si Salem
Ali Maatouk
12
0
0
17 May 2025
SafeVid: Toward Safety Aligned Video Large Multimodal Models
Yixu Wang
Jiaxin Song
Yifeng Gao
Xin Wang
Yang Yao
Yan Teng
Xingjun Ma
Yingchun Wang
Yu-Gang Jiang
7
0
0
17 May 2025