Direct Preference Optimization: Your Language Model is Secretly a Reward Model

29 May 2023
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
Tags: ALM
Links: ArXiv · PDF · HTML
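
As context for the citing papers listed below, DPO dispenses with an explicitly trained reward model and optimizes the policy directly on preference pairs; a minimal sketch of the objective in the paper's notation, where $\beta$ is the KL-regularization strength, $\pi_{\text{ref}}$ the frozen reference policy, and $(x, y_w, y_l)$ a prompt with preferred and dispreferred responses:

$$\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]$$

The implicit reward is $\beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}$ (up to a prompt-dependent constant), which is the sense in which the language model "is secretly a reward model."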

Papers citing "Direct Preference Optimization: Your Language Model is Secretly a Reward Model"

50 / 2,637 papers shown
Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral
Yiming Cui, Xin Yao
04 Mar 2024

Improving the Validity of Automatically Generated Feedback via Reinforcement Learning
Alexander Scarlatos, Digory Smith, Simon Woodhead, Andrew Lan
Tags: OffRL
02 Mar 2024

LLaMoCo: Instruction Tuning of Large Language Models for Optimization Code Generation
Zeyuan Ma, Hongshu Guo, Jiacheng Chen, Guojun Peng, Zhiguang Cao, Yining Ma, Yue-jiao Gong
Tags: SyDa, ALM
02 Mar 2024

LAB: Large-Scale Alignment for ChatBots
Shivchander Sudalairaj, Abhishek Bhandwaldar, Aldo Pareja, Kai Xu, David D. Cox, Akash Srivastava
Tags: OSLM
02 Mar 2024

Peacock: A Family of Arabic Multimodal Large Language Models and Benchmarks
Fakhraddin Alwajih, El Moatez Billah Nagoudi, Gagan Bhatia, Abdelrahman Mohamed, Muhammad Abdul-Mageed
Tags: VLM, LRM
01 Mar 2024

Provably Robust DPO: Aligning Language Models with Noisy Feedback
Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
01 Mar 2024

Improving Socratic Question Generation using Data Augmentation and Preference Optimization
Nischal Ashok Kumar, Andrew Lan
01 Mar 2024

Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling
Gabriel Grand, Valerio Pepe, Jacob Andreas, Joshua B. Tenenbaum
Tags: ReLM
29 Feb 2024

Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
Chao Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhen-fei Yin, Yu Qiao, Yong Liu, Jing Shao
Tags: LLMSV, LRM
29 Feb 2024

ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar
29 Feb 2024

PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval
He Zhu, Wenjia Zhang, Nuoxian Huang, Boyang Li, Luyao Niu, ..., Yicheng Tao, Junyou Su, Zhaoya Gong, Chenyu Fang, Xing Liu
Tags: LLMAG
29 Feb 2024

Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Jiexin Wang, ..., Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun
29 Feb 2024

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
Haoxiang Wang, Yong Lin, Wei Xiong, Rui Yang, Shizhe Diao, Shuang Qiu, Han Zhao, Tong Zhang
28 Feb 2024

Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
Yuan Ge, Yilun Liu, Chi Hu, Weibin Meng, Shimin Tao, Xiaofeng Zhao, Hongxia Ma, Li Zhang, Hao Yang, Tong Xiao
Tags: ALM
28 Feb 2024

Small But Funny: A Feedback-Driven Approach to Humor Distillation
Sahithya Ravi, Patrick Huber, Akshat Shrivastava, Aditya Sagar, Ahmed Aly, Vered Shwartz, Arash Einolghozati
28 Feb 2024

Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction
Tong Liu, Yingjie Zhang, Zhe Zhao, Yinpeng Dong, Guozhu Meng, Kai Chen
Tags: AAML
28 Feb 2024

AmbigNLG: Addressing Task Ambiguity in Instruction for NLG
Ayana Niwa, Hayate Iso
27 Feb 2024

Securing Reliability: A Brief Overview on Enhancing In-Context Learning for Foundation Models
Yunpeng Huang, Yaonan Gu, Jingwei Xu, Zhihong Zhu, Zhaorun Chen, Xiaoxing Ma
27 Feb 2024

SoFA: Shielded On-the-fly Alignment via Priority Rule Following
Xinyu Lu, Bowen Yu, Yaojie Lu, Hongyu Lin, Haiyang Yu, Le Sun, Xianpei Han, Yongbin Li
27 Feb 2024

Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue
Zhenhong Zhou, Jiuyang Xiang, Haopeng Chen, Quan Liu, Zherui Li, Sen Su
27 Feb 2024

Video as the New Language for Real-World Decision Making
Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans
Tags: VGen
27 Feb 2024

Immunization against harmful fine-tuning attacks
Domenic Rosati, Jan Wehner, Kai Williams, Lukasz Bartoszcze, Jan Batzner, Hassan Sajjad, Frank Rudzicz
Tags: AAML
26 Feb 2024

Feedback Efficient Online Fine-Tuning of Diffusion Models
Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, N. Diamant, Alex Tseng, Sergey Levine, Tommaso Biancalani
26 Feb 2024

Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering
Mingxu Tao, Dongyan Zhao, Yansong Feng
Tags: LLMAG
26 Feb 2024

Graph Diffusion Policy Optimization
Yijing Liu, Chao Du, Tianyu Pang, Chongxuan Li, Wei Chen, Min Lin
26 Feb 2024

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing
Jiabao Ji, Bairu Hou, Alexander Robey, George J. Pappas, Hamed Hassani, Yang Zhang, Eric Wong, Shiyu Chang
Tags: AAML
25 Feb 2024

Don't Forget Your Reward Values: Language Model Alignment via Value-based Calibration
Xin Mao, Fengming Li, Huimin Xu, Wei Zhang, Anh Tuan Luu
Tags: ALM
25 Feb 2024

GraphWiz: An Instruction-Following Language Model for Graph Problems
Nuo Chen, Yuhan Li, Jianheng Tang, Jia Li
25 Feb 2024

PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails
Neal Mangaokar, Ashish Hooda, Jihye Choi, Shreyas Chandrashekaran, Kassem Fawaz, Somesh Jha, Atul Prakash
Tags: AAML
24 Feb 2024

Batch Active Learning of Reward Functions from Human Preferences
Erdem Biyik, Nima Anari, Dorsa Sadigh
24 Feb 2024

Fast Adversarial Attacks on Language Models In One GPU Minute
Vinu Sankar Sadasivan, Shoumik Saha, Gaurang Sriramanan, Priyatham Kattakinda, Atoosa Malemir Chegini, S. Feizi
Tags: MIALM
23 Feb 2024

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
Jianguo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, ..., Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong
Tags: LLMAG
23 Feb 2024

Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts
Yuejiang Liu, Alexandre Alahi
23 Feb 2024

Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization
Swaroop Nath, Tejpalsingh Siledar, Sankara Sri Raghava Ravindra Muddu, Rupasai Rangaraju, H. Khadilkar, ..., Suman Banerjee, Amey Patil, Sudhanshu Singh, M. Chelliah, Nikesh Garera
23 Feb 2024

Brain-Inspired Two-Stage Approach: Enhancing Mathematical Reasoning by Imitating Human Thought Processes
Yezeng Chen, Zui Chen, Yi Zhou
Tags: LRM
23 Feb 2024

Break the Breakout: Reinventing LM Defense Against Jailbreak Attacks with Self-Refinement
Heegyu Kim, Sehyun Yuk, Hyunsouk Cho
Tags: AAML
23 Feb 2024

Unintended Impacts of LLM Alignment on Global Representation
Michael Joseph Ryan, William B. Held, Diyi Yang
22 Feb 2024

Divide-or-Conquer? Which Part Should You Distill Your LLM?
Zhuofeng Wu, Richard He Bai, Aonan Zhang, Jiatao Gu, V. Vydiswaran, Navdeep Jaitly, Yizhe Zhang
Tags: LRM
22 Feb 2024

Optimizing Language Models for Human Preferences is a Causal Inference Problem
Victoria Lin, Eli Ben-Michael, Louis-Philippe Morency
22 Feb 2024

MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
Ge Bai, Jie Liu, Xingyuan Bu, Yancheng He, Jiaheng Liu, ..., Zhuoran Lin, Wenbo Su, Tiezheng Ge, Bo Zheng, Wanli Ouyang
Tags: ELM, LM&MA
22 Feb 2024

Generalizing Reward Modeling for Out-of-Distribution Preference Learning
Chen Jia
22 Feb 2024

Chain-of-Thought Unfaithfulness as Disguised Accuracy
Oliver Bentham, Nathan Stringham, Ana Marasović
Tags: LRM, HILM
22 Feb 2024

Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models
Seungduk Kim, Seungtaek Choi, Myeongho Jeong
22 Feb 2024

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener
22 Feb 2024

Do LLMs Implicitly Determine the Suitable Text Difficulty for Users?
Seiji Gobara, Hidetaka Kamigaito, Taro Watanabe
22 Feb 2024

MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint
Xinglin Zhou, Yifu Yuan, Shaofu Yang, Jianye Hao
22 Feb 2024

COPR: Continual Human Preference Learning via Optimal Policy Regularization
Han Zhang, Lin Gui, Yu Lei, Yuanzhao Zhai, Yehong Zhang, ..., Hui Wang, Yue Yu, Kam-Fai Wong, Bin Liang, Ruifeng Xu
Tags: CLL
22 Feb 2024

Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning
Debjit Paul, Robert West, Antoine Bosselut, Boi Faltings
Tags: ReLM, LRM
21 Feb 2024

SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization
Prakamya Mishra, Zonghai Yao, Parth Vashisht, Feiyun Ouyang, Beining Wang, Vidhi Mody, Hong-ye Yu
Tags: SyDa, MedIm
21 Feb 2024

Large Language Models for Data Annotation: A Survey
Zhen Tan, Dawei Li, Song Wang, Alimohammad Beigi, Bohan Jiang, Amrita Bhattacharjee, Mansooreh Karami, Wenlin Yao, Lu Cheng, Huan Liu
Tags: SyDa
21 Feb 2024