Direct Preference Optimization: Your Language Model is Secretly a Reward Model
29 May 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
Papers citing "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (50 of 2,637 papers shown)
Understanding the Logic of Direct Preference Alignment through Logic
Kyle Richardson
Vivek Srikumar
Ashish Sabharwal
90
2
0
23 Dec 2024
Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
Alexander von Recum
Christoph Schnabl
Gabor Hollbeck
Silas Alberti
Philip Blinde
Marvin von Hagen
94
2
0
22 Dec 2024
Online Learning from Strategic Human Feedback in LLM Fine-Tuning
Shugang Hao
Lingjie Duan
97
3
0
22 Dec 2024
When Can Proxies Improve the Sample Complexity of Preference Learning?
Yuchen Zhu
Daniel Augusto de Souza
Zhengyan Shi
Mengyue Yang
Pasquale Minervini
Alexander D'Amour
Matt J. Kusner
90
0
0
21 Dec 2024
Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval
Luo Ji
Feixiang Guo
Teng Chen
Qingqing Gu
Xiaoyu Wang
...
Peng Yu
Yue Zhao
Hongyang Lei
Zhonglin Jiang
Yong Chen
RALM
LRM
102
0
0
21 Dec 2024
JailPO: A Novel Black-box Jailbreak Framework via Preference Optimization against Aligned LLMs
Haoyang Li
Jiawei Ye
Jie Wu
Tianjie Yan
Chu Wang
Zhixin Li
AAML
85
0
0
20 Dec 2024
REFA: Reference Free Alignment for multi-preference optimization
Taneesh Gupta
Rahul Madhavan
Xuchao Zhang
Chetan Bansal
Saravan Rajmohan
99
1
0
20 Dec 2024
Northeastern Uni at Multilingual Counterspeech Generation: Enhancing Counter Speech Generation with LLM Alignment through Direct Preference Optimization
Sahil Wadhwa
Chengtian Xu
Haoming Chen
Aakash Mahalingam
Akankshya Kar
Divya Chaudhary
83
0
0
19 Dec 2024
Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling
Junyi Li
Hwee Tou Ng
LRM
105
1
0
19 Dec 2024
Learning to Generate Research Idea with Dynamic Control
Ruochen Li
Liqiang Jing
Chi Han
Jiawei Zhou
Xinya Du
LRM
87
3
0
19 Dec 2024
PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization
Jiayi Wu
Hengyi Cai
Lingyong Yan
Hao Sun
Xiang Li
Shuaiqiang Wang
Dawei Yin
Ming Gao
132
0
0
19 Dec 2024
SATA: A Paradigm for LLM Jailbreak via Simple Assistive Task Linkage
Xiaoning Dong
Wenbo Hu
Wei Xu
Tianxing He
85
0
0
19 Dec 2024
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
Runtao Liu
Haoyu Wu
Zheng Ziqiang
Chen Wei
Yingqing He
Renjie Pi
Qifeng Chen
VGen
90
14
0
18 Dec 2024
Hansel: Output Length Controlling Framework for Large Language Models
Seoha Song
Junhyun Lee
Hyeonmok Ko
80
0
0
18 Dec 2024
Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
Yuzhong Hong
Hanshan Zhang
Junwei Bao
Hongfei Jiang
Yang Song
OffRL
85
2
0
18 Dec 2024
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
Zhuoran Jin
Hongbang Yuan
Tianyi Men
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
ALM
99
7
0
18 Dec 2024
Context-DPO: Aligning Language Models for Context-Faithfulness
Baolong Bi
Shaohan Huang
Yansen Wang
Tianchi Yang
Zihan Zhang
...
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
Shenghua Liu
116
11
0
18 Dec 2024
Gradual Vigilance and Interval Communication: Enhancing Value Alignment in Multi-Agent Debates
Rui Zou
Mengqi Wei
Jintian Feng
Qian Wan
Jianwen Sun
Sannyuya Liu
82
0
0
18 Dec 2024
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection
Le Yang
Ziwei Zheng
Boxu Chen
Zhengyu Zhao
Chenhao Lin
Chao Shen
VLM
148
3
0
18 Dec 2024
An Automated Explainable Educational Assessment System Built on LLMs
Jiazheng Li
Artem Bobrov
David West
Cesare Aloisi
Yulan He
90
2
0
17 Dec 2024
Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models
Yuchen Fan
Yuzhong Hong
Qiushi Wang
Junwei Bao
Hongfei Jiang
Yang Song
88
1
0
17 Dec 2024
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
Xin Yi
Shunfan Zheng
Linlin Wang
Gerard de Melo
Xiaoling Wang
Liang He
98
7
0
17 Dec 2024
DateLogicQA: Benchmarking Temporal Biases in Large Language Models
Gagan Bhatia
MingZe Tang
Cristina Mahanta
Madiha Kazi
84
0
0
17 Dec 2024
Context Filtering with Reward Modeling in Question Answering
Sangryul Kim
James Thorne
78
0
0
16 Dec 2024
Self-Adaptive Paraphrasing and Preference Learning for Improved Claim Verifiability
Amelie Wührl
Roman Klinger
90
0
0
16 Dec 2024
ACE-M^3: Automatic Capability Evaluator for Multimodal Medical Models
Xiechi Zhang
Shunfan Zheng
Linlin Wang
Gerard de Melo
Zhu Cao
Xiaoling Wang
Liang He
ELM
133
0
0
16 Dec 2024
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Jiale Cheng
Xiao-Chang Liu
C. Wang
Xiaotao Gu
Yaojie Lu
Dan Zhang
Yuxiao Dong
J. Tang
Hongning Wang
Minlie Huang
LRM
134
3
0
16 Dec 2024
UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models
Boyang Xue
Fei Mi
Qi Zhu
Hongru Wang
Rui Wang
Sheng Wang
Erxin Yu
Xuming Hu
Kam-Fai Wong
HILM
95
1
0
16 Dec 2024
The Superalignment of Superhuman Intelligence with Large Language Models
Minlie Huang
Yingkang Wang
Shiyao Cui
Pei Ke
J. Tang
129
1
0
15 Dec 2024
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
Juntao Dai
Yaodong Yang
Qian Zheng
Gang Pan
OffRL
91
2
0
15 Dec 2024
Dual Traits in Probabilistic Reasoning of Large Language Models
Shenxiong Li
Huaxia Rui
85
0
0
15 Dec 2024
Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration
Avinandan Bose
Zhihan Xiong
Aadirupa Saha
S. Du
Maryam Fazel
86
1
0
13 Dec 2024
WHAT-IF: Exploring Branching Narratives by Meta-Prompting Large Language Models
Runsheng Huang
Lara J. Martin
Chris Callison-Burch
LRM
AI4CE
LLMAG
79
0
0
13 Dec 2024
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation
Runtao Liu
Chen I Chieh
Jindong Gu
Jipeng Zhang
Renjie Pi
Qifeng Chen
Philip Torr
Ashkan Khakzar
Fabio Pizzati
EGVM
114
0
0
13 Dec 2024
MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples
Shuo Xie
Fangzhi Zhu
Jiahui Wang
Lulu Wen
Wei Dai
Xiaowei Chen
Junxiong Zhu
Kai Zhou
Bo Zheng
79
0
0
13 Dec 2024
ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL
Yang Qin
Chao Chen
Z. Fu
Ze Chen
Dezhong Peng
Peng Hu
Jieping Ye
110
3
0
13 Dec 2024
DECOR: Decomposition and Projection of Text Embeddings for Text-to-Image Customization
Geonhui Jang
Jin-Hwa Kim
Yong-Hyun Park
Junho Kim
Gayoung Lee
Yonghyun Jeong
DiffM
100
0
0
12 Dec 2024
Phi-4 Technical Report
Marah Abdin
J. Aneja
Harkirat Singh Behl
Sébastien Bubeck
Ronen Eldan
...
Rachel A. Ward
Yue Wu
Dingli Yu
Cyril Zhang
Yi Zhang
ALM
SyDa
121
98
0
12 Dec 2024
Test-Time Alignment via Hypothesis Reweighting
Yoonho Lee
Jonathan Williams
Henrik Marklund
Archit Sharma
E. Mitchell
Anikait Singh
Chelsea Finn
101
4
0
11 Dec 2024
Learning to Reason via Self-Iterative Process Feedback for Small Language Models
Kaiyuan Chen
Jin Wang
Xuejie Zhang
LRM
ReLM
90
2
0
11 Dec 2024
PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips
Zachary Coalson
Jeonghyun Woo
Shiyang Chen
Yu Sun
Lishan Yang
Prashant J. Nair
Bo Fang
Sanghyun Hong
AAML
92
2
0
10 Dec 2024
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu
Tim Z. Xiao
Weiyang Liu
Yoshua Bengio
Dinghuai Zhang
123
4
0
10 Dec 2024
MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
Kangyu Zhu
Peng Xia
Yun Li
Hongtu Zhu
Sheng Wang
Huaxiu Yao
111
1
0
09 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jingyang Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Fei Wu
G. Wang
Eduard H. Hovy
OffRL
136
7
0
05 Dec 2024
Time-Reversal Provides Unsupervised Feedback to LLMs
Yerram Varun
Rahul Madhavan
Sravanti Addepalli
A. Suggala
Karthikeyan Shanmugam
Prateek Jain
LRM
SyDa
79
0
0
03 Dec 2024
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
117
1
0
03 Dec 2024
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos
Meng Cao
Haoran Tang
Haoze Zhao
Hangyu Guo
Jing Liu
Ge Zhang
Ruyang Liu
Qiang Sun
Ian Reid
Xiaodan Liang
115
2
0
02 Dec 2024
Harnessing Preference Optimisation in Protein LMs for Hit Maturation in Cell Therapy
Katarzyna Janocha
Annabel Ling
Alice Godson
Yulia Lampi
Simon Bornschein
Nils Y. Hammerla
84
2
0
02 Dec 2024
Yi-Lightning Technical Report
01.AI
Alan Wake
Albert Wang
Bei Chen
...
Yuxuan Sha
Zhaodong Yan
Zhiyuan Liu
Zirui Zhang
Zonghong Dai
OSLM
102
3
0
02 Dec 2024
Towards Adaptive Mechanism Activation in Language Agent
Ziyang Huang
Jun Zhao
Kang Liu
LLMAG
AI4CE
85
0
0
01 Dec 2024