Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv 2305.18290)
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
29 May 2023 [ALM]
Papers citing "Direct Preference Optimization: Your Language Model is Secretly a Reward Model" (showing 50 of 2,637):
HumanRankEval: Automatic Evaluation of LMs as Conversational Assistants
Milan Gritta, Gerasimos Lampouras, Ignacio Iacobacci
15 May 2024 [ALM]

A safety realignment framework via subspace-oriented model fusion for large language models
Xin Yi, Shunfan Zheng, Linlin Wang, Xiaoling Wang, Liang He
15 May 2024

ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation
Dimitris Gkoumas
14 May 2024

Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksander Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, ..., Matthew Jackson, Phillip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster
14 May 2024

Understanding the performance gap between online and offline alignment algorithms
Yunhao Tang, Daniel Guo, Zeyu Zheng, Daniele Calandriello, Yuan Cao, ..., Rémi Munos, Bernardo Avila-Pires, Michal Valko, Yong Cheng, Will Dabney
14 May 2024 [OffRL, OnRL]

PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
Ziyang Zhang, Qizhen Zhang, Jakob N. Foerster
13 May 2024 [AAML]

RLHF Workflow: From Reward Modeling to Online RLHF
Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang
13 May 2024 [OffRL]

Quantifying and Optimizing Global Faithfulness in Persona-driven Role-playing
Letian Peng, Jingbo Shang
13 May 2024

OpenLLM-Ro -- Technical Report on Open-source Romanian LLMs
Mihai Masala, Denis C. Ilie-Ablachim, D. Corlatescu, Miruna Zavelca, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian Rebedea
13 May 2024

Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA
Marco Polignano, Pierpaolo Basile, Giovanni Semeraro
11 May 2024

Integrating Emotional and Linguistic Models for Ethical Compliance in Large Language Models
Edward Y. Chang
11 May 2024

A Survey of Large Language Models for Graphs
Xubin Ren, Jiabin Tang, Dawei Yin, Nitesh Chawla, Chao Huang
10 May 2024

Value Augmented Sampling for Language Model Alignment and Personalization
Seungwook Han, Idan Shenfeld, Akash Srivastava, Yoon Kim, Pulkit Agrawal
10 May 2024 [OffRL]

The Role of Learning Algorithms in Collective Action
Omri Ben-Dov, Jake Fawkes, Samira Samadi, Amartya Sanyal
10 May 2024

Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
JoonHo Lee, Jae Oh Woo, Juree Seok, Parisa Hassanzadeh, Wooseok Jang, ..., Hankyu Moon, Wenjun Hu, Yeong-Dae Kwon, Taehee Lee, Seungjai Min
10 May 2024

Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Zorik Gekhman, G. Yona, Roee Aharoni, Matan Eyal, Amir Feder, Roi Reichart, Jonathan Herzig
09 May 2024

Binary Hypothesis Testing for Softmax Models and Leverage Score Models
Yeqi Gao, Yuzhou Gu, Zhao Song
09 May 2024

Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias
Shan Chen, Jack Gallifant, Mingye Gao, Pedro Moreira, Nikolaj Munch, ..., Hugo J. W. L. Aerts, Brian Anthony, Leo Anthony Celi, William G. La Cava, Danielle S. Bitterman
09 May 2024

Truthful Aggregation of LLMs with an Application to Online Advertising
Ermis Soumalias, Michael J. Curry, Sven Seuken
09 May 2024

ADELIE: Aligning Large Language Models on Information Extraction
Yunjia Qi, Hao Peng, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li
08 May 2024

Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking
Emre Can Acikgoz, Mete Erdogan, Deniz Yuret
07 May 2024

A Causal Explainable Guardrails for Large Language Models
Zhixuan Chu, Yan Wang, Longfei Li, Peng Kuang, Zhan Qin, Kui Ren
07 May 2024 [LLMSV]

Optimizing Language Model's Reasoning Abilities with Weak Supervision
Yongqi Tong, Sizhe Wang, Dawei Li, Yifan Wang, Simeng Han, Zi Lin, Chengsong Huang, Jiaxin Huang, Jingbo Shang
07 May 2024 [LRM, ReLM]

MoDiPO: text-to-motion alignment via AI-feedback-driven Direct Preference Optimization
Massimiliano Pappa, Luca Collorone, Giovanni Ficarra, Indro Spinelli, Fabio Galasso
06 May 2024

GREEN: Generative Radiology Report Evaluation and Error Notation
Sophie Ostmeier, Justin Xu, Zhihong Chen, Maya Varma, Louis Blankemeier, ..., Arne Edward Michalson, Michael E. Moseley, Curtis P. Langlotz, Akshay S. Chaudhari, Jean-Benoit Delbrouck
06 May 2024 [MedIm]

CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario
Zhizhao Duan, Hao Cheng, Duo Xu, Xi Wu, Xiangxie Zhang, Xi Ye, Zhen Xie
06 May 2024

PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning
Hyeong Kyu Choi, Yixuan Li
03 May 2024

FLAME: Factuality-Aware Alignment for Large Language Models
Sheng-Chieh Lin, Luyu Gao, Barlas Oğuz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen
02 May 2024 [HILM]

D2PO: Discriminator-Guided DPO with Response Evaluation Models
Prasann Singhal, Nathan Lambert, S. Niekum, Tanya Goyal, Greg Durrett
02 May 2024 [OffRL, EGVM]

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Gerald Shen, Zhilin Wang, Olivier Delalleau, Jiaqi Zeng, Yi Dong, ..., Sahil Jain, Ali Taghibakhshi, Markel Sanz Ausin, Ashwath Aithal, Oleksii Kuchaiev
02 May 2024

WildChat: 1M ChatGPT Interaction Logs in the Wild
Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, Yuntian Deng
02 May 2024

The Effectiveness of LLMs as Annotators: A Comparative Overview and Empirical Analysis of Direct Representation
Maja Pavlovic, Massimo Poesio
02 May 2024

Self-Play Preference Optimization for Language Model Alignment
Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji, Yiming Yang, Quanquan Gu
01 May 2024

Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling
Yida Mu, Peizhen Bai, Kalina Bontcheva, Xingyi Song
01 May 2024

The Real, the Better: Aligning Large Language Models with Online Human Behaviors
Guanying Jiang, Lingyong Yan, Haibo Shi, Dawei Yin
01 May 2024

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Yuxi Xie, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy Lillicrap, Kenji Kawaguchi, Michael Shieh
01 May 2024 [ReLM, LRM]

MetaRM: Shifted Distributions Alignment via Meta-Learning
Shihan Dou, Yan Liu, Enyu Zhou, Changze Lv, Haoxiang Jia, ..., Junjie Ye, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang
01 May 2024 [OOD]

Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models
Leonardo Ranaldi, André Freitas
01 May 2024 [LRM, ReLM]

RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
Chanwoo Park, Mingyang Liu, Dingwen Kong, Kaiqing Zhang, Asuman Ozdaglar
30 Apr 2024

Soft Preference Optimization: Aligning Language Models to Expert Distributions
Arsalan Sharifnassab, Sina Ghiassian, Saber Salehkaleybar, Surya Kanoria, Dale Schuurmans
30 Apr 2024

Iterative Reasoning Preference Optimization
Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar, Jason Weston
30 Apr 2024 [LRM]

Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning
Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin
30 Apr 2024

More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness
Aaron Jiaxun Li, Satyapriya Krishna, Himabindu Lakkaraju
29 Apr 2024

Performance-Aligned LLMs for Generating Fast Code
Daniel Nichols, Pranav Polasam, Harshitha Menon, Aniruddha Marathe, T. Gamblin, A. Bhatele
29 Apr 2024

ConPro: Learning Severity Representation for Medical Images using Contrastive Learning and Preference Optimization
Hong Nguyen, H. Nguyen, Melinda Y. Chang, Hieu H. Pham, Shrikanth Narayanan, Michael Pazzani
29 Apr 2024

HFT: Half Fine-Tuning for Large Language Models
Tingfeng Hui, Zhenyu Zhang, Shuohuan Wang, Weiran Xu, Yu Sun, Hua Wu
29 Apr 2024 [CLL]

DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong, Zikang Shan, Guhao Feng, Wei Xiong, Xinle Cheng, Li Zhao, Di He, Jiang Bian, Liwei Wang
29 Apr 2024

SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
Jinghan Jia, Yihua Zhang, Yimeng Zhang, Jiancheng Liu, Bharat Runwal, James Diffenderfer, B. Kailkhura, Sijia Liu
28 Apr 2024 [MU]

From Persona to Personalization: A Survey on Role-Playing Language Agents
Jiangjie Chen, Xintao Wang, Rui Xu, Siyu Yuan, Yikai Zhang, ..., Caiyu Hu, Siye Wu, Scott Ren, Ziquan Fu, Yanghua Xiao
28 Apr 2024

Evaluation of Few-Shot Learning for Classification Tasks in the Polish Language
Tsimur Hadeliya, D. Kajtoch
27 Apr 2024