2502.17507
Cited By
C-3DPO: Constrained Controlled Classification for Direct Preference Optimization
22 February 2025
Kavosh Asadi
Julien Han
Xingzi Xu
Dominique Perrault-Joncas
Shoham Sabach
Karim Bouyarmane
Mohammad Ghavamzadeh
Papers citing "C-3DPO: Constrained Controlled Classification for Direct Preference Optimization"
7 / 7 papers shown
Title
f-PO: Generalizing Preference Optimization with f-divergence Minimization
Jiaqi Han
Mingjian Jiang
Yuxuan Song
J. Leskovec
Stefano Ermon
68
5
0
29 Oct 2024
Nemotron-4 340B Technical Report
Nvidia:
Bo Adler
Niket Agarwal
Ashwath Aithal
...
Jimmy Zhang
Jing Zhang
Vivienne Zhang
Yian Zhang
Chen Zhu
76
60
0
17 Jun 2024
Soft Preference Optimization: Aligning Language Models to Expert Distributions
Arsalan Sharifnassab
Sina Ghiassian
Saber Salehkaleybar
Surya Kanoria
Dale Schuurmans
38
2
0
30 Apr 2024
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu
Wei Fu
Jiaxuan Gao
Wenjie Ye
Weiling Liu
Zhiyu Mei
Guangju Wang
Chao Yu
Yi Wu
78
145
0
16 Apr 2024
A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
89
580
0
18 Oct 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
176
4,085
0
09 Jun 2023
When Does Label Smoothing Help?
Rafael Müller
Simon Kornblith
Geoffrey E. Hinton
UQCV
106
1,931
0
06 Jun 2019