2505.24709
Cited By
On Symmetric Losses for Robust Policy Optimization with Noisy Preferences
30 May 2025
Soichiro Nishimori
Yu Zhang
Thanawat Lodkaew
Masashi Sugiyama
NoLa
Papers citing
"On Symmetric Losses for Robust Policy Optimization with Noisy Preferences"
27 / 27 papers shown
RePO: ReLU-based Preference Optimization
Junkang Wu
Kexin Huang
Xue Wang
Jinyang Gao
Bolin Ding
Jiancan Wu
Xiangnan He
Xiang Wang
83
1
0
10 Mar 2025
Robust Preference Optimization through Reward Model Distillation
Adam Fisch
Jacob Eisenstein
Vicky Zayats
Alekh Agarwal
Ahmad Beirami
Chirag Nagpal
Peter Shaw
Jonathan Berant
110
29
0
29 May 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng
Mengzhou Xia
Danqi Chen
88
425
0
23 May 2024
Impact of Preference Noise on the Alignment Performance of Generative Language Models
Yang Gao
Dana Alon
Donald Metzler
68
20
0
15 Apr 2024
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Sayak Ray Chowdhury
Anush Kini
Nagarajan Natarajan
61
67
0
01 Mar 2024
Generalized Preference Optimization: A Unified Approach to Offline Alignment
Yunhao Tang
Z. Guo
Zeyu Zheng
Daniele Calandriello
Rémi Munos
Mark Rowland
Pierre Harvey Richemond
Michal Valko
Bernardo Avila-Pires
Bilal Piot
39
100
0
08 Feb 2024
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
199
510
0
02 Feb 2024
A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
112
597
0
18 Oct 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
282
3,712
0
29 May 2023
SLiC-HF: Sequence Likelihood Calibration with Human Feedback
Yao-Min Zhao
Rishabh Joshi
Tianqi Liu
Misha Khalman
Mohammad Saleh
Peter J. Liu
52
284
0
17 May 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
647
13,788
0
15 Mar 2023
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
212
2,457
0
12 Apr 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
694
12,525
0
04 Mar 2022
Learning with Instance-Dependent Label Noise: A Sample Sieve Approach
Hao Cheng
Zhaowei Zhu
Xingyu Li
Yifei Gong
Xing Sun
Yang Liu
NoLa
52
205
0
05 Oct 2020
Learning to summarize from human feedback
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
180
2,071
0
02 Sep 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
500
41,106
0
28 May 2020
Mitigating Overfitting in Supervised Classification from Two Unlabeled Datasets: A Consistent Risk Correction Approach
Nan Lu
Tianyi Zhang
Gang Niu
Masashi Sugiyama
30
55
0
20 Oct 2019
Symmetric Cross Entropy for Robust Learning with Noisy Labels
Yisen Wang
Xingjun Ma
Zaiyi Chen
Yuan Luo
Jinfeng Yi
James Bailey
NoLa
63
888
0
16 Aug 2019
On Symmetric Losses for Learning from Corrupted Labels
Nontawat Charoenphakdee
Jongyeong Lee
Masashi Sugiyama
NoLa
42
105
0
27 Jan 2019
Learning Models with Uniform Performance via Distributionally Robust Optimization
John C. Duchi
Hongseok Namkoong
OOD
46
413
0
20 Oct 2018
On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data
Nan Lu
Gang Niu
A. Menon
Masashi Sugiyama
MQ
56
87
0
31 Aug 2018
Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels
Zhilu Zhang
M. Sabuncu
NoLa
60
2,580
0
20 May 2018
Robust Loss Functions under Label Noise for Deep Neural Networks
Aritra Ghosh
Himanshu Kumar
P. Sastry
NoLa
OOD
49
952
0
27 Dec 2017
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
236
18,685
0
20 Jul 2017
Variance-based regularization with convex objectives
John C. Duchi
Hongseok Namkoong
57
348
0
08 Oct 2016
Learning with Symmetric Label Noise: The Importance of Being Unhinged
Brendan van Rooyen
A. Menon
Robert C. Williamson
NoLa
92
309
0
28 May 2015
Composite Binary Losses
Mark D. Reid
Robert C. Williamson
112
223
0
17 Dec 2009