ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

On Symmetric Losses for Robust Policy Optimization with Noisy Preferences


30 May 2025
Soichiro Nishimori
Yu Zhang
Thanawat Lodkaew
Masashi Sugiyama
    NoLa

Papers cited by "On Symmetric Losses for Robust Policy Optimization with Noisy Preferences"

7 / 7 papers shown
RePO: ReLU-based Preference Optimization
Junkang Wu
Kexin Huang
Xue Wang
Jinyang Gao
Bolin Ding
Jiancan Wu
Xiangnan He
Xiang Wang
83
1
0
10 Mar 2025
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
199
510
0
02 Feb 2024
A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
112
597
0
18 Oct 2023
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
500
41,106
0
28 May 2020
Mitigating Overfitting in Supervised Classification from Two Unlabeled Datasets: A Consistent Risk Correction Approach
Nan Lu
Tianyi Zhang
Gang Niu
Masashi Sugiyama
30
55
0
20 Oct 2019
On Symmetric Losses for Learning from Corrupted Labels
Nontawat Charoenphakdee
Jongyeong Lee
Masashi Sugiyama
NoLa
39
105
0
27 Jan 2019
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
236
18,685
0
20 Jul 2017