arXiv:2505.19731
Accelerating Nash Learning from Human Feedback via Mirror Prox
26 May 2025
D. Tiapkin, Daniele Calandriello, Denis Belomestny, Eric Moulines, Alexey Naumov, Kashif Rasul, Michal Valko, Pierre Ménard
Papers citing "Accelerating Nash Learning from Human Feedback via Mirror Prox"
A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos
18 Oct 2023
ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret
Stephen Marcus McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm
08 Jun 2022
Preference-based Online Learning with Dueling Bandits: A Survey
Viktor Bengs, R. Busa-Fekete, Adil El Mesaoudi-Paul, Eyke Hüllermeier
30 Jul 2018