Accelerating Nash Learning from Human Feedback via Mirror Prox

26 May 2025

Papers citing "Accelerating Nash Learning from Human Feedback via Mirror Prox"


Title
A General Theoretical Paradigm to Understand Learning from Human Preferences | M. G. Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos | 66 | 580 | 0 | 18 Oct 2023
ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret | Stephen Marcus McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm | 48 | 25 | 0 | 08 Jun 2022
Preference-based Online Learning with Dueling Bandits: A Survey | Viktor Bengs, R. Busa-Fekete, Adil El Mesaoudi-Paul, Eyke Hüllermeier | 28 | 112 | 0 | 30 Jul 2018