2206.12301
On the Limitations of Elo: Real-World Games, are Transitive, not Additive
21 June 2022
Quentin Bertrand
Wojciech M. Czarnecki
Gauthier Gidel
Papers citing "On the Limitations of Elo: Real-World Games, are Transitive, not Additive"
18 / 18 papers shown
Title
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye
Hongyi Zhou
Jin Zhu
Francesco Quinzan
C. Shi
32
1
0
03 Apr 2025
Strength Estimation and Human-Like Strength Adjustment in Games
Chun Jung Chen
Chung-Chin Shih
Ti-Rong Wu
OffRL
176
1
0
24 Feb 2025
Is Elo Rating Reliable? A Study Under Model Misspecification
Shange Tang
Yuanhao Wang
Chi Jin
49
1
0
16 Feb 2025
Evaluation of Large Language Models via Coupled Token Generation
N. C. Benz
Stratis Tsirtsis
Eleni Straitouri
Ivi Chatzi
Ander Artola Velasco
Suhas Thejaswi
Manuel Gomez Rodriguez
51
0
0
03 Feb 2025
Soft Condorcet Optimization for Ranking of General Agents
Marc Lanctot
Kate Larson
Michael Kaisers
Quentin Berthet
I. Gemp
Manfred Diaz
Roberto-Rafael Maura-Rivero
Yoram Bachrach
Anna Koop
Doina Precup
47
0
0
31 Oct 2024
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
Yifan Zhang
Ge Zhang
Yue Wu
Kangping Xu
Quanquan Gu
48
3
0
03 Oct 2024
A Survey on Self-play Methods in Reinforcement Learning
Chao Yu
Zelai Xu
Chengdong Ma
Chao Yu
Weijuan Tu
...
Deheng Ye
Wenbo Ding
Yaodong Yang
Yu Wang
SyDa
SSL
OnRL
58
8
0
02 Aug 2024
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
Zhichao Wang
Bin Bi
Shiva K. Pentyala
Kiran Ramnath
Sougata Chaudhuri
...
Z. Zhu
Xiang-Bo Mao
S. Asur
Na Cheng
OffRL
42
40
0
23 Jul 2024
Data-Centric Human Preference Optimization with Rationales
H. Just
Ming Jin
Anit Kumar Sahu
Huy Phan
Ruoxi Jia
52
3
0
19 Jul 2024
ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions
Xu Zhang
Xunjian Yin
Xiaojun Wan
48
3
0
13 Jun 2024
Online Self-Preferring Language Models
Yuanzhao Zhai
Zhuo Zhang
Kele Xu
Hanyang Peng
Yue Yu
Dawei Feng
Cheng Yang
Bo Ding
Huaimin Wang
56
0
0
23 May 2024
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset
Ching-An Cheng
Arindam Mitra
Michael Santacroce
Ahmed Hassan Awadallah
Tengyang Xie
152
114
0
04 Apr 2024
Prediction-Powered Ranking of Large Language Models
Ivi Chatzi
Eleni Straitouri
Suhas Thejaswi
Manuel Gomez Rodriguez
ALM
29
5
0
27 Feb 2024
Nash Learning from Human Feedback
Rémi Munos
Michal Valko
Daniele Calandriello
M. G. Azar
Mark Rowland
...
Nikola Momchev
Olivier Bachem
D. Mankowitz
Doina Precup
Bilal Piot
42
125
0
01 Dec 2023
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
M. Boubdir
Edward Kim
Beyza Ermis
Sara Hooker
Marzieh Fadaee
ELM
27
35
0
29 Nov 2023
A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
47
541
0
18 Oct 2023
Ordinal Potential-based Player Rating
N. Vadori
Rahul Savani
12
3
0
08 Jun 2023
Principal Trade-off Analysis
Alexander Strang
David Sewell
Alexander Kim
K. Alcedo
D. Rosenbluth
6
1
0
09 Jun 2022