2403.01857
Cited By
Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
4 March 2024
Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla
Papers citing "Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences"
1 / 1 papers shown
Title
Multi-Player Approaches for Dueling Bandits
Or Raveh, Junya Honda, Masashi Sugiyama
81
1
0
25 May 2024