
Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

4 March 2024
Andi Nika
Debmalya Mandal
Parameswaran Kamalaruban
Georgios Tzannetos
Goran Radanović
Adish Singla
arXiv: 2403.01857

Papers citing "Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences"

1 paper shown
Multi-Player Approaches for Dueling Bandits
Or Raveh
Junya Honda
Masashi Sugiyama
25 May 2024