Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

Papers citing "Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences"