
Multi-Task Reward Learning from Human Ratings

10 June 2025
Mingkang Wu
Devin White
Evelyn Rose
Vernon J. Lawhern
Nicholas R. Waytowich
Yongcan Cao
Main: 6 pages · 3 figures · 1 table · Bibliography: 1 page · Appendix: 1 page
Abstract

Reinforcement learning from human feedback (RLHF) has become a key factor in aligning model behavior with users' goals. However, while humans integrate multiple strategies when making decisions, current RLHF approaches often simplify this process by modeling human reasoning through isolated tasks such as classification or regression. In this paper, we propose a novel reinforcement learning (RL) method that mimics human decision-making by jointly considering multiple tasks. Specifically, we leverage human ratings in reward-free environments to infer a reward function, introducing learnable weights that balance the contributions of both classification and regression models. This design captures the inherent uncertainty in human decision-making and allows the model to adaptively emphasize different strategies. We conduct several experiments using synthetic human ratings to validate the effectiveness of the proposed approach. Results show that our method consistently outperforms existing rating-based RL methods, and in some cases, even surpasses traditional RL approaches.
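The core idea described in the abstract, inferring a reward function from human ratings with learnable weights that balance a classification model and a regression model, can be sketched in PyTorch as follows. This is a minimal illustration, not the paper's exact formulation: the class name MultiTaskRewardModel, the network sizes, and the sigmoid-weighted combination of the two task losses are all assumptions, made under the premise that each rating is available both as a discrete class label and as a scalar target.

import torch
import torch.nn as nn

class MultiTaskRewardModel(nn.Module):
    """Illustrative reward model trained jointly on classification and
    regression views of human ratings (all names are hypothetical)."""

    def __init__(self, obs_dim: int, act_dim: int, n_rating_classes: int, hidden: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Classification head: predicts the discrete rating class.
        self.cls_head = nn.Linear(hidden, n_rating_classes)
        # Regression head: predicts a scalar reward.
        self.reg_head = nn.Linear(hidden, 1)
        # Learnable balance between the two task losses;
        # the sigmoid keeps the weight in (0, 1).
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, obs, act):
        h = self.backbone(torch.cat([obs, act], dim=-1))
        return self.cls_head(h), self.reg_head(h).squeeze(-1)

    def loss(self, obs, act, rating_class, rating_value):
        logits, reward = self.forward(obs, act)
        cls_loss = nn.functional.cross_entropy(logits, rating_class)
        reg_loss = nn.functional.mse_loss(reward, rating_value)
        w = torch.sigmoid(self.alpha)
        # Learned convex combination of the classification and
        # regression objectives (an assumed weighting scheme).
        return w * cls_loss + (1.0 - w) * reg_loss

# Illustrative usage with random stand-in data:
model = MultiTaskRewardModel(obs_dim=8, act_dim=2, n_rating_classes=5)
obs, act = torch.randn(32, 8), torch.randn(32, 2)
classes = torch.randint(0, 5, (32,))
values = torch.randn(32)
loss = model.loss(obs, act, classes, values)
loss.backward()

Because the weight w is a trainable parameter rather than a fixed hyperparameter, gradient descent can shift emphasis between the classification and regression views during training, which is one plausible reading of the "adaptive" behavior the abstract describes.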

View on arXiv
@article{wu2025_2506.09183,
  title={Multi-Task Reward Learning from Human Ratings},
  author={Mingkang Wu and Devin White and Evelyn Rose and Vernon Lawhern and Nicholas R Waytowich and Yongcan Cao},
  journal={arXiv preprint arXiv:2506.09183},
  year={2025}
}